CommunityNews

CommunityNews

Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two differently sized MoE large language models (LLMs), namely Ling-Lite and Ling-Plus (referred to as “Bailing” in Chinese, spelled Bǎilíng in Pinyin). Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters. Both models exhibit comparable performance to leading industry benchmarks. This report offers actionable insights to improve the efficiency and accessibility of AI development in resource-constrained settings, promoting more scalable and sustainable technologies. Specifically, to reduce training costs for large-scale MoE models, we propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency. Additionally, leveraging high-quality data generated from knowledge graphs, our models demonstrate superior capabilities in tool use compared to other models. Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models. Compared to high-performance devices, utilizing a lower-specification hardware system during the pre-training phase demonstrates significant cost savings, reducing computing costs by approximately 20%. The models can be accessed at inclusionAI (inclusionAI).

Read in full here:

Popular General Dev topics Top

AstonJ
The obligatory speed test thread :smiley: Check here: https://www.speedtest.net When complete, click on the share link and copy and pas...
New
Exadra37
Please tell us what is your preferred monitor setup for programming(not gaming) and why you have chosen it. Does your monitor have eye p...
New
AstonJ
Always interested in seeing what apps people use and how they organise their phones/home screens! Here’s mine…
New
AstonJ
Curious to know which languages and frameworks you’re all thinking about learning next :upside_down_face: Perhaps if there’s enough peop...
New
AstonJ
We have a thread about the keyboards we have, but what about nice keyboards we come across that we want? If you have seen any that look n...
New
mafinar
I always start with excitement and then get busy on 9/10th day. This year, like the year before this, and the year before that, I intend ...
New
New
foxtrottwist
A few weeks ago I started using Warp a terminal written in rust. Though in it’s current state of development there are a few caveats (tab...
New
AstonJ
This might be my next keyboard (the down arrow on my Apple Magic Keyboard has stopped working :icon_rolleyes:) https://cdn.shopify.com/s...
New
New

Other popular topics Top

axelson
I’ve been really enjoying obsidian.md: It is very snappy (even though it is based on Electron). I love that it is all local by defaul...
New
AstonJ
Just done a fresh install of macOS Big Sur and on installing Erlang I am getting: asdf install erlang 23.1.2 Configure failed. checking ...
New
Exadra37
I am asking for any distro that only has the bare-bones to be able to get a shell in the server and then just install the packages as we ...
New
PragmaticBookshelf
Learn different ways of writing concurrent code in Elixir and increase your application's performance, without sacrificing scalability or...
New
wmnnd
Here’s the story how one of the world’s first production deployments of LiveView came to be - and how trying to improve it almost caused ...
New
PragmaticBookshelf
Author Spotlight James Stanier @jstanier James Stanier, author of Effective Remote Work , discusses how to rethink the office as we e...
New
First poster: bot
Large Language Models like ChatGPT say The Darnedest Things. The Errors They MakeWhy We Need to Document Them, and What We Have Decided ...
New
New
AstonJ
If you’re getting errors like this: psql: error: connection to server on socket “/tmp/.s.PGSQL.5432” failed: No such file or directory ...
New
AstonJ
This is a very quick guide, you just need to: Download LM Studio: https://lmstudio.ai/ Click on search Type DeepSeek, then select the o...
New