CommunityNews

CommunityNews

Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two differently sized MoE large language models (LLMs), namely Ling-Lite and Ling-Plus (referred to as “Bailing” in Chinese, spelled Bǎilíng in Pinyin). Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters. Both models exhibit comparable performance to leading industry benchmarks. This report offers actionable insights to improve the efficiency and accessibility of AI development in resource-constrained settings, promoting more scalable and sustainable technologies. Specifically, to reduce training costs for large-scale MoE models, we propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency. Additionally, leveraging high-quality data generated from knowledge graphs, our models demonstrate superior capabilities in tool use compared to other models. Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models. Compared to high-performance devices, utilizing a lower-specification hardware system during the pre-training phase demonstrates significant cost savings, reducing computing costs by approximately 20%. The models can be accessed at inclusionAI (inclusionAI).

Read in full here:

Where Next?

Popular General Dev topics Top

First poster: bot
A field guide to help you recognize achievement, spot A field guide to help you recognize achievement, spot bottlenecks, and debug your d...
New
First poster: bot
It has some interesting features: It’s entirely wireless (the left half speaks Bluetooth to the right half, and the right half speaks B...
New
First poster: bot
Rewrite it in Rust by ridiculousfish · Pull Request #9512 · fish-shell/fish-shell. (Sorry for the meme; also this is obligatory.) I thi...
New
First poster: jkdiaz
Dark mode isn’t as good for your eyes as you believe. The shadowy display mode has leagues of fans claiming it helps reduce eye strain, ...
New
First poster: DevotionGeo
To avoid being replaced by LLMs, do what they can’t. What LLM’s can’t do yet
New
CommunityNews
We’re a tiny team @deepseek-ai pushing our limits in AGI exploration. Starting this week , Feb 24, 2025 we’ll open-source 5 repos – one ...
New
First poster: alvinkatojr
Over the last decade, we’ve seen great advancements in distributed systems, but the way we program them has seen few fundamental improvem...
New
First poster: alvinkatojr
There are countless articles why developers should not focus on Frameworks too much and instead learn to understand the underlying langua...
New
New
New

Other popular topics Top

wolf4earth
@AstonJ prompted me to open this topic after I mentioned in the lockdown thread how I started to do a lot more for my fitness. https://f...
New
DevotionGeo
I know that these benchmarks might not be the exact picture of real-world scenario, but still I expect a Rust web framework performing a ...
New
AstonJ
Or looking forward to? :nerd_face:
490 12945 266
New
Exadra37
Please tell us what is your preferred monitor setup for programming(not gaming) and why you have chosen it. Does your monitor have eye p...
New
Rainer
My first contact with Erlang was about 2 years ago when I used RabbitMQ, which is written in Erlang, for my job. This made me curious and...
New
AstonJ
We have a thread about the keyboards we have, but what about nice keyboards we come across that we want? If you have seen any that look n...
New
New
AstonJ
We’ve talked about his book briefly here but it is quickly becoming obsolete - so he’s decided to create a series of 7 podcasts, the firs...
New
PragmaticBookshelf
Build efficient applications that exploit the unique benefits of a pure functional language, learning from an engineer who uses Haskell t...
New
sir.laksmana_wenk
I’m able to do the “artistic” part of game-development; character designing/modeling, music, environment modeling, etc. However, I don’t...
New