CommunityNews

CommunityNews

Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two differently sized MoE large language models (LLMs), namely Ling-Lite and Ling-Plus (referred to as “Bailing” in Chinese, spelled Bǎilíng in Pinyin). Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters. Both models exhibit comparable performance to leading industry benchmarks. This report offers actionable insights to improve the efficiency and accessibility of AI development in resource-constrained settings, promoting more scalable and sustainable technologies. Specifically, to reduce training costs for large-scale MoE models, we propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency. Additionally, leveraging high-quality data generated from knowledge graphs, our models demonstrate superior capabilities in tool use compared to other models. Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models. Compared to high-performance devices, utilizing a lower-specification hardware system during the pre-training phase demonstrates significant cost savings, reducing computing costs by approximately 20%. The models can be accessed at inclusionAI (inclusionAI).

Read in full here:

Popular General Dev topics Top

First poster: dimitarvp
On Wednesday last week, Google’s Fiona Cicconi wrote to company employees. She announced that Google was bringing forward its timetable ...
New
First poster: andrea
What’s the real cost of interruptions? I illustrate all the context developers keep in their head and how it starts to decay immediately ...
New
First poster: bot
openai-python/chatml.md at main · openai/openai-python. The OpenAI Python library provides convenient access to the OpenAI API from appl...
New
First poster: bot
zig/http.zig at 7cf2cbb33ef34c1d211135f56d30fe23b6cacd42 · ziglang/zig. General-purpose programming language and toolchain for maintaini...
New
First poster: joeb
50 Shades of Go: Traps, Gotchas, and Common Mistakes for New Golang Devs. Go is a simple and fun language, but, like any other language,...
/go
New
First poster: KnowledgeIsPower
Building a Slack/Discord alternative with Tauri/Rust linen <span class="hashtag-icon-placeholder"></span>blog. Introduction My name is K...
New
CommunityNews
Apple Patents Suggest Future AirPods Could Monitor Biosignals &amp; Brain Activity - AppleMagazine. The US Patent &amp; Trademark Office...
New
First poster: dyowee
A Go package for building Progressive Web Apps. A package for building progressive web apps (PWA) with the Go programming language (Gola...
New
New
First poster: adamaiken89
Why Ruby on Rails still matters. An old tool endures in a Next.js world
New

Other popular topics Top

Devtalk
Reading something? Working on something? Planning something? Changing jobs even!? If you’re up for sharing, please let us know what you’...
1021 17087 374
New
wolf4earth
@AstonJ prompted me to open this topic after I mentioned in the lockdown thread how I started to do a lot more for my fitness. https://f...
New
AstonJ
Or looking forward to? :nerd_face:
New
siddhant3030
I’m thinking of buying a monitor that I can rotate to use as a vertical monitor? Also, I want to know if someone is using it for program...
New
Exadra37
I am asking for any distro that only has the bare-bones to be able to get a shell in the server and then just install the packages as we ...
New
foxtrottwist
A few weeks ago I started using Warp a terminal written in rust. Though in it’s current state of development there are a few caveats (tab...
New
PragmaticBookshelf
Author Spotlight James Stanier @jstanier James Stanier, author of Effective Remote Work , discusses how to rethink the office as we e...
New
PragmaticBookshelf
Author Spotlight Rebecca Skinner @RebeccaSkinner Welcome to our latest author spotlight, where we sit down with Rebecca Skinner, auth...
New
New
CommunityNews
A Brief Review of the Minisforum V3 AMD Tablet. Update: I have created an awesome-minisforum-v3 GitHub repository to list information fo...
New