CommunityNews

CommunityNews

DeepSeek-V3 Technical Report

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at GitHub - deepseek-ai/DeepSeek-V3.

Read in full here:

Where Next?

Popular General Dev topics Top

First poster: mafinar
The following languages will help current and new web developers navigate the programming landscape to code web-based services and apps t...
New
First poster: dimitarvp
A career ending mistake — Bitfield Consulting. As software engineers, we’re constantly making detailed, elaborate plans for computers to...
New
CommunityNews
…or, “why make programming even harder?” Learning functional programming is an opportunity to discover a new way to represent programs, t...
New
CommunityNews
Docker on MacOS is slow and how to fix it. Thanks to the DALL·E 2, we finally have a very nice graphic representation of the feelings of...
New
First poster: bot
GitHub - lucidrains/PaLM-rlhf-pytorch: Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architectur...
New
CommunityNews
Apple Patents Suggest Future AirPods Could Monitor Biosignals & Brain Activity - AppleMagazine. The US Patent & Trademark Office...
New
First poster: AstonJ
Jan | Rethink the Computer. Jan turns your computer into an AI machine by running LLMs locally on your computer. It’s a privacy-focus, l...
New
CommunityNews
A Brief Review of the Minisforum V3 AMD Tablet. Update: I have created an awesome-minisforum-v3 GitHub repository to list information fo...
New
First poster: dyowee
olmOCR is an open-source tool for converting PDFs to text with high accuracy, preserving reading order and supporting tables, equations, ...
New
New

Other popular topics Top

AstonJ
If it’s a mechanical keyboard, which switches do you have? Would you recommend it? Why? What will your next keyboard be? Pics always w...
New
AstonJ
Curious to know which languages and frameworks you’re all thinking about learning next :upside_down_face: Perhaps if there’s enough peop...
New
AstonJ
I’ve been hearing quite a lot of comments relating to the sound of a keyboard, with one of the most desirable of these called ‘thock’, he...
New
New
AstonJ
Just done a fresh install of macOS Big Sur and on installing Erlang I am getting: asdf install erlang 23.1.2 Configure failed. checking ...
New
AstonJ
I ended up cancelling my Moonlander order as I think it’s just going to be a bit too bulky for me. I think the Planck and the Preonic (o...
New
rustkas
Intensively researching Erlang books and additional resources on it, I have found that the topic of using Regular Expressions is either c...
New
PragmaticBookshelf
Build efficient applications that exploit the unique benefits of a pure functional language, learning from an engineer who uses Haskell t...
New
New
NewsBot
Node.js v22.14.0 has been released. Link: Release 2025-02-11, Version 22.14.0 'Jod' (LTS), @aduh95 · nodejs/node · GitHub
New