CommunityNews

CommunityNews

DeepSeek-V3 Technical Report

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at GitHub - deepseek-ai/DeepSeek-V3.

Read in full here:

Popular General Dev topics Top

Dusty
Thanks to @AstonJ for encouraging me to post this here: An introduction to alternative keyboard layouts Feel free to post your own expe...
New
dasdom
No chair. I have a standing desk. This post was split into a dedicated thread from our thread about chairs :slight_smile:
New
AstonJ
Just been adding some more portals, currently have the following languages: Apache Groovy C C# C++ Clojure CoffeeScript Crystal ...
New
New
AstonJ
Which keyboard switches do you use/like? :upside_down_face:
New
First poster: dimitarvp
Rails is not written in Ruby. I’m born and raised in Kraków, a beautiful city in Poland, maybe you’ve heard about it, maybe you’ve even ...
New
AstonJ
If you get Can't find emacs in your PATH when trying to install Doom Emacs on your Mac you… just… need to install Emacs first! :lol: bre...
New
AstonJ
This was interesting: He’s definitely more of an Emacs fan (which is fine) and the thing I found interesting is how you wo...
New
CommunityNews
9 fintech engineering mistakes. Read this list unless you want to build a money dissappearing system
New
CommunityNews
A Brief Review of the Minisforum V3 AMD Tablet. Update: I have created an awesome-minisforum-v3 GitHub repository to list information fo...
New

Other popular topics Top

Devtalk
Reading something? Working on something? Planning something? Changing jobs even!? If you’re up for sharing, please let us know what you’...
1016 16828 371
New
DevotionGeo
I know that these benchmarks might not be the exact picture of real-world scenario, but still I expect a Rust web framework performing a ...
New
brentjanderson
Bought the Moonlander mechanical keyboard. Cherry Brown MX switches. Arms and wrists have been hurting enough that it’s time I did someth...
New
AstonJ
Just done a fresh install of macOS Big Sur and on installing Erlang I am getting: asdf install erlang 23.1.2 Configure failed. checking ...
New
Rainer
Not sure if following fits exactly this thread, or if we should have a hobby thread… For many years I’m designing and building model air...
New
AstonJ
If you are experiencing Rails console using 100% CPU on your dev machine, then updating your development and test gems might fix the issu...
New
New
AstonJ
If you want a quick and easy way to block any website on your Mac using Little Snitch simply… File > New Rule: And select Deny, O...
New
First poster: bot
Large Language Models like ChatGPT say The Darnedest Things. The Errors They MakeWhy We Need to Document Them, and What We Have Decided ...
New
AstonJ
This is a very quick guide, you just need to: Download LM Studio: https://lmstudio.ai/ Click on search Type DeepSeek, then select the o...
New