CommunityNews

DeepSeek-V3 Technical Report

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available in the deepseek-ai/DeepSeek-V3 repository on GitHub.
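To put the abstract's headline numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The four constants are taken from the abstract; the cluster size and the rental price per GPU hour are purely hypothetical inputs chosen for illustration, not figures stated in this post, and the wall-clock estimate assumes perfectly even utilisation.

```python
# Figures quoted in the abstract above.
TOTAL_PARAMS = 671e9        # total parameters
ACTIVE_PARAMS = 37e9        # parameters activated for each token
PRETRAIN_TOKENS = 14.8e12   # pre-training tokens
GPU_HOURS = 2.788e6         # H800 GPU hours for the full training run


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that take part in processing one token."""
    return active / total


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Idealised wall-clock days if gpu_hours were spread evenly over num_gpus."""
    return gpu_hours / num_gpus / 24


def rental_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Cost of the run at a flat rental price per GPU hour."""
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    # Hypothetical inputs, NOT taken from the post: adjust to your own scenario.
    hypothetical_cluster_size = 2048
    hypothetical_usd_per_gpu_hour = 2.0

    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    days = wall_clock_days(GPU_HOURS, hypothetical_cluster_size)
    cost = rental_cost_usd(GPU_HOURS, hypothetical_usd_per_gpu_hour)

    print(f"Active parameters per token: {frac:.1%}")
    print(f"Wall-clock time on {hypothetical_cluster_size} GPUs: {days:.1f} days")
    print(f"Cost at ${hypothetical_usd_per_gpu_hour:.2f}/GPU hour: ${cost:,.0f}")
```

The first figure is the one that matters for inference: only about one parameter in eighteen is used per token, which is where the MoE design gets its efficiency. Note that the GPU-hour total covers the full training, so dividing the 14.8 trillion pre-training tokens by it would understate pre-training throughput.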


2025-03-27 14:46:32 UTC







