CommunityNews

We’re a tiny team @deepseek-ai pushing our limits in AGI exploration.

Starting this week, Feb 24, 2025, we’ll open-source 5 repos – one daily drop – not because we’ve made grand claims, but simply as developers sharing our small-but-sincere progress with full transparency.

These are humble building blocks of our online service: documented, deployed and battle-tested in production. No vaporware, just sincere code that moved our tiny yet ambitious dream forward.

Why? Because every line shared becomes collective momentum that accelerates the journey. Daily unlocks begin soon. No ivory towers - just pure garage-energy and community-driven innovation :wrench:

Stay tuned – let’s geek out in the open together.

DeepSeek-Open-Infra

Hello, DeepSeek Open Infra!

202502 Open-Source Week

Day 1 - FlashMLA

Efficient MLA Decoding Kernel for Hopper GPUs
Optimized for variable-length sequences, battle-tested in production

:link: FlashMLA GitHub Repo
:white_check_mark: BF16 support
:white_check_mark: Paged KV cache (block size 64)
:zap: Performance: 3000 GB/s memory-bound | BF16 580 TFLOPS compute-bound on H800
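
To make the "paged KV cache (block size 64)" item concrete, here is a minimal sketch of how a paged cache typically maps a token position to a physical block through a per-sequence block table. This is not FlashMLA's code or API: `locate_token`, `blocks_needed` and the example block table are hypothetical names and values used only for illustration; only the block size of 64 comes from the announcement.

```python
BLOCK_SIZE = 64  # page size quoted in the announcement


def locate_token(block_table, token_pos, block_size=BLOCK_SIZE):
    """Map a logical token position to (physical_block_id, offset_in_block)."""
    logical_block, offset = divmod(token_pos, block_size)
    return block_table[logical_block], offset


def blocks_needed(seq_len, block_size=BLOCK_SIZE):
    """Number of pages required to hold seq_len tokens (ceiling division)."""
    return -(-seq_len // block_size)


# Hypothetical block table for one sequence:
# logical block i is stored in physical block block_table[i].
block_table = [7, 2, 9]

print(locate_token(block_table, 130))  # (9, 2)
print(blocks_needed(130))              # 3
```

Because each sequence only holds whole pages, variable-length sequences can share one physical pool without being padded to a common length.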

Day 2 - DeepEP

Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.

:link: DeepEP GitHub Repo
:white_check_mark: Efficient and optimized all-to-all communication
:white_check_mark: Both intranode and internode support with NVLink and RDMA
:white_check_mark: High-throughput kernels for training and inference prefilling
:white_check_mark: Low-latency kernels for inference decoding
:white_check_mark: Native FP8 dispatch support
:white_check_mark: Flexible GPU resource control for computation-communication overlapping
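
For readers new to expert parallelism, the all-to-all "dispatch" step can be pictured as each rank grouping its tokens by the rank that hosts their selected experts. The sketch below simulates that grouping in plain Python. It does not use DeepEP's API: the function name `dispatch`, the contiguous expert placement, and the once-per-rank deduplication are all assumptions made for this illustration.

```python
from collections import defaultdict


def dispatch(token_experts, experts_per_rank):
    """Group token indices by the rank hosting each selected expert.

    token_experts[i] lists the expert ids chosen for token i.
    Assumption: experts are placed contiguously, so expert e lives on
    rank e // experts_per_rank.
    """
    send_lists = defaultdict(list)
    for token, experts in enumerate(token_experts):
        # In this sketch a token is sent at most once per destination rank,
        # even when several of its experts live on that rank.
        for rank in sorted({e // experts_per_rank for e in experts}):
            send_lists[rank].append(token)
    return dict(send_lists)


# Hypothetical top-2 routing for three tokens over 8 experts on 4 ranks.
routing = [[0, 5], [2, 3], [6, 7]]
print(dispatch(routing, experts_per_rank=2))
```

The reverse step ("combine") sends expert outputs back to the token's original rank; it is omitted here.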

Day 3 - DeepGEMM

Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

:link: DeepGEMM GitHub Repo
:zap: Up to 1350+ FP8 TFLOPS on Hopper GPUs
:white_check_mark: No heavy dependency, as clean as a tutorial
:white_check_mark: Fully Just-In-Time compiled
:white_check_mark: Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
:white_check_mark: Supports dense layout and two MoE layouts
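
As a reading aid for the TFLOPS figure, GEMM throughput is conventionally computed as 2·M·N·K floating-point operations divided by the elapsed time. The helper below shows that arithmetic only; `gemm_tflops` is a hypothetical name, and the shape and timing in the example are made-up round numbers, not measurements of DeepGEMM.

```python
def gemm_tflops(m, n, k, seconds):
    """Throughput of an (m x k) @ (k x n) matmul in TFLOPS.

    Each output element needs k multiplies and k adds, hence 2*m*n*k FLOPs.
    """
    return 2 * m * n * k / seconds / 1e12


# Made-up example: a 1000 x 1000 x 1000 GEMM finishing in 2 ms.
print(gemm_tflops(1000, 1000, 1000, 0.002))  # 1.0
```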

Day 4 - Optimized Parallelism Strategies

:white_check_mark: DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
:link: GitHub Repo

:white_check_mark: EPLB - an expert-parallel load balancer for V3/R1.
:link: GitHub Repo

:bar_chart: Analyze computation-communication overlap in V3/R1.
:link: GitHub Repo
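
To give a feel for what an expert-parallel load balancer has to solve, here is a minimal greedy sketch: place the heaviest expert first, always on the currently least-loaded GPU. This is a textbook longest-processing-time heuristic, swapped in for illustration; it is not the EPLB algorithm and does not model any other detail of the actual repo. `balance_experts` and the example loads are hypothetical.

```python
import heapq


def balance_experts(expert_loads, num_gpus):
    """Greedy placement: heaviest expert goes to the least-loaded GPU."""
    heap = [(0, gpu) for gpu in range(num_gpus)]  # (current load, gpu id)
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}

    # Visit experts from heaviest to lightest.
    for expert, load in sorted(enumerate(expert_loads), key=lambda x: -x[1]):
        total, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        heapq.heappush(heap, (total + load, gpu))

    gpu_loads = {
        gpu: sum(expert_loads[e] for e in experts)
        for gpu, experts in placement.items()
    }
    return placement, gpu_loads


# Hypothetical per-expert token counts.
placement, gpu_loads = balance_experts([90, 10, 40, 60], num_gpus=2)
print(placement)  # {0: [0, 1], 1: [3, 2]}
print(gpu_loads)  # {0: 100, 1: 100}
```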

Day 5 - 3FS, Thruster for All DeepSeek Data Access

Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.

:zap: 6.6 TiB/s aggregate read throughput in a 180-node cluster
:zap: 3.66 TiB/min throughput on GraySort benchmark in a 25-node cluster
:zap: 40+ GiB/s peak throughput per client node for KVCache lookup
:dna: Disaggregated architecture with strong consistency semantics
:white_check_mark: Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1
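
The aggregate figures above are easier to compare once divided by node count. The snippet below does that arithmetic, assuming the load is spread evenly over the quoted number of nodes (the announcement does not say so); `per_node_gib_s` is a hypothetical helper name.

```python
TIB = 2**40
GIB = 2**30


def per_node_gib_s(total_tib, nodes, seconds=1):
    """Average GiB/s per node for total_tib TiB moved in `seconds` seconds."""
    return total_tib * TIB / seconds / nodes / GIB


# 6.6 TiB/s aggregate read throughput across 180 nodes
print(f"{per_node_gib_s(6.6, 180):.1f} GiB/s per node")              # 37.5 GiB/s per node

# 3.66 TiB/min on GraySort across 25 nodes
print(f"{per_node_gib_s(3.66, 25, seconds=60):.1f} GiB/s per node")  # 2.5 GiB/s per node
```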

:inbox_tray: 3FS → GitHub - deepseek-ai/3FS: A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
:fountain: Smallpond - data processing framework on 3FS → GitHub - deepseek-ai/smallpond: A lightweight data processing framework built on DuckDB and 3FS.

2024 AI Infrastructure Paper (SC24)

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

:page_facing_up: Paper Link
:page_facing_up: Arxiv Paper Link
