CommunityNews

DeepSeek - starting this week, we'll open-source 5 repos

We’re a tiny team @deepseek-ai pushing our limits in AGI exploration.

Starting this week, Feb 24, 2025, we’ll open-source 5 repos – one daily drop – not because we’ve made grand claims, but simply as developers sharing our small-but-sincere progress with full transparency.

These are humble building blocks of our online service: documented, deployed and battle-tested in production. No vaporware, just sincere code that moved our tiny yet ambitious dream forward.

Why? Because every line shared becomes collective momentum that accelerates the journey. Daily unlocks begin soon. No ivory towers - just pure garage-energy and community-driven innovation :wrench:

Stay tuned – let’s geek out in the open together.

DeepSeek-Open-Infra

Hello, DeepSeek Open Infra!

202502 Open-Source Week

Day 1 - FlashMLA

Efficient MLA Decoding Kernel for Hopper GPUs
Optimized for variable-length sequences, battle-tested in production

:link: FlashMLA GitHub Repo
:white_check_mark: BF16 support
:white_check_mark: Paged KV cache (block size 64)
:zap: Performance: 3000 GB/s memory-bound | BF16 580 TFLOPS compute-bound on H800
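
To make the "Paged KV cache (block size 64)" item concrete, here is a minimal sketch of how a paged cache can map a token position to a physical block. Only the block size of 64 comes from the post; the function names and the block table below are hypothetical and are not FlashMLA's API.

```python
BLOCK_SIZE = 64  # page size quoted in the post


def blocks_needed(seq_len, block_size=BLOCK_SIZE):
    """Number of cache blocks needed to hold seq_len tokens (ceiling division)."""
    return (seq_len + block_size - 1) // block_size


def locate_token(block_table, token_idx, block_size=BLOCK_SIZE):
    """Translate a logical token index into (physical block id, offset in block)."""
    logical_block, offset = divmod(token_idx, block_size)
    return block_table[logical_block], offset


# Hypothetical block table for one 130-token sequence. The physical block ids
# are deliberately non-contiguous: pages can live anywhere in the pool.
block_table = [7, 2, 11]

print(blocks_needed(130))               # 3
print(locate_token(block_table, 0))     # (7, 0)
print(locate_token(block_table, 64))    # (2, 0)
print(locate_token(block_table, 129))   # (11, 1)
```

Because each sequence only holds whole blocks, variable-length sequences waste at most one partially filled block each, which is one common reason to page the cache.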

Day 2 - DeepEP

Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.

:link: DeepEP GitHub Repo
:white_check_mark: Efficient and optimized all-to-all communication
:white_check_mark: Both intranode and internode support with NVLink and RDMA
:white_check_mark: High-throughput kernels for training and inference prefilling
:white_check_mark: Low-latency kernels for inference decoding
:white_check_mark: Native FP8 dispatch support
:white_check_mark: Flexible GPU resource control for computation-communication overlapping
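
The all-to-all pattern behind expert parallelism can be sketched in plain Python. This is a toy single-process simulation, not DeepEP's API: the function names, the "expert e multiplies by e + 1" rule, and the routing table are all assumptions made for illustration.

```python
def dispatch(tokens_per_rank, routes_per_rank, num_ranks, experts_per_rank):
    """All-to-all dispatch: send each (token, expert) pair to the rank owning the expert.

    Returns inbox[dst] = list of (src_rank, token_idx, expert_id, value).
    """
    inbox = [[] for _ in range(num_ranks)]
    for src, (tokens, routes) in enumerate(zip(tokens_per_rank, routes_per_rank)):
        for idx, (value, experts) in enumerate(zip(tokens, routes)):
            for expert in experts:
                dst = expert // experts_per_rank  # experts are sharded contiguously
                inbox[dst].append((src, idx, expert, value))
    return inbox


def run_experts(inbox):
    """Toy expert computation (assumption): expert e multiplies its input by e + 1."""
    return [[(src, idx, value * (expert + 1)) for src, idx, expert, value in msgs]
            for msgs in inbox]


def combine(outbox, tokens_per_rank):
    """All-to-all combine: return expert outputs to the source rank and sum per token."""
    result = [[0.0] * len(tokens) for tokens in tokens_per_rank]
    for msgs in outbox:
        for src, idx, out in msgs:
            result[src][idx] += out
    return result


# Two ranks, four experts (two per rank), top-2 routing per token.
tokens_per_rank = [[1.0, 2.0], [3.0]]
routes_per_rank = [[[0, 3], [1, 2]], [[0, 1]]]

inbox = dispatch(tokens_per_rank, routes_per_rank, num_ranks=2, experts_per_rank=2)
combined = combine(run_experts(inbox), tokens_per_rank)
print([len(msgs) for msgs in inbox])  # [4, 2]
print(combined)                       # [[5.0, 10.0], [9.0]]
```

The uneven inbox sizes show why the communication step matters: the traffic each rank receives depends on the router's decisions, not on how many tokens it started with.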

Day 3 - DeepGEMM

Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

:link: DeepGEMM GitHub Repo
:zap: Up to 1350+ FP8 TFLOPS on Hopper GPUs
:white_check_mark: No heavy dependency, as clean as a tutorial
:white_check_mark: Fully Just-In-Time compiled
:white_check_mark: Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
:white_check_mark: Supports dense layout and two MoE layouts
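
As a rough illustration of what a grouped (MoE) GEMM computes, and of how a TFLOPS figure translates into time, here is a plain-Python reference. It assumes a layout where the rows for all experts are concatenated and each expert has its own weight matrix; the post does not describe the two MoE layouts, so treat this layout and every name below as hypothetical. Nothing here uses FP8 or DeepGEMM itself.

```python
def matmul(a, b):
    """Reference matrix multiply: (m x k) @ (k x n), using nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def grouped_gemm(a_rows, group_sizes, weights):
    """Multiply each expert's slice of rows by that expert's weight matrix.

    a_rows holds the rows of all experts back to back; group_sizes[i] rows
    belong to expert i and are multiplied by weights[i].
    """
    assert sum(group_sizes) == len(a_rows)
    out, start = [], 0
    for size, w in zip(group_sizes, weights):
        out.extend(matmul(a_rows[start:start + size], w))
        start += size
    return out


def gemm_flops(m, n, k):
    """A dense GEMM performs one multiply and one add per (m, n, k) triple."""
    return 2 * m * n * k


a_rows = [[1, 2], [3, 4], [5, 6]]
weights = [[[1, 0], [0, 1]],   # expert 0: identity
           [[2, 0], [0, 2]]]   # expert 1: scale by 2
print(grouped_gemm(a_rows, [2, 1], weights))  # [[1, 2], [3, 4], [10, 12]]

# Lower bound on runtime for a 4096^3 GEMM at the quoted 1350 TFLOPS peak.
micros = gemm_flops(4096, 4096, 4096) / 1350e12 * 1e6
print(round(micros, 1))  # 101.8
```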

Day 4 - Optimized Parallelism Strategies

:white_check_mark: DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
:link: GitHub Repo

:white_check_mark: EPLB - an expert-parallel load balancer for V3/R1.
:link: GitHub Repo

:bar_chart: Profiling data for analyzing computation-communication overlap in V3/R1.
:link: GitHub Repo
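
The problem an expert-parallel load balancer addresses can be illustrated with a generic greedy heuristic: place the heaviest experts first, always on the currently least-loaded GPU. This is a textbook sketch, not the EPLB algorithm, and the load numbers are made up.

```python
import heapq


def balance_experts(expert_loads, num_gpus):
    """Greedy placement: heaviest expert first, onto the least-loaded GPU."""
    heap = [(0, gpu) for gpu in range(num_gpus)]  # (current load, gpu id)
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}
    # Sort by descending load; break ties by expert id for determinism.
    for expert, load in sorted(expert_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        heapq.heappush(heap, (total + load, gpu))
    totals = {gpu: sum(expert_loads[e] for e in experts)
              for gpu, experts in placement.items()}
    return placement, totals


# Hypothetical per-expert token counts.
loads = {0: 90, 1: 40, 2: 35, 3: 30, 4: 20, 5: 10}
placement, totals = balance_experts(loads, num_gpus=2)
print(placement)  # {0: [0, 4], 1: [1, 2, 3, 5]}
print(totals)     # {0: 110, 1: 115}
```

A naive split by expert id (experts 0-2 on one GPU, 3-5 on the other) would give loads of 165 and 60, so even this simple heuristic narrows the gap considerably.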

Day 5 - 3FS, Thruster for All DeepSeek Data Access

Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.

:zap: 6.6 TiB/s aggregate read throughput in a 180-node cluster
:zap: 3.66 TiB/min throughput on GraySort benchmark in a 25-node cluster
:zap: 40+ GiB/s peak throughput per client node for KVCache lookup
:dna: Disaggregated architecture with strong consistency semantics
:white_check_mark: Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1
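
The aggregate figures above are easier to compare once converted to a per-node rate. The helper below is hypothetical and only divides the quoted totals evenly across the quoted node counts; the post does not say how load is actually distributed.

```python
def per_node_gib_s(aggregate_tib, nodes, seconds=1.0):
    """Average per-node throughput in GiB/s for aggregate_tib moved in `seconds`."""
    return aggregate_tib * 1024 / seconds / nodes


# 6.6 TiB/s aggregate read throughput across 180 nodes.
read_rate = per_node_gib_s(6.6, 180)
print(round(read_rate, 1))   # 37.5

# 3.66 TiB/min on GraySort across 25 nodes.
sort_rate = per_node_gib_s(3.66, 25, seconds=60)
print(round(sort_rate, 2))   # 2.5
```

The read figure works out to about 37.5 GiB/s per node, the same order of magnitude as the 40+ GiB/s per-client peak quoted for KVCache lookup.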

:inbox_tray: 3FS → deepseek-ai/3FS: a high-performance distributed file system designed to address the challenges of AI training and inference workloads.
:fountain: Smallpond → deepseek-ai/smallpond: a lightweight data processing framework built on DuckDB and 3FS.

2024 AI Infrastructure Paper (SC24)

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

:page_facing_up: Paper Link
:page_facing_up: Arxiv Paper Link
