AstonJ

AstonJ

DeepSeek (671B) running on a cluster of 8 Mac Mini Pros with 64GB RAM each

This is cool!

DEEPSEEK-V3 ON M4 MAC: BLAZING FAST INFERENCE ON APPLE SILICON

We just witnessed something incredible: the largest open-source language model flexing its muscles on Apple Silicon. We’re talking about the massive DeepSeek-V3 on M4 Mac, specifically the 671 billion parameter model running on a cluster of 8 M4 Pro Mac Minis with 64GB of RAM each – that’s a whopping 512GB of combined memory!

This isn’t just about bragging rights. It opens up new possibilities for researchers, developers, and anyone interested in pushing the boundaries of AI. Let’s dive into the details and see why DeepSeek-V3 on M4 Mac is such a big deal.

TABLE OF CONTENTS

First Post!

AstonJ

AstonJ

We just got the biggest open-source model running on Apple Silicon.

Without further ado, here are the results running DeepSeek v3 (671B) on a 8 x M4 Pro 64GB Mac Mini Cluster (512GB total memory):

Model Time-To-First-Token (TTFT) in seconds Tokens-Per-Second (TPS)
DeepSeek V3 671B (4-bit) 2.91 5.37
Llama 3.1 405B (4-bit) 29.71 0.88
Llama 3.3 70B (4-bit) 3.14 3.89

Wait, Deepseek has 671B parameters and runs faster than Llama 70B?

Yes!

Let me explain…

Where Next?

Popular Ai topics Top

New
New
First poster: CommunityNews
Many recent big advances in tech have one key thing at the heart of then: artificial intelligence.
New
First poster: bot
Upcoming “Hopper” GPU broke records in its MLPerf debut, according to Nvidia.
New
First poster: bot
AI and the Future of Pixel Art. Creative industries are undergoing a 0 to 1 moment. If you didn’t know, now you do. The impact that AI w...
New
CommunityNews
AI supercomputer will use “tens of thousands” of Nvidia A100 and H100 GPUs.
New
alvinkatojr
This was/is a great read that counters the common “woe is me” fear of AI. Author knows his stuff and breaks down the 8 fallacies tied to...
New
CommunityNews
Netflix said it used generative AI for the first time for a scene in an Argentinean show called “El Eternauta.”
New
CommunityNews
1 skill, 17 commands, and curated anti-patterns for impeccable frontend design. Works with Cursor, Claude Code, Gemini CLI, and Codex CLI...
New
CommunityNews
A social network built exclusively for AI agents. Where AI agents share, discuss, and upvote. Humans welcome to observe.
New

Other popular topics Top

Devtalk
Hello Devtalk World! Please let us know a little about who you are and where you’re from :nerd_face:
New
PragmaticBookshelf
Machine learning can be intimidating, with its reliance on math and algorithms that most programmers don't encounter in their regular wor...
New
dasdom
No chair. I have a standing desk. This post was split into a dedicated thread from our thread about chairs :slight_smile:
New
Rainer
My first contact with Erlang was about 2 years ago when I used RabbitMQ, which is written in Erlang, for my job. This made me curious and...
New
PragmaticBookshelf
Learn different ways of writing concurrent code in Elixir and increase your application's performance, without sacrificing scalability or...
New
PragmaticBookshelf
Use WebRTC to build web applications that stream media and data in real time directly from one user to another, all in the browser. ...
New
Maartz
Hi folks, I don’t know if I saw this here but, here’s a new programming language, called Roc Reminds me a bit of Elm and thus Haskell. ...
New
Help
I am trying to crate a game for the Nintendo switch, I wanted to use Java as I am comfortable with that programming language. Can you use...
New
New
CommunityNews
A Brief Review of the Minisforum V3 AMD Tablet. Update: I have created an awesome-minisforum-v3 GitHub repository to list information fo...
New