CommunityNews

We Bought the Whole GPU, So We're Damn Well Going to Use the Whole GPU

TLDR: We’re releasing a throughput-optimized megakernel for tensor-parallel inference with Llama-70B on H100s. Our kernel can aggressively overlap compute, memory, and communication ops in order to simultaneously use the different hardware resources available on a GPU. When integrated into the Tokasaurus inference engine, our megakernel can outperform SGLang by >22% on end-to-end throughput (measured as time to finish 65,536 prompts from the ShareGPT benchmark). We’re releasing the code here; please be warned that this really is research code; it is sensitive to compiler versions, GPU setup, and sometimes even being looked at the wrong way, and we have no intention whatsoever of supporting it. We hope you’ll find the ideas and results interesting nonetheless!
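To make the overlap idea concrete, here is a toy timing model. It is not taken from the released kernel. The op names, the three resource labels, and the millisecond figures are all made up for illustration, and the model ignores data dependencies between ops, so the overlapped figure is only an idealized lower bound.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str       # hypothetical op name, for illustration only
    resource: str   # "compute", "memory", or "comm"
    ms: float       # made-up duration in milliseconds


def sequential_time(ops):
    """Time if every op runs one after another, leaving other resources idle."""
    return sum(op.ms for op in ops)


def overlapped_time(ops):
    """Idealized bound: each resource runs its own ops back to back while the
    other resources work concurrently. Dependencies are ignored."""
    busy = {}
    for op in ops:
        busy[op.resource] = busy.get(op.resource, 0.0) + op.ms
    return max(busy.values())


# Hypothetical workload; none of these numbers come from the post.
ops = [
    Op("attention_matmul", "compute", 4.0),
    Op("kv_cache_load", "memory", 3.0),
    Op("all_reduce", "comm", 2.0),
    Op("mlp_matmul", "compute", 1.0),
]

seq = sequential_time(ops)
ovl = overlapped_time(ops)
print(f"sequential: {seq} ms, overlapped bound: {ovl} ms, speedup: {seq / ovl:.2f}x")
```

In this model the busiest resource sets the floor, which is why keeping compute, memory, and communication active at the same time matters for throughput. A real kernel lands somewhere between the two figures because ops depend on each other's outputs.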

#gpu

2025-10-02 15:19:07 UTC
