CommunityNews

We Bought the Whole GPU, So We're Damn Well Going to Use the Whole GPU

TLDR: We’re releasing a throughput-optimized megakernel for tensor-parallel inference with Llama-70B on H100s. Our kernel can aggressively overlap compute, memory, and communication ops in order to simultaneously use the different hardware resources available on a GPU. When integrated into the Tokasaurus inference engine, our megakernel can outperform SGLang by >22% on end-to-end throughput (measured as time to finish 65,536 prompts from the ShareGPT benchmark). We’re releasing the code here; please be warned that this really is research code; it is sensitive to compiler versions, GPU setup, and sometimes even being looked at the wrong way, and we have no intention whatsoever of supporting it. We hope you’ll find the ideas and results interesting nonetheless!
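To make the overlap idea concrete, here is a toy back-of-the-envelope model, not the authors' kernel. It assumes one step of work splits into a compute phase, a memory phase, and a communication phase, and compares running them back to back against an idealized case where all three run fully concurrently. The function names and the millisecond figures are hypothetical, chosen only for illustration; they are not measurements from the post.

```python
def serial_time(compute_ms, memory_ms, comm_ms):
    """Step time when the three phases run one after another."""
    return compute_ms + memory_ms + comm_ms


def overlapped_time(compute_ms, memory_ms, comm_ms):
    """Idealized step time when all three phases run fully concurrently.

    This is a lower bound: real kernels have dependencies and contention,
    so they land somewhere between this value and serial_time().
    """
    return max(compute_ms, memory_ms, comm_ms)


def ideal_speedup(compute_ms, memory_ms, comm_ms):
    """Best-case speedup from overlap under this toy model."""
    return serial_time(compute_ms, memory_ms, comm_ms) / overlapped_time(
        compute_ms, memory_ms, comm_ms
    )


if __name__ == "__main__":
    # Hypothetical phase durations in milliseconds (illustrative only).
    phases = (6.0, 3.0, 2.0)
    print("serial:", serial_time(*phases))
    print("overlapped (ideal):", overlapped_time(*phases))
    print("ideal speedup:", ideal_speedup(*phases))
```

Under this model the longest phase sets the floor, which is why overlap pays off most when no single resource dominates the step.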

Read in full here:
