CommunityNews

CommunityNews

We Bought the Whole GPU, So We're Damn Well Going to Use the Whole GPU

TLDR: We’re releasing a throughput-optimized megakernel for tensor-parallel inference with Llama-70B on H100s. Our kernel can aggressively overlap compute, memory, and communication ops in order to simultaneously use the different hardware resources available on a GPU. When integrated into the Tokasaurus inference engine, our megakernel can outperform SGLang by >22% on end-to-end throughput (measured as time to finish 65,536 prompts from the ShareGPT benchmark). We’re releasing the code here; please be warned that this really is research code; it is sensitive to compiler versions, GPU setup, and sometimes even being looked at the wrong way, and we have no intention whatsoever of supporting it. We hope you’ll find the ideas and results interesting nonetheless!

Read in full here:

Where Next?

Popular Ai topics Top

First poster: bot
An ancient language has defied decryption for 100 years. Can AI crack the code?. Machine learning can translate between two known langua...
New
First poster: OvermindDL1
Equity, the performing arts workers union, says actors need protection from computer-generated substitutes.
New
First poster: bot
DeepMind AI learns simple physics like a baby. Neural network could be a step towards programs for studying how human infants learn.
New
First poster: bot
When Hyundai acquired Boston Dynamics at the end of 2020, there were plenty of open questions. Chief among them was why we should assume ...
New
First poster: bot
Technique could allow high-quality calls and music on low-quality connections.
New
First poster: bot
BBC documentary used face-swapping AI to hide protesters’ identities. Filmmakers used an AI to swap the faces of anti-government protest...
New
New
First poster: happyrat1
With a leap in the evolution of large language models, some leading thinkers are questioning whether AI might become sentient
New
gfqdjb
With all the AI buzz around coding assistants, and being a bit concerned about being dependent on third-party cloud providers here, I dec...
New
CommunityNews
Contracted AI raters describe grueling deadlines, poor pay and opacity around work to make chatbots intelligent
New

Other popular topics Top

PragmaticBookshelf
Learn from the award-winning programming series that inspired the Elixir language, and go on a step-by-step journey through the most impo...
New
AstonJ
What chair do you have while working… and why? Is there a ‘best’ type of chair or working position for developers?
New
PragmaticBookshelf
Rust is an exciting new programming language combining the power of C with memory safety, fearless concurrency, and productivity boosters...
New
PragmaticBookshelf
Author Spotlight Jamis Buck @jamis This month, we have the pleasure of spotlighting author Jamis Buck, who has written Mazes for Prog...
New
PragmaticBookshelf
Author Spotlight Mike Riley @mriley This month, we turn the spotlight on Mike Riley, author of Portable Python Projects. Mike’s book ...
New
New
husaindevelop
Inside our android webview app, we are trying to paste the copied content from another app eg (notes) using navigator.clipboard.readtext ...
New
PragmaticBookshelf
Programming Ruby is the most complete book on Ruby, covering both the language itself and the standard library as well as commonly used t...
New
PragmaticBookshelf
Author Spotlight: Peter Ullrich @PJUllrich Data is at the core of every business, but it is useless if nobody can access and analyze ...
New
PragmaticBookshelf
Develop, deploy, and debug BEAM applications using BEAMOps: a new paradigm that focuses on scalability, fault tolerance, and owning each ...
New