CommunityNews

CommunityNews

We Bought the Whole GPU, So We're Damn Well Going to Use the Whole GPU

TLDR: We’re releasing a throughput-optimized megakernel for tensor-parallel inference with Llama-70B on H100s. Our kernel can aggressively overlap compute, memory, and communication ops in order to simultaneously use the different hardware resources available on a GPU. When integrated into the Tokasaurus inference engine, our megakernel can outperform SGLang by >22% on end-to-end throughput (measured as time to finish 65,536 prompts from the ShareGPT benchmark). We’re releasing the code here; please be warned that this really is research code; it is sensitive to compiler versions, GPU setup, and sometimes even being looked at the wrong way, and we have no intention whatsoever of supporting it. We hope you’ll find the ideas and results interesting nonetheless!

Read in full here:

Where Next?

Popular Ai topics Top

First poster: bot
In response to a national and international awakening on the issues of anti-Blackness and systemic discrimination, we have penned this pi...
New
CommunityNews
DeepMind’s New AI With a Memory Outperforms Algorithms 25 Times Its Size. DeepMind’s model, with just 7 billion parameters, outperformed...
New
CommunityNews
Artificial intelligence is now smart enough to write tracks that earn streaming service royalties.
New
First poster: CommunityNews
A new computer program fashioned after artificial intelligence systems like AlphaGo has solved several open problems in combinatorics and...
New
CommunityNews
We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understandin...
New
First poster: bot
AlphaTensor discovers better algorithms for matrix math, inspiring another improvement from afar.
New
First poster: bot
You can’t solve AI security problems with more AI. One of the most common proposed solutions to prompt injection attacks (where an AI la...
New
First poster: happyrat1
With a leap in the evolution of large language models, some leading thinkers are questioning whether AI might become sentient
New
CommunityNews
Cursor 1.0 brings BugBot for code review, a first look at memories, one-click MCP setup, Jupyter support and general availability of Back...
New
CommunityNews
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing rout...
New

Other popular topics Top

PragmaticBookshelf
Stop developing web apps with yesterday’s tools. Today, developers are increasingly adopting Clojure as a web-development platform. See f...
New
PragmaticBookshelf
Rust is an exciting new programming language combining the power of C with memory safety, fearless concurrency, and productivity boosters...
New
AstonJ
Thanks to @foxtrottwist’s and @Tomas’s posts in this thread: Poll: Which code editor do you use? I bought Onivim! :nerd_face: https://on...
New
PragmaticBookshelf
Tailwind CSS is an exciting new CSS framework that allows you to design your site by composing simple utility classes to create complex e...
New
Maartz
Hi folks, I don’t know if I saw this here but, here’s a new programming language, called Roc Reminds me a bit of Elm and thus Haskell. ...
New
AstonJ
Was just curious to see if any were around, found this one: I got 51/100: Not sure if it was meant to buy I am sure at times the b...
New
hilfordjames
There appears to have been an update that has changed the terminology for what has previously been known as the Taskbar Overflow - this h...
New
New
New
sir.laksmana_wenk
I’m able to do the “artistic” part of game-development; character designing/modeling, music, environment modeling, etc. However, I don’t...
New