CommunityNews

CommunityNews

We Bought the Whole GPU, So We're Damn Well Going to Use the Whole GPU

TLDR: We’re releasing a throughput-optimized megakernel for tensor-parallel inference with Llama-70B on H100s. Our kernel can aggressively overlap compute, memory, and communication ops in order to simultaneously use the different hardware resources available on a GPU. When integrated into the Tokasaurus inference engine, our megakernel can outperform SGLang by >22% on end-to-end throughput (measured as time to finish 65,536 prompts from the ShareGPT benchmark). We’re releasing the code here; please be warned that this really is research code; it is sensitive to compiler versions, GPU setup, and sometimes even being looked at the wrong way, and we have no intention whatsoever of supporting it. We hope you’ll find the ideas and results interesting nonetheless!

Read in full here:

Where Next?

Popular Ai topics Top

First poster: CommunityNews
Google has unveiled a tool that uses artificial intelligence to help spot skin, hair and nail conditions, based on images uploaded by pat...
New
First poster: CommunityNews
Making Things Think: How AI and Deep Learning Power the Products We Use — Holloway. AI now shapes our lives, yet few people know how mac...
New
First poster: bot
DeepMind AI learns simple physics like a baby. Neural network could be a step towards programs for studying how human infants learn.
New
First poster: CommunityNews
Chat-bots are amazing these days! About a month ago LaMDA made the news when it apparently convinced an engineer at Google that it was se...
New
New
First poster: bot
Not to be outdone by Meta, Google’s AI generator can output 1280x768 HD video at 24 fps.
New
First poster: DevotionGeo
Voice synthesis PR stunt calls upon the dead to help sell an AI product.
New
New
First poster: bot
Exascale Cerebras Andromeda cluster packs more cores than 1,954 Nvidia A100 GPUs.
New
AstonJ
This is cool! DEEPSEEK-V3 ON M4 MAC: BLAZING FAST INFERENCE ON APPLE SILICON We just witnessed something incredible: the largest open-s...
New

Other popular topics Top

Devtalk
Reading something? Working on something? Planning something? Changing jobs even!? If you’re up for sharing, please let us know what you’...
1051 21715 396
New
AstonJ
If it’s a mechanical keyboard, which switches do you have? Would you recommend it? Why? What will your next keyboard be? Pics always w...
New
PragmaticBookshelf
Learn from the award-winning programming series that inspired the Elixir language, and go on a step-by-step journey through the most impo...
New
PragmaticBookshelf
Write Elixir tests that you can be proud of. Dive into Elixir’s test philosophy and gain mastery over the terminology and concepts that u...
New
siddhant3030
I’m thinking of buying a monitor that I can rotate to use as a vertical monitor? Also, I want to know if someone is using it for program...
New
AstonJ
I have seen the keycaps I want - they are due for a group-buy this week but won’t be delivered until October next year!!! :rofl: The Ser...
New
mafinar
This is going to be a long an frequently posted thread. While talking to a friend of mine who has taken data structure and algorithm cou...
New
PragmaticBookshelf
Rails 7 completely redefines what it means to produce fantastic user experiences and provides a way to achieve all the benefits of single...
New
AstonJ
This is a very quick guide, you just need to: Download LM Studio: https://lmstudio.ai/ Click on search Type DeepSeek, then select the o...
New
Fl4m3Ph03n1x
Background Lately I am in a quest to find a good quality TTS ai generation tool to run locally in order to create audio for some videos I...
New