CommunityNews

Speculative Speculative Decoding

Autoregressive decoding is bottlenecked by its sequential nature. Speculative decoding has become a standard way to accelerate inference by using a fast draft model to predict upcoming tokens from a slower target model, and then verifying them in parallel with a single target model forward pass. However, speculative decoding itself relies on a sequential dependence between speculation and verification. We introduce speculative speculative decoding (SSD) to parallelize these operations. While a verification is ongoing, the draft model predicts likely verification outcomes and prepares speculations pre-emptively for them. If the actual verification outcome is then in the predicted set, a speculation can be returned immediately, eliminating drafting overhead entirely. We identify three key challenges presented by speculative speculative decoding, and suggest principled methods to solve each. The result is Saguaro, an optimized SSD algorithm. Our implementation is up to 2x faster than optimized speculative decoding baselines and up to 5x faster than autoregressive decoding with open source inference engines.
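The control flow described above can be illustrated with a toy sketch. This is not the Saguaro algorithm or the authors' implementation: the two "models" are deterministic stand-in functions, decoding is greedy, and every name here (`target_next`, `draft_ranked`, `pre_speculate`, `ssd_generate`, the `width` parameter) is hypothetical. The sketch only shows the idea of preparing a speculation for each guessed verification outcome and reusing it on a cache hit. In a real system the pre-speculation and the verification would run concurrently; here they run one after the other.

```python
from typing import Dict, List, Tuple

VOCAB = 10  # toy vocabulary size (assumption)
Outcome = Tuple[int, int]  # (accepted draft tokens, target's next token)


def target_next(ctx: List[int]) -> int:
    """Toy stand-in for the slow target model's greedy next token."""
    return (3 * ctx[-1] + len(ctx)) % VOCAB


def draft_ranked(ctx: List[int], width: int) -> List[int]:
    """Toy stand-in for the fast draft model: its `width` top-ranked guesses.

    The top guess is deliberately wrong at every fifth position.
    """
    top = target_next(ctx)
    if len(ctx) % 5 == 0:
        top = (top + 1) % VOCAB
    return [(top - i) % VOCAB for i in range(width)]


def draft(ctx: List[int], k: int) -> List[int]:
    """Greedily draft k tokens with the draft model."""
    out: List[int] = []
    for _ in range(k):
        out.append(draft_ranked(ctx + out, 1)[0])
    return out


def verify(ctx: List[int], spec: List[int]) -> Outcome:
    """Check a speculation against the target model.

    A real system does this in one batched forward pass; the loop here
    only reproduces the result of that pass.
    """
    n = 0
    while n < len(spec) and target_next(ctx + spec[:n]) == spec[n]:
        n += 1
    return n, target_next(ctx + spec[:n])


def pre_speculate(ctx: List[int], spec: List[int], k: int,
                  width: int) -> Dict[Outcome, List[int]]:
    """Pre-draft the next speculation for each guessed verification outcome."""
    cache: Dict[Outcome, List[int]] = {}
    for n in range(len(spec) + 1):
        prefix = ctx + spec[:n]
        for tok in draft_ranked(prefix, width):
            if n < len(spec) and tok == spec[n]:
                continue  # a rejection at position n never yields spec[n]
            cache[(n, tok)] = draft(prefix + [tok], k)
    return cache


def ssd_generate(prompt: List[int], num_tokens: int, k: int = 3,
                 width: int = 2) -> Tuple[List[int], int, int]:
    """Return (generated tokens, cache hits, verification rounds)."""
    ctx = list(prompt)
    spec = draft(ctx, k)
    hits = rounds = 0
    while len(ctx) - len(prompt) < num_tokens:
        # These two calls would overlap in time in a real system.
        cache = pre_speculate(ctx, spec, k, width)
        n, tok = verify(ctx, spec)
        ctx = ctx + spec[:n] + [tok]
        rounds += 1
        if (n, tok) in cache:
            spec = cache[(n, tok)]  # hit: next speculation is already drafted
            hits += 1
        else:
            spec = draft(ctx, k)  # miss: fall back to ordinary drafting
    start = len(prompt)
    return ctx[start:start + num_tokens], hits, rounds


def autoregressive(prompt: List[int], num_tokens: int) -> List[int]:
    """Reference: plain greedy decoding with the target model only."""
    ctx = list(prompt)
    for _ in range(num_tokens):
        ctx.append(target_next(ctx))
    return ctx[len(prompt):]


if __name__ == "__main__":
    tokens, hits, rounds = ssd_generate([1], 20)
    print(tokens == autoregressive([1], 20), hits, rounds)
```

Because every accepted token is checked against the target, the output matches plain greedy decoding regardless of how often the cache hits; the hit rate only affects how often drafting can be skipped. Widening the guessed outcome set (`width` here) trades more pre-drafting work for a higher chance of a hit.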
