CommunityNews

CommunityNews

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference

Transformer-based large models excel in natural language processing and computer vision, but face severe computational inefficiencies due to the self-attention’s quadratic complexity with input tokens. Recently, researchers have proposed a series of methods based on block selection and compression to alleviate this problem, but they either have issues with semantic incompleteness or poor training-inference efficiency. To comprehensively address these challenges, we propose ChunkLLM, a lightweight and pluggable training framework. Specifically, we introduce two components: QK Adapter (Q-Adapter and K-Adapter) and Chunk Adapter. The former is attached to each Transformer layer, serving dual purposes of feature compression and chunk attention acquisition. The latter operates at the bottommost layer of the model, functioning to detect chunk boundaries by leveraging contextual semantic information. During the training phase, the parameters of the backbone remain frozen, with only the QK Adapter and Chunk Adapter undergoing training. Notably, we design an attention distillation method for training the QK Adapter, which enhances the recall rate of key chunks. During the inference phase, chunk selection is triggered exclusively when the current token is detected as a chunk boundary, thereby accelerating model inference. Experimental evaluations are conducted on a diverse set of long-text and short-text benchmark datasets spanning multiple tasks. ChunkLLM not only attains comparable performance on short-text benchmarks but also maintains 98.64% of the performance on long-context benchmarks while preserving a 48.58% key-value cache retention rate. Particularly, ChunkLLM attains a maximum speedup of 4.48x in comparison to the vanilla Transformer in the processing of 120K long texts.

Read in full here:

Most Liked

chris.johan

chris.johan

Any examples on how to use this?

Where Next?

Popular Ai topics Top

First poster: bot
Within the decade, Google aims to build a useful, error-corrected quantum computer. This will accelerate solutions for some of the world’...
New
First poster: CommunityNews
Should we be concerned that the decisions of AIs are inscrutable? | Psyche Ideas. Machine learning is a black box – even when the decisi...
New
First poster: CommunityNews
BROKEN PROMISES & EMPTY THREATS: THE EVOLUTION OF AI IN THE USA, 1956-1996 Artificial Intelligence (AI) is once again a promising tec...
New
New
First poster: bot
AlphaTensor discovers better algorithms for matrix math, inspiring another improvement from afar.
New
First poster: Eiji
AI web crawling bots are the cockroaches of the internet, many developers believe. FOSS devs are fighting back in ingenuous, humorous wa...
New
alvinkatojr
This was/is a great read that counters the common “woe is me” fear of AI. Author knows his stuff and breaks down the 8 fallacies tied to...
New
First poster: gflashner
Google’s openly available Gemma collection of AI models has reached a milestone: over 150 million downloads. Omar Sanseviero, a developer...
New
First poster: conradwt
Why I decided to ditch Cursor and switch to running Claude Code in an isolated environment + diy guide!
New
First poster: gfqdjb
First enthusiasm A couple of weeks ago I subscribed to Claude Code, and during the first few weeks I had a really nice experience. It was...
New

Other popular topics Top

DevotionGeo
I know that -t flag is used along with -i flag for getting an interactive shell. But I cannot digest what the man page for docker run com...
New
AstonJ
Curious to know which languages and frameworks you’re all thinking about learning next :upside_down_face: Perhaps if there’s enough peop...
New
PragmaticBookshelf
From finance to artificial intelligence, genetic algorithms are a powerful tool with a wide array of applications. But you don't need an ...
New
PragmaticBookshelf
Build highly interactive applications without ever leaving Elixir, the way the experts do. Let LiveView take care of performance, scalabi...
New
AstonJ
We’ve talked about his book briefly here but it is quickly becoming obsolete - so he’s decided to create a series of 7 podcasts, the firs...
New
husaindevelop
Inside our android webview app, we are trying to paste the copied content from another app eg (notes) using navigator.clipboard.readtext ...
New
DevotionGeo
I have always used antique keyboards like Cherry MX 1800 or Cherry MX 8100 and almost always have modified the switches in some way, like...
New
First poster: bot
zig/http.zig at 7cf2cbb33ef34c1d211135f56d30fe23b6cacd42 · ziglang/zig. General-purpose programming language and toolchain for maintaini...
New
First poster: AstonJ
Jan | Rethink the Computer. Jan turns your computer into an AI machine by running LLMs locally on your computer. It’s a privacy-focus, l...
New
AstonJ
This is a very quick guide, you just need to: Download LM Studio: https://lmstudio.ai/ Click on search Type DeepSeek, then select the o...
New