CommunityNews

CommunityNews

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference

Transformer-based large models excel in natural language processing and computer vision, but face severe computational inefficiencies due to the self-attention’s quadratic complexity with input tokens. Recently, researchers have proposed a series of methods based on block selection and compression to alleviate this problem, but they either have issues with semantic incompleteness or poor training-inference efficiency. To comprehensively address these challenges, we propose ChunkLLM, a lightweight and pluggable training framework. Specifically, we introduce two components: QK Adapter (Q-Adapter and K-Adapter) and Chunk Adapter. The former is attached to each Transformer layer, serving dual purposes of feature compression and chunk attention acquisition. The latter operates at the bottommost layer of the model, functioning to detect chunk boundaries by leveraging contextual semantic information. During the training phase, the parameters of the backbone remain frozen, with only the QK Adapter and Chunk Adapter undergoing training. Notably, we design an attention distillation method for training the QK Adapter, which enhances the recall rate of key chunks. During the inference phase, chunk selection is triggered exclusively when the current token is detected as a chunk boundary, thereby accelerating model inference. Experimental evaluations are conducted on a diverse set of long-text and short-text benchmark datasets spanning multiple tasks. ChunkLLM not only attains comparable performance on short-text benchmarks but also maintains 98.64% of the performance on long-context benchmarks while preserving a 48.58% key-value cache retention rate. Particularly, ChunkLLM attains a maximum speedup of 4.48x in comparison to the vanilla Transformer in the processing of 120K long texts.

Read in full here:

Most Liked

chris.johan

chris.johan

Any examples on how to use this?

Where Next?

Popular Ai topics Top

New
First poster: CommunityNews
Steve Blank Artificial Intelligence and Machine Learning– Explained. Artificial Intelligence is a once-in-a lifetime commercial and defe...
New
First poster: bot
Autonomous Drones Challenge Human Champions in First “Fair” Race. Watching robots operate with speed and precision is always impressive,...
New
First poster: DevotionGeo
Voice synthesis PR stunt calls upon the dead to help sell an AI product.
New
First poster: bot
Exascale Cerebras Andromeda cluster packs more cores than 1,954 Nvidia A100 GPUs.
New
First poster: CommunityNews
OpenJourney is a Text-to-Image AI model which has the goal of bringing an open source equivalent to Midjourney to the people. It is curre...
New
First poster: joni
My experience trying to write original, full-length human-sounding articles using Claude AI. You can use AI tools like Claude to help yo...
New
First poster: alvinkatojr
Klarna CEO says the company stopped hiring a year ago because AI ‘can already do all of the jobs’. Klarna CEO Sebastian Siemiatkowski sa...
New
First poster: alvinkatojr
Giving AI systems the ability to focus on particular brain regions can make them much better at reconstructing images of what a monkey is...
New
CommunityNews
Cursor 1.0 brings BugBot for code review, a first look at memories, one-click MCP setup, Jupyter support and general availability of Back...
New

Other popular topics Top

PragmaticBookshelf
Brace yourself for a fun challenge: build a photorealistic 3D renderer from scratch! In just a couple of weeks, build a ray tracer that r...
New
New
Exadra37
Oh just spent so much time on this to discover now that RancherOS is in end of life but Rancher is refusing to mark the Github repo as su...
New
AstonJ
Continuing the discussion from Thinking about learning Crystal, let’s discuss - I was wondering which languages don’t GC - maybe we can c...
New
rustkas
Intensively researching Erlang books and additional resources on it, I have found that the topic of using Regular Expressions is either c...
New
PragmaticBookshelf
Rails 7 completely redefines what it means to produce fantastic user experiences and provides a way to achieve all the benefits of single...
New
AstonJ
If you want a quick and easy way to block any website on your Mac using Little Snitch simply… File > New Rule: And select Deny, O...
New
PragmaticBookshelf
Programming Ruby is the most complete book on Ruby, covering both the language itself and the standard library as well as commonly used t...
New
DevotionGeo
I have always used antique keyboards like Cherry MX 1800 or Cherry MX 8100 and almost always have modified the switches in some way, like...
New
NewsBot
Node.js v22.14.0 has been released. Link: Release 2025-02-11, Version 22.14.0 'Jod' (LTS), @aduh95 · nodejs/node · GitHub
New