CommunityNews

CommunityNews

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference

Transformer-based large models excel in natural language processing and computer vision, but face severe computational inefficiencies due to the self-attention’s quadratic complexity with input tokens. Recently, researchers have proposed a series of methods based on block selection and compression to alleviate this problem, but they either have issues with semantic incompleteness or poor training-inference efficiency. To comprehensively address these challenges, we propose ChunkLLM, a lightweight and pluggable training framework. Specifically, we introduce two components: QK Adapter (Q-Adapter and K-Adapter) and Chunk Adapter. The former is attached to each Transformer layer, serving dual purposes of feature compression and chunk attention acquisition. The latter operates at the bottommost layer of the model, functioning to detect chunk boundaries by leveraging contextual semantic information. During the training phase, the parameters of the backbone remain frozen, with only the QK Adapter and Chunk Adapter undergoing training. Notably, we design an attention distillation method for training the QK Adapter, which enhances the recall rate of key chunks. During the inference phase, chunk selection is triggered exclusively when the current token is detected as a chunk boundary, thereby accelerating model inference. Experimental evaluations are conducted on a diverse set of long-text and short-text benchmark datasets spanning multiple tasks. ChunkLLM not only attains comparable performance on short-text benchmarks but also maintains 98.64% of the performance on long-context benchmarks while preserving a 48.58% key-value cache retention rate. Particularly, ChunkLLM attains a maximum speedup of 4.48x in comparison to the vanilla Transformer in the processing of 120K long texts.

Read in full here:

Most Liked

chris.johan

chris.johan

Any examples on how to use this?

Where Next?

Popular Ai topics Top

New
First poster: CommunityNews
AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries an elevated signifi- cance i...
New
First poster: jacobtriton
Why AI is Harder Than We Think. Since its beginning in the 1950s, the field of artificial intelligence has cycled several times between...
New
New
First poster: bot
Building games and apps entirely through natural language using OpenAI’s code-davinci model. TL;DR: OpenAI has a new code generating mod...
New
First poster: alvinkatojr
Klarna CEO says the company stopped hiring a year ago because AI ‘can already do all of the jobs’. Klarna CEO Sebastian Siemiatkowski sa...
New
alvinkatojr
This was/is a great read that counters the common “woe is me” fear of AI. Author knows his stuff and breaks down the 8 fallacies tied to...
New
First poster: ozornin
They are among 400 artists appealing to Sir Keir Starmer, saying creative industries are threatened.
New
CommunityNews
Comparison and ranking the performance of over 100 AI models (LLMs) across key metrics including intelligence, price, performance and spe...
New
CommunityNews
Contracted AI raters describe grueling deadlines, poor pay and opacity around work to make chatbots intelligent
New

Other popular topics Top

ohm
Which, if any, games do you play? On what platform? I just bought (and completed) Minecraft Dungeons for my Nintendo Switch. Other than ...
New
AstonJ
poll poll Be sure to check out @Dusty’s article posted here: An Introduction to Alternative Keyboard Layouts It’s one of the best write-...
New
AstonJ
This looks like a stunning keycap set :orange_heart: A LEGENDARY KEYBOARD LIVES ON When you bought an Apple Macintosh computer in the e...
New
dimitarvp
Small essay with thoughts on macOS vs. Linux: I know @Exadra37 is just waiting around the corner to scream at me “I TOLD YOU SO!!!” but I...
New
PragmaticBookshelf
Create efficient, elegant software tests in pytest, Python's most powerful testing framework. Brian Okken @brianokken Edited by Kat...
New
foxtrottwist
A few weeks ago I started using Warp a terminal written in rust. Though in it’s current state of development there are a few caveats (tab...
New
New
husaindevelop
Inside our android webview app, we are trying to paste the copied content from another app eg (notes) using navigator.clipboard.readtext ...
New
AnfaengerAlex
Hello, I’m a beginner in Android development and I’m facing an issue with my project setup. In my build.gradle.kts file, I have the foll...
New
Fl4m3Ph03n1x
Background Lately I am in a quest to find a good quality TTS ai generation tool to run locally in order to create audio for some videos I...
New