CommunityNews

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference

Transformer-based large models excel in natural language processing and computer vision, but they are computationally inefficient because the cost of self-attention grows quadratically with the number of input tokens. Researchers have recently proposed a series of methods based on block selection and compression to alleviate this problem, but these methods suffer from either semantic incompleteness or poor training and inference efficiency. To address these challenges, we propose ChunkLLM, a lightweight and pluggable training framework. Specifically, we introduce two components: the QK Adapter (Q-Adapter and K-Adapter) and the Chunk Adapter. The QK Adapter is attached to each Transformer layer and serves two purposes: feature compression and chunk attention acquisition. The Chunk Adapter operates at the bottommost layer of the model and detects chunk boundaries from contextual semantic information. During training, the backbone parameters remain frozen and only the QK Adapter and Chunk Adapter are trained. Notably, we design an attention distillation method for training the QK Adapter, which improves the recall rate of key chunks. During inference, chunk selection is triggered only when the current token is detected as a chunk boundary, which accelerates model inference. Experimental evaluations are conducted on a diverse set of long-text and short-text benchmark datasets spanning multiple tasks. ChunkLLM attains comparable performance on short-text benchmarks and maintains 98.64% of the performance on long-context benchmarks while retaining 48.58% of the key-value cache. In particular, ChunkLLM attains a maximum speedup of 4.48x over the vanilla Transformer when processing 120K long texts.
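The abstract does not include code, so here is a minimal, hedged sketch of the inference idea it describes: chunks are re-selected only when the current token is flagged as a chunk boundary, and the previous selection is reused otherwise. Everything in it is an assumption for illustration and not ChunkLLM's actual API: the function names (`chunk_scores`, `select_top_k`, `decode`), the `top_k` parameter, the use of mean pooling to represent a chunk, and the precomputed `is_boundary` flag standing in for the Chunk Adapter's output.

```python
import math


def chunk_scores(q_c, chunk_keys):
    """Scaled dot product of one compressed query against each chunk key."""
    scale = 1.0 / math.sqrt(len(q_c))
    return [scale * sum(a * b for a, b in zip(q_c, k)) for k in chunk_keys]


def select_top_k(scores, k):
    """Indices of the k highest-scoring chunks, returned in positional order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def decode(steps, top_k):
    """Simulate decoding where chunk selection runs only at chunk boundaries.

    steps: list of (compressed_query, compressed_key, is_boundary) per token.
    Returns (selection in effect after each step, number of selections run).
    """
    chunk_keys = []  # one representative key per completed chunk
    current = []     # compressed keys of the tokens in the still-open chunk
    selected = []    # chunk indices currently kept for attention
    history = []
    n_selections = 0
    for q_c, k_c, is_boundary in steps:
        current.append(k_c)
        if is_boundary:
            # Assumption: a chunk is represented by the mean of its token keys.
            dim = len(k_c)
            mean = [sum(k[d] for k in current) / len(current) for d in range(dim)]
            chunk_keys.append(mean)
            current = []
            # Selection is the only step that scans all chunks; it runs here only.
            selected = select_top_k(chunk_scores(q_c, chunk_keys), top_k)
            n_selections += 1
        history.append(list(selected))
    return history, n_selections


if __name__ == "__main__":
    toy_steps = [
        ([1.0, 0.0], [1.0, 0.0], False),
        ([1.0, 0.0], [1.0, 0.0], True),
        ([0.0, 1.0], [0.0, 1.0], False),
        ([0.0, 1.0], [0.0, 1.0], True),
        ([1.0, 0.0], [0.5, 0.5], True),
    ]
    history, n = decode(toy_steps, top_k=1)
    print(history, n)
```

In this toy run, selection executes three times for five tokens; in a real model the saving would depend on how often the boundary detector fires and on how many chunks are kept per selection.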

1 comment

#llms #framework

2025-10-27 11:11:16 UTC

Most Liked

chris.johan

Any examples on how to use this?
