CommunityNews

Zebra-Llama: Towards Extremely Efficient Hybrid Models

With the growing demand for deploying large language models (LLMs) across diverse applications, improving their inference efficiency is crucial for sustainable and democratized access. However, retraining LLMs to meet new user-specific requirements is prohibitively expensive and environmentally unsustainable. In this work, we propose a practical and scalable alternative: composing efficient hybrid language models from existing pre-trained models. Our approach, Zebra-Llama, introduces a family of 1B, 3B, and 8B hybrid models by combining State Space Models (SSMs) and Multi-head Latent Attention (MLA) layers, using a refined initialization and post-training pipeline to efficiently transfer knowledge from pre-trained Transformers. Zebra-Llama achieves Transformer-level accuracy with near-SSM efficiency using only 7-11B training tokens (compared to the trillions of tokens required for pre-training) and an 8B teacher. Moreover, Zebra-Llama dramatically reduces KV cache size, down to 3.9%, 2%, and 2.73% of the original for the 1B, 3B, and 8B variants, respectively, while preserving 100%, 100%, and >97% of average zero-shot performance on LM Harness tasks. Compared to models like MambaInLlama, X-EcoMLA, Minitron, and Llamba, Zebra-Llama consistently delivers competitive or superior accuracy while using significantly fewer tokens, smaller teachers, and vastly reduced KV cache memory. Notably, Zebra-Llama-8B surpasses Minitron-8B in few-shot accuracy by 7% while using 8x fewer training tokens, an over 12x smaller KV cache, and a smaller teacher (8B vs. 15B). It also achieves 2.6x-3.8x higher throughput (tokens/s) than MambaInLlama up to a 32k context length. We will release code and model checkpoints upon acceptance.
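To give a feel for why a hybrid of SSM and MLA layers shrinks the KV cache so much, here is a rough back-of-the-envelope sketch. It assumes that SSM layers keep a fixed-size state and so add nothing to the per-token cache, and that each MLA layer caches one compressed latent vector plus a small decoupled RoPE key per token. The function names and every number below (layer counts, head sizes, latent rank) are hypothetical and are not taken from the paper; the resulting ratio is only an illustration of how a figure of a few percent can arise.

```python
def mha_cache_elems_per_token(num_layers: int, num_kv_heads: int, head_dim: int) -> int:
    """Cached elements per token for a standard attention stack.

    Each layer stores one key and one value vector per KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim


def hybrid_cache_elems_per_token(num_mla_layers: int, kv_latent_rank: int, rope_dim: int) -> int:
    """Cached elements per token for a hybrid SSM + MLA stack.

    Assumption: SSM layers contribute nothing that grows with sequence
    length, and each MLA layer caches a compressed KV latent plus a
    decoupled RoPE key.
    """
    return num_mla_layers * (kv_latent_rank + rope_dim)


# Hypothetical baseline: 16 attention layers, 8 KV heads, head dimension 64.
baseline = mha_cache_elems_per_token(num_layers=16, num_kv_heads=8, head_dim=64)

# Hypothetical hybrid: 4 MLA layers (the rest SSM), latent rank 128, RoPE dim 32.
hybrid = hybrid_cache_elems_per_token(num_mla_layers=4, kv_latent_rank=128, rope_dim=32)

ratio = hybrid / baseline
print(f"baseline: {baseline} elems/token, hybrid: {hybrid} elems/token, ratio: {ratio:.2%}")
```

Two levers drive the reduction in this sketch: fewer layers that cache anything per token, and a much smaller cached vector in each of the layers that do.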

#llama

2025-12-07 16:34:38 UTC
