CommunityNews

CommunityNews

Zebra-Llama: Towards Extremely Efficient Hybrid Models

With the growing demand for deploying large language models (LLMs) across diverse applications, improving their inference efficiency is crucial for sustainable and democratized access. However, retraining LLMs to meet new user-specific requirements is prohibitively expensive and environmentally unsustainable. In this work, we propose a practical and scalable alternative: composing efficient hybrid language models from existing pre-trained models. Our approach, Zebra-Llama, introduces a family of 1B, 3B, and 8B hybrid models by combining State Space Models (SSMs) and Multi-head Latent Attention (MLA) layers, using a refined initialization and post-training pipeline to efficiently transfer knowledge from pre-trained Transformers. Zebra-Llama achieves Transformer-level accuracy with near-SSM efficiency using only 7-11B training tokens (compared to trillions of tokens required for pre-training) and an 8B teacher. Moreover, Zebra-Llama dramatically reduces KV cache size -down to 3.9%, 2%, and 2.73% of the original for the 1B, 3B, and 8B variants, respectively-while preserving 100%, 100%, and >97% of average zero-shot performance on LM Harness tasks. Compared to models like MambaInLLaMA, X-EcoMLA, Minitron, and Llamba, Zebra-Llama consistently delivers competitive or superior accuracy while using significantly fewer tokens, smaller teachers, and vastly reduced KV cache memory. Notably, Zebra-Llama-8B surpasses Minitron-8B in few-shot accuracy by 7% while using 8x fewer training tokens, over 12x smaller KV cache, and a smaller teacher (8B vs. 15B). It also achieves 2.6x-3.8x higher throughput (tokens/s) than MambaInLlama up to a 32k context length. We will release code and model checkpoints upon acceptance.

Read in full here:

Where Next?

Popular Ai topics Top

New
First poster: bot
A research group has taught AI to magnetically wrangle a high-powered stream of plasma used for fusion research — but wait! Put away your...
New
CommunityNews
We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understandin...
New
First poster: bot
Upcoming “Hopper” GPU broke records in its MLPerf debut, according to Nvidia.
New
First poster: bot
Not to be outdone by Meta, Google’s AI generator can output 1280x768 HD video at 24 fps.
New
First poster: bot
Ghostwriter generates, completes, or transforms code in 16 languages, similar to GitHub Copilot.
New
alvinkatojr
This was/is a great read that counters the common “woe is me” fear of AI. Author knows his stuff and breaks down the 8 fallacies tied to...
New
First poster: alvinkatojr
Giving AI systems the ability to focus on particular brain regions can make them much better at reconstructing images of what a monkey is...
New
First poster: mercyf
It replicates your development environment and takes up to 30 minutes per task.
New
First poster: jkdiaz
TechCrunch spoke to experienced coders about their time using AI-generated code about what they see as the future of vibe coding.
New

Other popular topics Top

PragmaticBookshelf
Learn from the award-winning programming series that inspired the Elixir language, and go on a step-by-step journey through the most impo...
New
DevotionGeo
I know that -t flag is used along with -i flag for getting an interactive shell. But I cannot digest what the man page for docker run com...
New
New
AstonJ
poll poll Be sure to check out @Dusty’s article posted here: An Introduction to Alternative Keyboard Layouts It’s one of the best write-...
New
New
foxtrottwist
A few weeks ago I started using Warp a terminal written in rust. Though in it’s current state of development there are a few caveats (tab...
New
mafinar
This is going to be a long an frequently posted thread. While talking to a friend of mine who has taken data structure and algorithm cou...
New
AstonJ
We’ve talked about his book briefly here but it is quickly becoming obsolete - so he’s decided to create a series of 7 podcasts, the firs...
New
PragmaticBookshelf
Author Spotlight Jamis Buck @jamis This month, we have the pleasure of spotlighting author Jamis Buck, who has written Mazes for Prog...
New
PragmaticBookshelf
Author Spotlight Rebecca Skinner @RebeccaSkinner Welcome to our latest author spotlight, where we sit down with Rebecca Skinner, auth...
New