CommunityNews

CommunityNews

LLM Paper on Mamba MoE: Jamba Technical Report from AI2

Jamba: A Hybrid Transformer-Mamba Language Model.
We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.

Read in full here:

This thread was posted by one of our members via one of our news source trackers.

Where Next?

Popular General Dev topics Top

AstonJ
Apple co-founder Steve Wozniak is suing YouTube for allegedly allowing scammers to use images and videos of him to defraud people. The s...
New
New
First poster: bot
FUZIX FUZIX is a fusion of various elements from the assorted UZI forks and branches beaten together into some kind of semi-coherent pla...
New
First poster: dimitarvp
skiftOS is a simple, handmade operating system for the x86 platform, aiming for clean and pretty APIs while keeping the spirit of UNIX. s...
New
First poster: AstonJ
https://permission.site/ This thread was posted by one of our members via one of our news source trackers.
New
First poster: dpritchett
It’s not what programming languages do, it’s what they shepherd you to. How many of you have listened, read or taken part in a discussio...
New
First poster: cpgo
8 reasons to ditch Chrome and switch to Firefox. Chrome may dominate, but Firefox is a known name among browsers for a reason. Whether y...
New
First poster: dani
The pool of talented C++ developers is running dry. Highly sought after, rarely provided.
New
First poster: KnowledgeIsPower
Building a Slack/Discord alternative with Tauri/Rust linen <span class="hashtag-icon-placeholder"></span>blog. Introduction My name is K...
New
First poster: AstonJ
Jan | Rethink the Computer. Jan turns your computer into an AI machine by running LLMs locally on your computer. It’s a privacy-focus, l...
New

Other popular topics Top

AstonJ
You might be thinking we should just ask who’s not using VSCode :joy: however there are some new additions in the space that might give V...
New
AstonJ
There’s a whole world of custom keycaps out there that I didn’t know existed! Check out all of our Keycaps threads here: https://forum....
New
mafinar
Crystal recently reached version 1. I had been following it for awhile but never got to really learn it. Most languages I picked up out o...
New
wmnnd
Here’s the story how one of the world’s first production deployments of LiveView came to be - and how trying to improve it almost caused ...
New
mafinar
This is going to be a long an frequently posted thread. While talking to a friend of mine who has taken data structure and algorithm cou...
New
Help
I am trying to crate a game for the Nintendo switch, I wanted to use Java as I am comfortable with that programming language. Can you use...
New
First poster: bot
Large Language Models like ChatGPT say The Darnedest Things. The Errors They MakeWhy We Need to Document Them, and What We Have Decided ...
New
sir.laksmana_wenk
I’m able to do the “artistic” part of game-development; character designing/modeling, music, environment modeling, etc. However, I don’t...
New
AnfaengerAlex
Hello, I’m a beginner in Android development and I’m facing an issue with my project setup. In my build.gradle.kts file, I have the foll...
New
AstonJ
Curious what kind of results others are getting, I think actually prefer the 7B model to the 32B model, not only is it faster but the qua...
New