CommunityNews

LLM Paper on Mamba MoE: Jamba Technical Report from AI21 Labs

Jamba: A Hybrid Transformer-Mamba Language Model.
We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.
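To make the interleaving idea in the abstract concrete, here is a minimal sketch of how a hybrid layer layout could be described. It is not the authors' implementation: the names `LayerSpec` and `build_layout`, and the parameters `attn_every` and `moe_every`, are hypothetical, and the values used (8 layers, attention every 4th layer, MoE every 2nd layer) are illustrative assumptions rather than figures taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSpec:
    """Describes one layer of a hypothetical hybrid stack."""
    mixer: str  # "attention" (Transformer-style) or "mamba"
    mlp: str    # "moe" (mixture-of-experts) or "dense"


def build_layout(n_layers: int, attn_every: int, moe_every: int) -> list[LayerSpec]:
    """Interleave attention and Mamba layers, with MoE in some layers.

    attn_every and moe_every are illustrative knobs (assumptions):
    the last layer in every group of `attn_every` layers uses attention,
    and the last layer in every group of `moe_every` layers uses MoE.
    """
    layout = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_every == attn_every - 1 else "mamba"
        mlp = "moe" if i % moe_every == moe_every - 1 else "dense"
        layout.append(LayerSpec(mixer, mlp))
    return layout


# Example with made-up settings: mostly Mamba layers, a few attention
# layers, and MoE replacing the dense MLP in every second layer.
layout = build_layout(n_layers=8, attn_every=4, moe_every=2)
for i, spec in enumerate(layout):
    print(i, spec.mixer, spec.mlp)
```

Changing `attn_every` and `moe_every` trades attention layers against Mamba layers and dense MLPs against MoE layers, which is the kind of "resource- and objective-specific configuration" the abstract refers to.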

Read in full here:

This thread was posted by one of our members via one of our news source trackers.

#paper #llm

2024-04-02 02:18:51 UTC
