CommunityNews

CommunityNews

MM1: Methods, Analysis and Insights from Multimodal LLM Pre-training

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training.
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder together with image resolution and the image token count has substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models up to 30B parameters, consisting of both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning, and multi-image reasoning, enabling few-shot chain-of-thought prompting.

Read in full here:

This thread was posted by one of our members via one of our news source trackers.

Where Next?

Popular General Dev topics Top

First poster: bot
A field guide to help you recognize achievement, spot A field guide to help you recognize achievement, spot bottlenecks, and debug your d...
New
First poster: iPaul
TOKYO (Kyodo) – Japan’s government plans to encourage firms to let their employees choose to work four days a week instead of five, aimin...
New
First poster: bot
Neovim nightly, v0.5.0 and v0.4.4 has been released. Link: Release Nvim development (prerelease) build · neovim/neovim · GitHub Link:...
New
First poster: dpritchett
It’s not what programming languages do, it’s what they shepherd you to. How many of you have listened, read or taken part in a discussio...
New
First poster: mindriot
LG 28-inch 16:18 DualUp Monitor with Ergo Stand and USB Type-C™ (28MQ780-B) | LG USA. Shop LG 28MQ780-B on the official LG.com website ...
New
First poster: dani
The pool of talented C++ developers is running dry. Highly sought after, rarely provided.
New
First poster: AstonJ
Jan | Rethink the Computer. Jan turns your computer into an AI machine by running LLMs locally on your computer. It’s a privacy-focus, l...
New
First poster: joeb
The new frameworks will continue until morale improves.
/js
New
First poster: alvinkatojr
There are countless articles why developers should not focus on Frameworks too much and instead learn to understand the underlying langua...
New
New

Other popular topics Top

ohm
Which, if any, games do you play? On what platform? I just bought (and completed) Minecraft Dungeons for my Nintendo Switch. Other than ...
New
axelson
I’ve been really enjoying obsidian.md: It is very snappy (even though it is based on Electron). I love that it is all local by defaul...
New
siddhant3030
I’m thinking of buying a monitor that I can rotate to use as a vertical monitor? Also, I want to know if someone is using it for program...
New
New
AstonJ
I have seen the keycaps I want - they are due for a group-buy this week but won’t be delivered until October next year!!! :rofl: The Ser...
New
mafinar
This is going to be a long an frequently posted thread. While talking to a friend of mine who has taken data structure and algorithm cou...
New
PragmaticBookshelf
Rails 7 completely redefines what it means to produce fantastic user experiences and provides a way to achieve all the benefits of single...
New
PragmaticBookshelf
Author Spotlight: Karl Stolley @karlstolley Logic! Rhetoric! Prag! Wow, what a combination. In this spotlight, we sit down with Karl ...
New
hilfordjames
There appears to have been an update that has changed the terminology for what has previously been known as the Taskbar Overflow - this h...
New
PragmaticBookshelf
Author Spotlight: Tammy Coron @Paradox927 Gaming, and writing games in particular, is about passion, vision, experience, and immersio...
New