CommunityNews

CommunityNews

MM1: Methods, Analysis and Insights from Multimodal LLM Pre-training

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training.
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder together with image resolution and the image token count has substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models up to 30B parameters, consisting of both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning, and multi-image reasoning, enabling few-shot chain-of-thought prompting.

Read in full here:

This thread was posted by one of our members via one of our news source trackers.

Where Next?

Popular General Dev topics Top

First poster: iPaul
TOKYO (Kyodo) – Japan’s government plans to encourage firms to let their employees choose to work four days a week instead of five, aimin...
New
First poster: bot
Neovim nightly, v0.5.0 and v0.4.4 has been released. Link: Release Nvim development (prerelease) build · neovim/neovim · GitHub Link:...
New
First poster: mafinar
F# Is The Best Coding Language Today. If you want to personally pick up a programming language in order to become a better coder in what...
New
First poster: bot
SPWN is a programming language that compiles to Geometry Dash levels. What that means is that you can create levels by using not only the...
New
First poster: bot
Kinesis Advantage360 Ergonomic Keyboard. Split-adjustable, contoured design that maximizes comfort and boosts productivity. Mechanical s...
New
First poster: bot
Flipper Zero is a portable multi-tool for pentesters and geeks in a toy-like body. It loves hacking digital stuff, such as radio protocol...
New
First poster: bot
The overengineered Solution to my Pigeon Problem. TL;DR: I built a wifi-equipped water gun to shoot the pigeons on my balcony, controlle...
New
First poster: Korbin73
Whatever happened to Elm, anyway?. I see this question pop up quite frequently in lots of different arenas - folks are curious as to wha...
New
First poster: bot
Declarative GNOME configuration with NixOS. I adore tinkering with my machine, trying new tools, extensions, themes, and ideas. When I w...
New
CommunityNews
9 fintech engineering mistakes. Read this list unless you want to build a money dissappearing system
New

Other popular topics Top

wolf4earth
@AstonJ prompted me to open this topic after I mentioned in the lockdown thread how I started to do a lot more for my fitness. https://f...
New
AstonJ
Thanks to @foxtrottwist’s and @Tomas’s posts in this thread: Poll: Which code editor do you use? I bought Onivim! :nerd_face: https://on...
New
AstonJ
Do the test and post your score :nerd_face: :keyboard: If possible, please add info such as the keyboard you’re using, the layout (Qw...
New
Exadra37
Oh just spent so much time on this to discover now that RancherOS is in end of life but Rancher is refusing to mark the Github repo as su...
New
Maartz
Hi folks, I don’t know if I saw this here but, here’s a new programming language, called Roc Reminds me a bit of Elm and thus Haskell. ...
New
mafinar
This is going to be a long an frequently posted thread. While talking to a friend of mine who has taken data structure and algorithm cou...
New
AstonJ
We’ve talked about his book briefly here but it is quickly becoming obsolete - so he’s decided to create a series of 7 podcasts, the firs...
New
DevotionGeo
I have always used antique keyboards like Cherry MX 1800 or Cherry MX 8100 and almost always have modified the switches in some way, like...
New
Margaret
Ask Me Anything with Mark Volkmann @mvolkmann On February 24 and 25, we are giving you a chance to ask questions of PragProg author M...
New
RobertRichards
Hair Salon Games for Girls Fun Girls Hair Saloon game is mainly developed for kids. This game allows users to select virtual avatars to ...
New