Benjamin Muller (@ben_mlr)'s Twitter Profile
Benjamin Muller

@ben_mlr

Research in AI. Focusing on scaling language models multi-modally & multilingually. Llama pretraining team @AIatMeta

ID: 722490639359709184

Link: http://benjamin-mlr.github.io · Joined: 19-04-2016 18:22:46

188 Tweets

924 Followers

1.1K Following

Armen Aghajanyan (@armenagha)'s Twitter Profile Photo

A restricted, safety-aligned (no-image-out) version of Chameleon (7B/34B) is now open-weight! github.com/facebookresear… The team strongly believes in open source. We had to do a lot of work to get this out to the public safely. Congrats to the Chameleon team!

Benjamin Muller (@ben_mlr)'s Twitter Profile Photo

It was great to present the Spirit-LM model with tuanh208. Spirit-LM is a foundation model that jointly learns text and expressive speech, based on Llama 2. Thanks to TwelveLabs (twelvelabs.io) for organizing the webinar. The arXiv paper is available here for more details: arxiv.org/abs/2402.05755
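
A minimal sketch of the word-level text/speech interleaving that a model like Spirit-LM can be trained on, assuming word-aligned discrete speech units (e.g. HuBERT-style ids); the modality markers, unit format, and mixing probability below are illustrative assumptions, not the released implementation:

```python
# Hedged sketch: build one mixed-modality token stream in which each aligned word is
# emitted either as text or as discrete speech units, so a single decoder-only LM sees
# both modalities in the same sequence. All names and markers here are assumptions.
import random

def interleave(words: list[str], speech_units: list[list[int]],
               p_speech: float = 0.5, seed: int = 0) -> list[str]:
    """Return one interleaved token stream from word-aligned text and speech units."""
    rng = random.Random(seed)
    stream = []
    for word, units in zip(words, speech_units):
        if rng.random() < p_speech:
            stream.append("[SPEECH]")
            stream.extend(f"<unit_{u}>" for u in units)   # toy stand-ins for speech-unit tokens
        else:
            stream.append("[TEXT]")
            stream.append(word)
    return stream

words = ["the", "cat", "sat"]
speech_units = [[12, 40], [7, 7, 91], [3]]                # toy unit ids aligned to each word
print(interleave(words, speech_units))
```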

Soumith Chintala (@soumithchintala)'s Twitter Profile Photo

I'm giving the opening Keynote at ICML 2024 on Tuesday the 23rd @ 9:30am CEST. I'll try to empower folks to get Open Science back on track -- the free discussion of ideas is such an important aspect of AI progress, and we've been losing track. This is a complex topic, and I won't

Laurens van der Maaten (@lvdmaaten)'s Twitter Profile Photo

So… we trained a model and we wrote a paper about it. Have fun y’all! llama.meta.com/llama-download… ai.meta.com/research/publi…

AI at Meta (@aiatmeta)'s Twitter Profile Photo

Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet. Today we’re releasing a collection of new Llama 3.1 models including our long-awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context

AI at Meta (@aiatmeta)'s Twitter Profile Photo

LLM Evaluations are an important area of work — today we're announcing a new LLM Evaluation Research Grant to foster further innovation in this area. Recipients will get $200K in funding to support this work. We're accepting proposals until September 6 ➡️ go.fb.me/eym3xq

Chunting Zhou (@violet_zct)'s Twitter Profile Photo

Introducing *Transfusion* - a unified approach for training models that can generate both text and images. arxiv.org/pdf/2408.11039 Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This
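
A minimal sketch of the two-loss idea the tweet describes, assuming a toy transformer, a toy linear noising schedule, and an arbitrary loss weight; the module names, shapes, and the `TinyTransfusion` class are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch: one transformer trained with next-token prediction on text positions and
# a denoising (diffusion-style) loss on image-patch positions, over a single mixed sequence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTransfusion(nn.Module):
    def __init__(self, vocab=1000, d=64, img_dim=16):
        super().__init__()
        self.text_emb = nn.Embedding(vocab, d)
        self.img_in = nn.Linear(img_dim, d)            # continuous image patches enter via a linear layer
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.text_head = nn.Linear(d, vocab)           # next-token prediction head
        self.img_head = nn.Linear(d, img_dim)          # noise-prediction head

    def forward(self, text_ids, noisy_patches):
        # One mixed-modality sequence: text embeddings followed by noised patch embeddings.
        # (Proper causal/bidirectional attention masking is omitted for brevity.)
        x = torch.cat([self.text_emb(text_ids), self.img_in(noisy_patches)], dim=1)
        h = self.backbone(x)
        n_text = text_ids.shape[1]
        return self.text_head(h[:, :n_text]), self.img_head(h[:, n_text:])

model = TinyTransfusion()
text_ids = torch.randint(0, 1000, (2, 8))              # toy text tokens
patches = torch.randn(2, 4, 16)                        # toy image-patch latents
noise = torch.randn_like(patches)
t = torch.rand(2, 1, 1)                                # random noise level per example
noisy = (1 - t) * patches + t * noise                  # toy linear noising schedule

logits, noise_pred = model(text_ids[:, :-1], noisy)
lm_loss = F.cross_entropy(logits.reshape(-1, 1000), text_ids[:, 1:].reshape(-1))
diffusion_loss = F.mse_loss(noise_pred, noise)         # predict the injected noise
loss = lm_loss + 5.0 * diffusion_loss                  # weighted sum; the weight here is arbitrary
loss.backward()
```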

Andrew Brown (@andrew__brown__)'s Twitter Profile Photo

OK, here goes the "excited to share ..." post. Want to know how to train a T2V model (with other amazing capabilities) that beats ALL prior work? Well, we released a 90-page tech report with every detail 😊 ai.meta.com/research/movie…… Thanks to the amazing team!

Benjamin Muller (@ben_mlr)'s Twitter Profile Photo

Recent LLMs (e.g. Llama 3 🦙) are increasingly good at math. However, this progress is reserved for languages with large amounts of task-specific instruct-tuning data. In this work at AI at Meta (led by Lucas Bandarkar), we introduce a new model merging technique called **Layer
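
The tweet is cut off before the technique's full name, but the idea it describes (composing a single model from the layers of two fine-tunes of the same base) can be sketched as below; the checkpoint naming scheme, layer indices, and the `merge_by_layer` helper are assumptions for illustration, not the paper's code:

```python
# Hedged sketch of layer-wise model merging: given a "math expert" and a "language expert"
# fine-tuned from the same base model, rebuild one model by choosing, layer by layer,
# which expert supplies the weights.
import torch

def merge_by_layer(math_sd: dict, lang_sd: dict, lang_layers: set[int]) -> dict:
    """Take the layers listed in `lang_layers` from the language expert, the rest from the math expert."""
    merged = {}
    for name in math_sd:
        from_lang = any(f"layers.{i}." in name for i in lang_layers)  # parameter naming is an assumption
        merged[name] = (lang_sd if from_lang else math_sd)[name].clone()
    return merged

# Toy state dicts standing in for the two fine-tuned checkpoints.
n_layers = 4
math_sd = {f"layers.{i}.weight": torch.zeros(2, 2) for i in range(n_layers)}
lang_sd = {f"layers.{i}.weight": torch.ones(2, 2) for i in range(n_layers)}

# e.g. take the outermost layers from the language expert and the middle from the math expert.
merged = merge_by_layer(math_sd, lang_sd, lang_layers={0, n_layers - 1})
print({k: int(v[0, 0].item()) for k, v in merged.items()})
```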

Xiang Yue@ICLR2025🇸🇬 (@xiangyue96)'s Twitter Profile Photo

🌍 I’ve always had a dream of making AI accessible to everyone, regardless of location or language. However, current open MLLMs often respond in English, even to non-English queries! 🚀 Introducing Pangea: A Fully Open Multilingual Multimodal LLM supporting 39 languages! 🌐✨

Benjamin Muller (@ben_mlr)'s Twitter Profile Photo

Groundbreaking scaling trends for Byte-level Language Modeling with the new BLT architecture 🚀 More insights in the thread 🧵

AI at Meta (@aiatmeta)'s Twitter Profile Photo

New from Meta FAIR — Byte Latent Transformer: Patches Scale Better Than Tokens introduces BLT, which, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency & robustness. Paper ➡️ go.fb.me/w23lmz

Gargi Ghosh (@gargighosh)'s Twitter Profile Photo

We released new research - Byte Latent Transformer (BLT). BLT encodes bytes into dynamic patches using lightweight local models and processes them with a large latent transformer. Think of it as a transformer sandwich!
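
A minimal sketch of the dynamic-patching step, assuming a unigram byte model as a stand-in for BLT's lightweight local model and an arbitrary surprisal threshold; the `dynamic_patches` helper below is illustrative, not the released code:

```python
# Hedged sketch of entropy-based dynamic patching: a new patch starts wherever the next
# byte is "surprising" under a small byte model (here, a toy unigram model), so predictable
# spans become long patches and surprising spans become short ones.
import math
from collections import Counter

def byte_surprisal(data: bytes) -> list[float]:
    """Per-byte negative log2-probability under a unigram model fit on the data itself."""
    counts = Counter(data)
    total = len(data)
    return [-math.log2(counts[b] / total) for b in data]

def dynamic_patches(data: bytes, threshold: float = 4.0) -> list[bytes]:
    """Group bytes into variable-length patches, opening a new patch at high-surprisal bytes."""
    patches, current = [], bytearray()
    for b, s in zip(data, byte_surprisal(data)):
        if current and s > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches

# Each patch would then be pooled into one latent vector by the local encoder, processed by
# the large latent transformer, and decoded back to bytes by the local decoder.
print(dynamic_patches(b"aaaaaaaaaa BLT! aaaaaaaaaa"))
```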

Jason Weston (@jaseweston)'s Twitter Profile Photo

🚨 Diverse Preference Optimization (DivPO) 🚨 SOTA LLMs have model collapse🫠: they can't generate diverse creative writing or synthetic data 🎨 DivPO trains for both high reward & diversity, vastly improving variety with similar quality. Paper 📝: arxiv.org/abs/2501.18101 🧵below
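
A minimal sketch of DivPO-style pair selection, assuming a reward score and a diversity score are already computed for each sampled response; the quantile cutoff and the `select_divpo_pair` helper are illustrative stand-ins, not the paper's exact criterion:

```python
# Hedged sketch: from a pool of responses sampled for one prompt, pick the most diverse
# response that clears a reward bar as "chosen" and the least diverse low-reward response
# as "rejected"; the resulting pair would then be used for standard preference optimization.
from dataclasses import dataclass

@dataclass
class Response:
    text: str
    reward: float      # from a reward model (assumed given)
    diversity: float   # e.g. novelty relative to the other samples (assumed given)

def select_divpo_pair(samples: list[Response], reward_quantile: float = 0.5):
    """Return a (chosen, rejected) pair from one prompt's pool of sampled responses."""
    cutoff = sorted(r.reward for r in samples)[int(reward_quantile * (len(samples) - 1))]
    high = [r for r in samples if r.reward >= cutoff]
    low = [r for r in samples if r.reward < cutoff] or samples
    chosen = max(high, key=lambda r: r.diversity)    # high reward, maximally diverse
    rejected = min(low, key=lambda r: r.diversity)   # low reward, minimally diverse
    return chosen, rejected

pool = [
    Response("Once upon a time...", reward=0.90, diversity=0.10),
    Response("In a neon-lit bazaar on Ganymede...", reward=0.85, diversity=0.80),
    Response("Story story story.", reward=0.20, diversity=0.05),
]
chosen, rejected = select_divpo_pair(pool)
print("chosen:", chosen.text, "| rejected:", rejected.text)
```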

AI at Meta (@aiatmeta)'s Twitter Profile Photo

Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model

Percy Liang (@percyliang)'s Twitter Profile Photo

We ran Llama 4 Maverick through some HELM benchmarks. It is 1st on HELM capabilities (MMLU-Pro, GPQA, IFEval, WildBench, Omni-MATH), but… crfm.stanford.edu/helm/capabilit…
