Artidoro Pagnoni (@artidoropagnoni)'s Twitter Profile
Artidoro Pagnoni

@artidoropagnoni

PhD student in NLP at UW with Luke Zettlemoyer

ID: 3583993995

Joined: 08-09-2015 04:35:23

238 Tweets

892 Followers

451 Following

Philipp Schmid (@_philschmid)'s Twitter Profile Photo

Easily fine-tune AI at Meta's Llama 3 70B! 🦙 I am excited to share a new guide on how to fine-tune Llama 3 70B with PyTorch FSDP, Q-LoRA, and Flash Attention 2 (SDPA) using Hugging Face, built for consumer-size GPUs (4x 24GB). 🚀

Blog: philschmid.de/fsdp-qlora-lla…

The blog covers: 👨‍💻
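
For readers who want the shape of the recipe, here is a minimal sketch of a QLoRA-style 4-bit setup with Hugging Face transformers + peft + bitsandbytes; the checkpoint name and every hyperparameter below are illustrative assumptions, not values from the guide (FSDP itself is configured at launch time, e.g. via `accelerate launch`):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-70B"  # assumed checkpoint name

# 4-bit NormalFloat quantization: the Q-LoRA base configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="sdpa",  # scaled-dot-product attention backend
)

# Train small LoRA adapters on top of the frozen 4-bit base model
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```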

Tomasz Limisiewicz (@tomlimi)'s Twitter Profile Photo

📢 New pre-print alert! 📢

A curse of over-segmentation haunts multilingual language models. While prior approaches have tried to resolve this by balancing data across languages, the problem lies much deeper: in the byte encodings themselves. 🔍🔑

arxiv.org/pdf/2403.10691 (1/6)
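
To see the kind of imbalance the thread points at, here is a quick illustration (mine, not the paper's): UTF-8 spends one byte per character on English but several on other scripts, so byte-level segmentation fragments those languages much more:

```python
# Bytes per character for short text in different scripts
# (illustrative example, not from the paper).
for text in ["hello", "Здравей", "γεια σου", "こんにちは", "नमस्ते"]:
    n_bytes = len(text.encode("utf-8"))
    print(f"{text!r}: {len(text)} chars -> {n_bytes} bytes "
          f"({n_bytes / len(text):.1f} bytes/char)")
```
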
AI at Meta (@aiatmeta)'s Twitter Profile Photo

Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models.

This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence.

Paper ➡️ go.fb.me/7rb19n

Armen Aghajanyan (@armenagha)'s Twitter Profile Photo

I'm excited to announce our latest paper, introducing a family of early-fusion, token-in/token-out (gpt4o…) models capable of interleaved text and image understanding and generation. arxiv.org/abs/2405.09818
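
For readers unfamiliar with early fusion, here is a toy sketch of the token-in/token-out idea; the sentinel tokens and tokenizer stubs below are stand-ins I made up, not Chameleon's actual interface:

```python
# Toy early-fusion stream: image content is discretized into codebook
# indices (stubbed) and spliced into the same sequence as text tokens,
# so one autoregressive transformer reads and writes both modalities.
BOI, EOI = "<image>", "</image>"  # hypothetical sentinel tokens

def image_to_tokens(image):
    # Stand-in for a VQ image tokenizer; real systems emit many codes.
    return ["img_17", "img_403", "img_88"]

def text_to_tokens(text):
    # Stand-in for a BPE text tokenizer.
    return text.split()

sequence = (
    text_to_tokens("A photo of a llama:")
    + [BOI] + image_to_tokens(None) + [EOI]
    + text_to_tokens("Now describe it.")
)
print(sequence)  # one flat stream; generation may emit text or image tokens
```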

Lili Yu (@liliyu_lili)'s Twitter Profile Photo

Interleaved text-image generation with consistency is a unique feature brought by our early-fusion, end-to-end trained model.

Alex Hägele (@haeggee)'s Twitter Profile Photo

Why exactly do we train LLMs with the cosine schedule, still? 🤔 Maybe we do not actually have to -- and that would come with a lot of benefits :) 🧵 Our paper on LR schedules, compute-optimality and more affordable scaling laws
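
One alternative of the kind the thread discusses is a constant learning rate with warmup and a short cooldown; a minimal sketch (the shape and all hyperparameters here are my assumptions, not values from the paper):

```python
def warmup_stable_decay(step, total_steps, peak_lr=3e-4,
                        warmup_steps=1000, cooldown_frac=0.2):
    """Linear warmup, long constant plateau, linear cooldown to zero."""
    cooldown_start = int(total_steps * (1 - cooldown_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    if step < cooldown_start:
        return peak_lr
    return peak_lr * (total_steps - step) / (total_steps - cooldown_start)
```

Part of the appeal is that, unlike cosine, the plateau does not bake the total step count into most of training, so a single long run can serve several compute budgets.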

Weijia Shi (@weijiashi2)'s Twitter Profile Photo

Augmenting GPT-4o with Visual Sketchpad ✏️

We introduce the Sketchpad agent, a framework that equips multimodal LLMs with a visual canvas and drawing tools 🎨, improving GPT-4o's performance in vision and math tasks 📈

🔗: visualsketchpad.github.io
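
Mechanically, the canvas-in-the-loop idea reads like the sketch below; `call_multimodal_llm` and `render` are hypothetical stubs of my own, not the Sketchpad API:

```python
def call_multimodal_llm(images, prompt):
    # Stub: a real agent would query a multimodal LLM here.
    return {"final_answer": "42", "action": None, "args": None}

def render(canvas, action, args):
    # Stub: a real agent would execute the drawing command on the canvas.
    return canvas

def solve_with_sketchpad(task_image, question, max_steps=5):
    canvas = task_image  # the model draws on top of the input image
    for _ in range(max_steps):
        reply = call_multimodal_llm(images=[canvas], prompt=question)
        if reply["final_answer"] is not None:
            return reply["final_answer"]
        # otherwise the model proposed a drawing action; apply it and loop
        canvas = render(canvas, reply["action"], reply["args"])
    return None

print(solve_with_sketchpad(task_image=None, question="What is 6*7?"))
```
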
Margaret Li (@margs_li)'s Twitter Profile Photo

RLHF-aligned LMs excel at long-form generation, but how?
We show how current models rely on anchor spans ⚓: strings that occur across many samples for the same prompt, forming an implicit outline, viz below.
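
A crude way to surface such spans, illustrating the definition in the tweet (not the paper's actual method): count n-grams that recur across many samples for the same prompt:

```python
from collections import Counter

def ngrams(tokens, n=4):
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def anchor_spans(samples, n=4, min_frac=0.5):
    # Call an n-gram an "anchor" if it appears in at least `min_frac`
    # of the samples generated for one prompt (threshold is my choice).
    counts = Counter()
    for sample in samples:
        counts.update(ngrams(sample.split(), n))
    return [g for g, c in counts.items() if c >= min_frac * len(samples)]
```
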
Artidoro Pagnoni (@artidoropagnoni)'s Twitter Profile Photo

Improvements from RLHF are achieved at the expense of world modeling capabilities. Could this be a fundamental trade-off between world models and agent models?

Philippe Laban (@philippelaban)'s Twitter Profile Photo

Check out our latest work (co-led with Alex Fabbri) on Summary of a Haystack (SummHay), a challenging task that shows long-context summarization with precise citation is far from solved... Got a long-context LLM or RAG system you want to test? Code: github.com/salesforce/sum…

Pengfei Liu (@stefan_fee)'s Twitter Profile Photo

The Alpaca moment of Large Multimodal Models! Can we build native LMMs just like Llama for simple multimodal generation?

Introducing Anole: the first open-source, autoregressive native LMM for multimodal generation. Building on Chameleon by AI at Meta: github.com/GAIR-NLP/anole

Rulin Shao (@rulinshao)'s Twitter Profile Photo

🔥 We release the first open-source 1.4T-token RAG datastore and present a scaling study for RAG on perplexity and downstream tasks! We show LM+RAG scales better than LM alone, with better performance for the same training compute (pretraining+indexing). retrievalscaling.github.io 🧵

Han Guo (@hanguo97)'s Twitter Profile Photo

Introducing FLUTE, a CUDA kernel for non-uniformly quantized (via a lookup table) LLM inference. It accelerates QLoRA's NormalFloat (NF) out of the box and more.

As an application, we extended NF4 and are releasing quantized models for LLaMA-3 (8B/70B) and Gemma-2 (9B/27B).
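
The core trick, shown in plain PyTorch rather than a fused CUDA kernel (the 16 levels below are illustrative, not the exact NormalFloat4 codebook):

```python
import torch

# Non-uniform 16-level codebook: each 4-bit code indexes into this table.
lut = torch.tensor([-1.00, -0.70, -0.53, -0.39, -0.28, -0.18, -0.09, 0.00,
                     0.08, 0.16, 0.25, 0.34, 0.44, 0.56, 0.72, 1.00])

codes = torch.randint(0, 16, (4096,))  # 4-bit weight codes for one block
absmax = 0.12                          # per-block scale saved at quantization
weights = lut[codes] * absmax          # dequantize: table lookup + rescale
```

The point of a fused kernel is to perform this lookup inside the matrix multiply instead of materializing `weights` in full precision first.
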
Mike Lewis (@ml_perception)'s Twitter Profile Photo

So excited for the open release of Llama 3.1 405B - with MMLU > 87, it's a really strong model and I can't wait to see what you all build with it! llama.meta.com

Also check out the paper here, with lots of details on how this was made: tinyurl.com/2z2cpj8m

Mark Saroufim (@marksaroufim)'s Twitter Profile Photo

We've kicked off the NeurIPS Meta Hacker Cup AI track! I mostly use LLMs to code and they've been great at removing the drudgery of boilerplate-y code, but for harder problems they are meh. Both the big-boi llamas and GPT-4 fall flat on their face!

Victoria X Lin (@victorialinml)'s Twitter Profile Photo

1/n Introducing MoMa 🖼, our new sparse early-fusion architecture for mixed-modal language modeling that significantly boosts pre-training efficiency 🚀 (arxiv.org/pdf/2407.21770).

MoMa employs a mixture-of-experts (MoE) framework with modality-specific expert groups. Given any
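
A toy rendering of modality-specific expert groups (dimensions, routing, and names are my stand-ins, not MoMa's implementation):

```python
import torch
import torch.nn as nn

class ModalityGroupedMoE(nn.Module):
    """Toy MoE layer: text tokens only see text experts, image tokens only
    see image experts; top-1 routing within each group (illustrative)."""
    def __init__(self, d_model=64, experts_per_group=2):
        super().__init__()
        def make_group():
            return nn.ModuleList(nn.Linear(d_model, d_model)
                                 for _ in range(experts_per_group))
        self.groups = nn.ModuleDict({"text": make_group(),
                                     "image": make_group()})
        self.router = nn.Linear(d_model, experts_per_group)

    def forward(self, x, is_image):
        # x: (seq_len, d_model); is_image: (seq_len,) bool modality tags
        out = torch.zeros_like(x)
        for name, mask in (("text", ~is_image), ("image", is_image)):
            if mask.any():
                toks = x[mask]
                expert_ids = self.router(toks).argmax(dim=-1)  # top-1 routing
                group = self.groups[name]
                out[mask] = torch.stack([group[i](t) for i, t
                                         in zip(expert_ids.tolist(), toks)])
        return out

moe = ModalityGroupedMoE()
tokens = torch.randn(10, 64)
is_image = torch.rand(10) > 0.5
print(moe(tokens, is_image).shape)  # torch.Size([10, 64])
```
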
Terra Blevins (@terrablvns)'s Twitter Profile Photo

I'm very excited to join Northeastern U. Khoury College of Computer Sciences as an assistant professor starting Fall '25!! Looking forward to working with the amazing people there! Until then I'll be a postdoc at NLP @ Uni Vienna with Ben Roth, so reach out if you want to meet up while I'm over in Europe ✨

Chunting Zhou (@violet_zct)'s Twitter Profile Photo

Introducing *Transfusion* - a unified approach for training models that can generate both text and images. arxiv.org/pdf/2408.11039

Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This
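
In loss terms, the combination looks roughly like the sketch below (my reading of the tweet; tensor shapes and the weighting are assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def transfusion_style_loss(text_logits, text_targets, text_mask,
                           noise_pred, noise_true, image_mask, lam=1.0):
    """Next-token cross-entropy on text positions plus a diffusion-style
    noise-prediction MSE on image positions, from one transformer's outputs."""
    # text_logits: (seq, vocab); text_targets: (seq,); masks: (seq,) bool
    lm_loss = F.cross_entropy(text_logits[text_mask], text_targets[text_mask])
    diffusion_loss = F.mse_loss(noise_pred[image_mask], noise_true[image_mask])
    return lm_loss + lam * diffusion_loss
```
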
Lili Yu (@liliyu_lili)'s Twitter Profile Photo

🚀 Excited to share our latest work: Transfusion! A new multi-modal generative training recipe combining language modeling and image diffusion in a single transformer! Huge shout-out to Chunting Zhou, Omer Levy, Michi Yasunaga, Arun Babu, Kushal Tirumala, and other collaborators.