Artidoro Pagnoni (@artidoropagnoni)'s Twitter Profile
Artidoro Pagnoni

@artidoropagnoni

PhD student in NLP at UW with Luke Zettlemoyer

ID: 3583993995

Joined: 08-09-2015 04:35:23

238 Tweets

892 Followers

451 Following

Philipp Schmid (@_philschmid)'s Twitter Profile Photo

Easily fine-tune AI at Meta's Llama 3 70B! 🦙 I am excited to share a new guide on how to fine-tune Llama 3 70B with PyTorch FSDP, Q-LoRA, and Flash Attention 2 (SDPA) using Hugging Face, built for consumer-size GPUs (4x 24GB). 🚀

Blog: philschmid.de/fsdp-qlora-lla…

The blog covers: 👨‍💻
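
For readers who want the shape of the recipe, here is a minimal sketch of a QLoRA-style 4-bit setup with Hugging Face transformers + peft + bitsandbytes; the checkpoint name and every hyperparameter below are illustrative assumptions, not values from the guide (FSDP itself is configured at launch time, e.g. via `accelerate launch`):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-70B"  # assumed checkpoint name

# 4-bit NormalFloat quantization: the Q-LoRA base configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="sdpa",  # scaled-dot-product attention backend
)

# Train small LoRA adapters on top of the frozen 4-bit base model
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```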

Tomasz Limisiewicz (@tomlimi)'s Twitter Profile Photo

📢 New pre-print alert! 📢

A curse of over-segmentation haunts multilingual language models. While prior approaches have tried to resolve this by balancing data across languages, the problem lies much deeper: in the byte encodings themselves. 🔍🔑

arxiv.org/pdf/2403.10691 (1/6)
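
To see the kind of imbalance the thread points at, here is a quick illustration (mine, not the paper's): UTF-8 spends one byte per character on English but several on other scripts, so byte-level segmentation fragments those languages much more:

```python
# Bytes per character for short text in different scripts
# (illustrative example, not from the paper).
for text in ["hello", "Здравей", "γεια σου", "こんにちは", "नमस्ते"]:
    n_bytes = len(text.encode("utf-8"))
    print(f"{text!r}: {len(text)} chars -> {n_bytes} bytes "
          f"({n_bytes / len(text):.1f} bytes/char)")
```
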
AI at Meta (@aiatmeta)'s Twitter Profile Photo

Newly published work from FAIR, Chameleon: Mixed-Modal Early-Fusion Foundation Models.

This research presents a family of early-fusion token-based mixed-modal models capable of understanding & generating images & text in any arbitrary sequence.

Paper ➡️ go.fb.me/7rb19n

Armen Aghajanyan (@armenagha)'s Twitter Profile Photo

I'm excited to announce our latest paper, introducing a family of early-fusion, token-in/token-out (gpt4o…) models capable of interleaved text and image understanding and generation. arxiv.org/abs/2405.09818
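
For readers unfamiliar with early fusion, here is a toy sketch of the token-in/token-out idea; the sentinel tokens and tokenizer stubs below are stand-ins I made up, not Chameleon's actual interface:

```python
# Toy early-fusion stream: image content is discretized into codebook
# indices (stubbed) and spliced into the same sequence as text tokens,
# so one autoregressive transformer reads and writes both modalities.
BOI, EOI = "<image>", "</image>"  # hypothetical sentinel tokens

def image_to_tokens(image):
    # Stand-in for a VQ image tokenizer; real systems emit many codes.
    return ["img_17", "img_403", "img_88"]

def text_to_tokens(text):
    # Stand-in for a BPE text tokenizer.
    return text.split()

sequence = (
    text_to_tokens("A photo of a llama:")
    + [BOI] + image_to_tokens(None) + [EOI]
    + text_to_tokens("Now describe it.")
)
print(sequence)  # one flat stream; generation may emit text or image tokens
```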

Lili Yu (@liliyu_lili)'s Twitter Profile Photo

Interleaved text-image generation with consistency is a unique feature brought by our early-fusion, end-to-end trained model.

Alex Hägele (@haeggee)'s Twitter Profile Photo

Why exactly do we train LLMs with the cosine schedule, still? 🤔 Maybe we do not actually have to -- and that would come with a lot of benefits :) 🧵 Our paper on LR schedules, compute-optimality and more affordable scaling laws
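
One alternative of the kind the thread discusses is a constant learning rate with warmup and a short cooldown; a minimal sketch (the shape and all hyperparameters here are my assumptions, not values from the paper):

```python
def warmup_stable_decay(step, total_steps, peak_lr=3e-4,
                        warmup_steps=1000, cooldown_frac=0.2):
    """Linear warmup, long constant plateau, linear cooldown to zero."""
    cooldown_start = int(total_steps * (1 - cooldown_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    if step < cooldown_start:
        return peak_lr
    return peak_lr * (total_steps - step) / (total_steps - cooldown_start)
```

Part of the appeal is that, unlike cosine, the plateau does not bake the total step count into most of training, so a single long run can serve several compute budgets.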

Weijia Shi (@weijiashi2)'s Twitter Profile Photo

Augmenting GPT-4o with Visual Sketchpad ✏️

We introduce the Sketchpad agent, a framework that equips multimodal LLMs with a visual canvas and drawing tools 🎨, improving GPT-4o's performance in vision and math tasks 📈

🔗: visualsketchpad.github.io
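
Mechanically, the canvas-in-the-loop idea reads like the sketch below; `call_multimodal_llm` and `render` are hypothetical stubs of my own, not the Sketchpad API:

```python
def call_multimodal_llm(images, prompt):
    # Stub: a real agent would query a multimodal LLM here.
    return {"final_answer": "42", "action": None, "args": None}

def render(canvas, action, args):
    # Stub: a real agent would execute the drawing command on the canvas.
    return canvas

def solve_with_sketchpad(task_image, question, max_steps=5):
    canvas = task_image  # the model draws on top of the input image
    for _ in range(max_steps):
        reply = call_multimodal_llm(images=[canvas], prompt=question)
        if reply["final_answer"] is not None:
            return reply["final_answer"]
        # otherwise the model proposed a drawing action; apply it and loop
        canvas = render(canvas, reply["action"], reply["args"])
    return None

print(solve_with_sketchpad(task_image=None, question="What is 6*7?"))
```
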
Margaret Li (@margs_li)'s Twitter Profile Photo

RLHF-aligned LMs excel at long-form generation, but how?
We show how current models rely on anchor spans ⚓: strings that occur across many samples for the same prompt, forming an implicit outline, viz below.
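
A crude way to surface such spans, illustrating the definition in the tweet (not the paper's actual method): count n-grams that recur across many samples for the same prompt:

```python
from collections import Counter

def ngrams(tokens, n=4):
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def anchor_spans(samples, n=4, min_frac=0.5):
    # Call an n-gram an "anchor" if it appears in at least `min_frac`
    # of the samples generated for one prompt (threshold is my choice).
    counts = Counter()
    for sample in samples:
        counts.update(ngrams(sample.split(), n))
    return [g for g, c in counts.items() if c >= min_frac * len(samples)]
```
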
Artidoro Pagnoni (@artidoropagnoni)'s Twitter Profile Photo

Improvements from RLHF are achieved at the expense of world modeling capabilities. Could this be a fundamental trade-off between world models and agent models?

Philippe Laban (@philippelaban)'s Twitter Profile Photo

Check out our latest work (co-led with Alex Fabbri) on Summary of a Haystack (SummHay), a challenging task that shows long-context summarization with precise citation is far from solved... Got a long-context LLM or RAG system you want to test? Code: github.com/salesforce/sum…

Pengfei Liu (@stefan_fee)'s Twitter Profile Photo

The Alpaca moment of Large Multimodal Models! Can we build native LMMs just like Llama for simple multimodal generation?

Introducing Anole: the first open-source, autoregressive native LMM for multimodal generation. Building on Chameleon by AI at Meta: github.com/GAIR-NLP/anole

Rulin Shao (@rulinshao)'s Twitter Profile Photo

🔥 We release the first open-source 1.4T-token RAG datastore and present a scaling study for RAG on perplexity and downstream tasks! We show LM+RAG scales better than LM alone, with better performance for the same training compute (pretraining+indexing). retrievalscaling.github.io 🧵

Han Guo (@hanguo97)'s Twitter Profile Photo

Introducing FLUTE, a CUDA kernel for non-uniformly quantized (via a lookup table) LLM inference. It accelerates QLoRA's NormalFloat (NF) out of the box and more.

As an application, we extended NF4 and are releasing quantized models for LLaMA-3 (8B/70B) and Gemma-2 (9B/27B).
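
The core trick, shown in plain PyTorch rather than a fused CUDA kernel (the 16 levels below are illustrative, not the exact NormalFloat4 codebook):

```python
import torch

# Non-uniform 16-level codebook: each 4-bit code indexes into this table.
lut = torch.tensor([-1.00, -0.70, -0.53, -0.39, -0.28, -0.18, -0.09, 0.00,
                     0.08, 0.16, 0.25, 0.34, 0.44, 0.56, 0.72, 1.00])

codes = torch.randint(0, 16, (4096,))  # 4-bit weight codes for one block
absmax = 0.12                          # per-block scale saved at quantization
weights = lut[codes] * absmax          # dequantize: table lookup + rescale
```

The point of a fused kernel is to perform this lookup inside the matrix multiply instead of materializing `weights` in full precision first.
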
Mike Lewis (@ml_perception)'s Twitter Profile Photo

So excited for the open release of Llama 3.1 405B - with MMLU > 87, it's a really strong model and I can't wait to see what you all build with it! llama.meta.com

Also check out the paper here, with lots of details on how this was made: tinyurl.com/2z2cpj8m

Mark Saroufim (@marksaroufim)'s Twitter Profile Photo

We've kicked off the NeurIPS Meta Hacker Cup AI track! I mostly use LLMs to code and they've been great at removing the drudgery of boilerplate-y code, but for harder problems they are meh. Both the big-boi llamas and GPT-4 fall flat on their face!

Victoria X Lin (@victorialinml)'s Twitter Profile Photo

1/n Introducing MoMa 🖼, our new sparse early-fusion architecture for mixed-modal language modeling that significantly boosts pre-training efficiency 🚀 (arxiv.org/pdf/2407.21770).

MoMa employs a mixture-of-experts (MoE) framework with modality-specific expert groups. Given any
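
A toy rendering of modality-specific expert groups (dimensions, routing, and names are my stand-ins, not MoMa's implementation):

```python
import torch
import torch.nn as nn

class ModalityGroupedMoE(nn.Module):
    """Toy MoE layer: text tokens only see text experts, image tokens only
    see image experts; top-1 routing within each group (illustrative)."""
    def __init__(self, d_model=64, experts_per_group=2):
        super().__init__()
        def make_group():
            return nn.ModuleList(nn.Linear(d_model, d_model)
                                 for _ in range(experts_per_group))
        self.groups = nn.ModuleDict({"text": make_group(),
                                     "image": make_group()})
        self.router = nn.Linear(d_model, experts_per_group)

    def forward(self, x, is_image):
        # x: (seq_len, d_model); is_image: (seq_len,) bool modality tags
        out = torch.zeros_like(x)
        for name, mask in (("text", ~is_image), ("image", is_image)):
            if mask.any():
                toks = x[mask]
                expert_ids = self.router(toks).argmax(dim=-1)  # top-1 routing
                group = self.groups[name]
                out[mask] = torch.stack([group[i](t) for i, t
                                         in zip(expert_ids.tolist(), toks)])
        return out

moe = ModalityGroupedMoE()
tokens = torch.randn(10, 64)
is_image = torch.rand(10) > 0.5
print(moe(tokens, is_image).shape)  # torch.Size([10, 64])
```
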
Terra Blevins (@terrablvns)'s Twitter Profile Photo

I'm very excited to join Northeastern U. Khoury College of Computer Sciences as an assistant professor starting Fall '25!! Looking forward to working with the amazing people there! Until then I'll be a postdoc at NLP @ Uni Vienna with Ben Roth, so reach out if you want to meet up while I'm over in Europe ✨

Chunting Zhou (@violet_zct)'s Twitter Profile Photo

Introducing *Transfusion* - a unified approach for training models that can generate both text and images. arxiv.org/pdf/2408.11039

Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This
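
In loss terms, the combination looks roughly like the sketch below (my reading of the tweet; tensor shapes and the weighting are assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def transfusion_style_loss(text_logits, text_targets, text_mask,
                           noise_pred, noise_true, image_mask, lam=1.0):
    """Next-token cross-entropy on text positions plus a diffusion-style
    noise-prediction MSE on image positions, from one transformer's outputs."""
    # text_logits: (seq, vocab); text_targets: (seq,); masks: (seq,) bool
    lm_loss = F.cross_entropy(text_logits[text_mask], text_targets[text_mask])
    diffusion_loss = F.mse_loss(noise_pred[image_mask], noise_true[image_mask])
    return lm_loss + lam * diffusion_loss
```
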
Lili Yu (@liliyu_lili)'s Twitter Profile Photo

🚀 Excited to share our latest work: Transfusion! A new multi-modal generative training recipe combining language modeling and image diffusion in a single transformer! Huge shout-out to Chunting Zhou, Omer Levy, Michi Yasunaga, Arun Babu, Kushal Tirumala, and other collaborators.