Kushal Tirumala (@kushal_tirumala)'s Twitter Profile
Kushal Tirumala

@kushal_tirumala

Researcher @ FAIR (@MetaAI)

ID: 1526449505779847168

Joined: 17-05-2022 06:28:28

82 Tweets

402 Followers

115 Following

Armen Aghajanyan (@armenagha)

I’m excited to announce our latest paper, introducing a family of early-fusion, token-in token-out (gpt4o…) models capable of interleaved text and image understanding and generation. arxiv.org/abs/2405.09818

Akshat Shrivastava (@akshats07)

Excited to release our team's latest work on Chameleon! Showcasing an E2E recipe for early-fusion multimodal LLMs. (Fun fact: the core model was trained last year.) Check out Srini Iyer's post for a deep dive here!

Chunting Zhou (@violet_zct)

🚀 Excited to introduce Chameleon, our work in mixed-modality early-fusion foundation models from last year! 🦎 Capable of understanding and generating text and images in any sequence. Check out our paper to learn more about its SOTA performance and versatile capabilities!

AK (@_akhaliq)

Chameleon: Mixed-Modal Early-Fusion Foundation Models

We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception…

Aran Komatsuzaki (@arankomatsuzaki)

Meta presents Chameleon: Mixed-Modal Early-Fusion Foundation Models

- SotA in image captioning
- On par with Mixtral 8x7B and Gemini-Pro on text-only tasks
- On par with Gemini Pro and GPT-4V on a new long-form mixed-modal generation evaluation

arxiv.org/abs/2405.09818
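
The recurring phrase "early-fusion, token-in token-out" has a concrete meaning: images are quantized into discrete codes by a VQ tokenizer and spliced into the same sequence as text tokens, so one autoregressive transformer reads and emits both modalities. A minimal sketch of that interleaving, with vocabulary sizes and names assumed for illustration rather than taken from the paper:

```python
import torch

# Assumed stand-ins for Chameleon's components: a BPE text tokenizer,
# a VQ image tokenizer, and a single decoder-only transformer.
TEXT_VOCAB = 65_536        # hypothetical text vocabulary size
IMAGE_VOCAB = 8_192        # hypothetical VQ codebook size
IMG_OFFSET = TEXT_VOCAB    # image codes occupy a disjoint id range

def interleave(text_ids: torch.Tensor, image_codes: torch.Tensor) -> torch.Tensor:
    """Build one mixed-modal sequence: text tokens followed by image tokens.

    In an early-fusion model there is no separate vision encoder at
    generation time; image VQ codes are just more token ids, shifted
    into their own slice of the shared vocabulary.
    """
    image_ids = image_codes.flatten() + IMG_OFFSET
    return torch.cat([text_ids, image_ids])

# Toy example: 5 text tokens followed by a 4x4 grid of VQ codes.
seq = interleave(
    torch.randint(0, TEXT_VOCAB, (5,)),
    torch.randint(0, IMAGE_VOCAB, (4, 4)),
)
print(seq.shape)  # torch.Size([21]); trained with ordinary next-token prediction
```

Because everything is a token, "generating images and text in any arbitrary sequence" falls out of standard next-token sampling over the combined (TEXT_VOCAB + IMAGE_VOCAB)-way softmax.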
Lucas Beyer (bl16) (@giffmana)

Armen and team have been working in this direction for a while now, and I’ve been eagerly following from the sidelines since the CM3 paper. Very nice to see the line of work come to fruition! Also nice to see that QK-layernorm works beyond ViT-22B.
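
QK-layernorm, the trick referenced here, normalizes queries and keys before the attention logits are formed, bounding their scale and stabilizing training at large model sizes. A single-head sketch of the idea, simplified from how ViT-22B or Chameleon actually apply it (real models normalize per head):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Self-attention with LayerNorm applied to queries and keys.

    Normalizing q and k before the dot product keeps the attention
    logits bounded, which is the stabilization effect referenced above.
    Single-head for brevity; this is an illustrative sketch, not
    either paper's implementation.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.q_norm = nn.LayerNorm(dim)
        self.k_norm = nn.LayerNorm(dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_norm(self.q_proj(x))  # the extra norms are the whole trick
        k = self.k_norm(self.k_proj(x))
        v = self.v_proj(x)
        attn = F.softmax((q @ k.transpose(-2, -1)) * self.scale, dim=-1)
        return attn @ v

x = torch.randn(2, 16, 64)           # (batch, seq, dim)
print(QKNormAttention(64)(x).shape)  # torch.Size([2, 16, 64])
```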

Aaditya Singh (@aaditya6284)

Long (code) files may not be as high quality as you think… Excited for our new work, "Brevity is the soul of wit: Pruning long files for code generation". We find that long code files are often low quality and show benefits from pruning such files for code gen. Read on 🔎⏬
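
The actionable reading: when assembling code pretraining data, filter or down-weight files past a length cutoff. A toy version of such a filter, with an invented threshold (the paper's actual pruning criteria and cutoffs are what the thread studies):

```python
def prune_long_files(files: list[str], max_lines: int = 2000) -> list[str]:
    """Keep only source files under a line-count cutoff.

    max_lines is an illustrative value, not the paper's; the point is
    that very long files are disproportionately low quality for code gen.
    """
    return [src for src in files if src.count("\n") + 1 <= max_lines]

corpus = [
    "def f():\n    return 1\n",                # short file: kept
    "\n".join("x = 0" for _ in range(5000)),   # 5000-line file: dropped
]
print(len(prune_long_files(corpus)))  # 1
```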

Tim Dettmers (@tim_dettmers)

After 7 months on the job market, I am happy to announce:
- I joined Ai2
- Professor at Carnegie Mellon University from Fall 2025
- New bitsandbytes maintainer: Titus von Koeller
My main focus will be to strengthen open source for real-world problems and bring the best AI to laptops 🧵

Sewon Min (@sewon__min)

📣 After graduating from @UWCSE, I am joining UC Berkeley as an Assistant Professor (affiliated with Berkeley AI Research and BerkeleyNLP) and Ai2 as a Research Scientist. I'm looking forward to tackling exciting challenges in NLP & generative AI together with new colleagues! 🐻✨

Aran Komatsuzaki (@arankomatsuzaki)

Meta presents Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

- Can generate images and text on a par with similar scale diffusion models and language models
- Compresses each image to just 16 patches

arxiv.org/abs/2408.11039
Lili Yu (Neurips24) (@liliyu_lili)

🚀 Excited to share our latest work: Transfusion! A new multi-modal generative training recipe combining language modeling and image diffusion in a single transformer! Huge shoutout to Chunting Zhou, Omer Levy, Michi Yasunaga, Arun Babu, Kushal Tirumala, and other collaborators.
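
Transfusion's core move is training one transformer with two objectives at once: next-token cross-entropy on the text positions and a diffusion (noise-prediction) loss on the image patches. A schematic of combining the two, with shapes and the loss weighting assumed rather than taken from the paper:

```python
import torch
import torch.nn.functional as F

def transfusion_style_loss(text_logits, text_targets, eps_pred, eps_true,
                           lambda_img: float = 1.0) -> torch.Tensor:
    """Language-modeling loss plus diffusion loss, summed.

    text_logits: (n_text, vocab) predictions at the text positions.
    eps_pred / eps_true: predicted vs. actual noise on image patches,
    i.e. the standard epsilon-prediction diffusion objective.
    lambda_img is an assumed weighting, not the paper's setting.
    """
    lm_loss = F.cross_entropy(text_logits, text_targets)
    diffusion_loss = F.mse_loss(eps_pred, eps_true)
    return lm_loss + lambda_img * diffusion_loss

# Toy shapes: 10 text tokens over a 100-word vocab, and 16 image patches
# of dim 32 (the summary above notes images compress to as few as 16 patches).
loss = transfusion_style_loss(
    torch.randn(10, 100), torch.randint(0, 100, (10,)),
    torch.randn(16, 32), torch.randn(16, 32),
)
print(loss.item())
```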

Felix Hill (@felixhill84)

Do you work in AI? Do you find things uniquely stressful right now, like never before? Have you ever suffered from a mental illness? Read my personal experience of those challenges here: docs.google.com/document/d/1aE…

Armen Aghajanyan (@armenagha)

Say hello to our new company, Perceptron AI. Foundation models transformed the digital realm; now it’s time for the physical world. We’re building the first foundational models designed for real-time, multi-modal intelligence across the real world. perceptron.inc

Junhong Shen (@junhongshen1)

Introducing Content-Adaptive Tokenizer (CAT) 🐈! An image tokenizer that adapts token count based on image complexity, offering flexible 8x, 16x, or 32x compression! Unlike fixed-length tokenizers, CAT optimizes both representation efficiency and quality. Importantly, we use just…

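The mechanism as described: score each image's complexity, then route it to one of three fixed compression ratios so that simple images spend fewer tokens. A toy router with a hypothetical complexity score and made-up thresholds (CAT's actual scoring and cutoffs differ):

```python
def cat_style_token_count(height: int, width: int, complexity: float) -> int:
    """Route an image to 8x, 16x, or 32x spatial compression by complexity.

    complexity is a hypothetical score in [0, 1]; the token count is the
    latent grid size after downsampling by the chosen factor.
    """
    if complexity > 0.66:
        factor = 8    # complex image: spend more tokens on it
    elif complexity > 0.33:
        factor = 16
    else:
        factor = 32   # simple image: compress aggressively
    return (height // factor) * (width // factor)

# A 512x512 image costs 4096 tokens at 8x, 1024 at 16x, 256 at 32x.
for c in (0.9, 0.5, 0.1):
    print(cat_style_token_count(512, 512, c))
```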
Will Held (@williambarrheld)

Balancing data across domains is key to training the best generalist LLMs! In my summer work @MetaAI, we introduce UtiliMax and MEDU, new methods to estimate data utility and optimize data mixes efficiently. HF Blog: huggingface.co/blog/WillHeld/… ArXiv: arxiv.org/abs/2501.11747
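
The framing here is two steps: estimate a utility score per domain, then turn those scores into sampling weights that favor useful data without collapsing onto one domain. A generic temperature-smoothed baseline for the second step, explicitly not UtiliMax or MEDU themselves:

```python
import numpy as np

def mix_weights(utilities: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Softmax over per-domain utility estimates.

    A common baseline for converting utilities into a data mix; the
    UtiliMax optimization (and MEDU's utility estimation) described in
    the paper are more sophisticated than this sketch.
    """
    names = list(utilities)
    u = np.array([utilities[n] for n in names]) / temperature
    w = np.exp(u - u.max())
    w /= w.sum()
    return {n: float(x) for n, x in zip(names, w)}

# Lower temperature concentrates the mix on the highest-utility domain.
print(mix_weights({"web": 0.2, "code": 0.8, "books": 0.5}, temperature=0.5))
```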

Vivek Ramanujan (@ramanujanvivek)

Happy to (belatedly) share our recent work introducing Causally Regularized Tokenization 📺, matching LlamaGen-3B generation performance with 0.5x the number of tokens/image (256 vs 576) and 0.25x the number of params (770M vs 3B) on ImageNet.

arxiv.org/pdf/2412.16326

1/n
Myra Cheng (@chengmyra1)

Do people actually like human-like LLMs? In our #ACL2025 paper HumT DumT, we find a kind of uncanny valley effect: users dislike LLM outputs that are *too human-like*. We thus develop methods to reduce human-likeness without sacrificing performance.
