Kushal Tirumala (@kushal_tirumala)'s Twitter Profile
Kushal Tirumala

@kushal_tirumala

Researcher @ FAIR (@MetaAI)

ID: 1526449505779847168

Joined: 17-05-2022 06:28:28

82 Tweets

402 Followers

115 Following

Armen Aghajanyan (@armenagha)

I’m excited to announce our latest paper, introducing a family of early-fusion, token-in token-out (gpt4o…) models capable of interleaved text and image understanding and generation. arxiv.org/abs/2405.09818

Akshat Shrivastava (@akshats07)

Excited to release our team's latest work on Chameleon! Showcasing an E2E recipe for early-fusion multimodal LLMs. (Fun fact: the core model was trained last year.) Check out Srini Iyer's post for a deep dive here!

Chunting Zhou (@violet_zct)

🚀 Excited to introduce Chameleon, our work in mixed-modality early-fusion foundation models from last year! 🦎 Capable of understanding and generating text and images in any sequence. Check out our paper to learn more about its SOTA performance and versatile capabilities!

AK (@_akhaliq)

Chameleon: Mixed-Modal Early-Fusion Foundation Models

We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception…

Aran Komatsuzaki (@arankomatsuzaki)

Meta presents Chameleon: Mixed-Modal Early-Fusion Foundation Models

- SotA in image captioning
- On par with Mixtral 8x7B and Gemini-Pro on text-only tasks
- On par with Gemini Pro and GPT-4V on a new long-form mixed-modal generation evaluation

arxiv.org/abs/2405.09818
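
The recurring phrase "early-fusion, token-in token-out" has a concrete meaning: images are quantized into discrete codes by a VQ tokenizer and spliced into the same sequence as text tokens, so one autoregressive transformer reads and emits both modalities. A minimal sketch of that interleaving, with vocabulary sizes and names assumed for illustration rather than taken from the paper:

```python
import torch

# Assumed stand-ins for Chameleon's components: a BPE text tokenizer,
# a VQ image tokenizer, and a single decoder-only transformer.
TEXT_VOCAB = 65_536        # hypothetical text vocabulary size
IMAGE_VOCAB = 8_192        # hypothetical VQ codebook size
IMG_OFFSET = TEXT_VOCAB    # image codes occupy a disjoint id range

def interleave(text_ids: torch.Tensor, image_codes: torch.Tensor) -> torch.Tensor:
    """Build one mixed-modal sequence: text tokens followed by image tokens.

    In an early-fusion model there is no separate vision encoder at
    generation time; image VQ codes are just more token ids, shifted
    into their own slice of the shared vocabulary.
    """
    image_ids = image_codes.flatten() + IMG_OFFSET
    return torch.cat([text_ids, image_ids])

# Toy example: 5 text tokens followed by a 4x4 grid of VQ codes.
seq = interleave(
    torch.randint(0, TEXT_VOCAB, (5,)),
    torch.randint(0, IMAGE_VOCAB, (4, 4)),
)
print(seq.shape)  # torch.Size([21]); trained with ordinary next-token prediction
```

Because everything is a token, "generating images and text in any arbitrary sequence" falls out of standard next-token sampling over the combined (TEXT_VOCAB + IMAGE_VOCAB)-way softmax.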
Lucas Beyer (bl16) (@giffmana)

Armen and team have been working in this direction for a while now, and I’ve been eagerly following from the sidelines since the CM3 paper. Very nice to see the line of work come to fruition! Also nice to see that QK-layernorm works beyond ViT-22B.
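
QK-layernorm, the trick referenced here, normalizes queries and keys before the attention logits are formed, bounding their scale and stabilizing training at large model sizes. A single-head sketch of the idea, simplified from how ViT-22B or Chameleon actually apply it (real models normalize per head):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Self-attention with LayerNorm applied to queries and keys.

    Normalizing q and k before the dot product keeps the attention
    logits bounded, which is the stabilization effect referenced above.
    Single-head for brevity; this is an illustrative sketch, not
    either paper's implementation.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.q_norm = nn.LayerNorm(dim)
        self.k_norm = nn.LayerNorm(dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_norm(self.q_proj(x))  # the extra norms are the whole trick
        k = self.k_norm(self.k_proj(x))
        v = self.v_proj(x)
        attn = F.softmax((q @ k.transpose(-2, -1)) * self.scale, dim=-1)
        return attn @ v

x = torch.randn(2, 16, 64)           # (batch, seq, dim)
print(QKNormAttention(64)(x).shape)  # torch.Size([2, 16, 64])
```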

Aaditya Singh (@aaditya6284)

Long (code) files may not be as high quality as you think… Excited for our new work, "Brevity is the soul of wit: Pruning long files for code generation". We find that long code files are often low quality and show benefits from pruning such files for code gen. Read on 🔎⏬
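
The actionable reading: when assembling code pretraining data, filter or down-weight files past a length cutoff. A toy version of such a filter, with an invented threshold (the paper's actual pruning criteria and cutoffs are what the thread studies):

```python
def prune_long_files(files: list[str], max_lines: int = 2000) -> list[str]:
    """Keep only source files under a line-count cutoff.

    max_lines is an illustrative value, not the paper's; the point is
    that very long files are disproportionately low quality for code gen.
    """
    return [src for src in files if src.count("\n") + 1 <= max_lines]

corpus = [
    "def f():\n    return 1\n",                # short file: kept
    "\n".join("x = 0" for _ in range(5000)),   # 5000-line file: dropped
]
print(len(prune_long_files(corpus)))  # 1
```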

Tim Dettmers (@tim_dettmers)

After 7 months on the job market, I am happy to announce:
- I joined Ai2
- Professor at Carnegie Mellon University from Fall 2025
- New bitsandbytes maintainer: Titus von Koeller
My main focus will be to strengthen open source for real-world problems and bring the best AI to laptops 🧵

Sewon Min (@sewon__min)

📣 After graduating from @UWCSE, I am joining UC Berkeley as an Assistant Professor (affiliated with Berkeley AI Research and BerkeleyNLP) and Ai2 as a Research Scientist. I'm looking forward to tackling exciting challenges in NLP & generative AI together with new colleagues! 🐻✨

Aran Komatsuzaki (@arankomatsuzaki)

Meta presents Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

- Can generate images and text on a par with similar scale diffusion models and language models
- Compresses each image to just 16 patches

arxiv.org/abs/2408.11039
Lili Yu (Neurips24) (@liliyu_lili)

🚀 Excited to share our latest work: Transfusion! A new multi-modal generative training recipe combining language modeling and image diffusion in a single transformer! Huge shoutout to Chunting Zhou, Omer Levy, Michi Yasunaga, Arun Babu, Kushal Tirumala, and other collaborators.
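
Transfusion's core move is training one transformer with two objectives at once: next-token cross-entropy on the text positions and a diffusion (noise-prediction) loss on the image patches. A schematic of combining the two, with shapes and the loss weighting assumed rather than taken from the paper:

```python
import torch
import torch.nn.functional as F

def transfusion_style_loss(text_logits, text_targets, eps_pred, eps_true,
                           lambda_img: float = 1.0) -> torch.Tensor:
    """Language-modeling loss plus diffusion loss, summed.

    text_logits: (n_text, vocab) predictions at the text positions.
    eps_pred / eps_true: predicted vs. actual noise on image patches,
    i.e. the standard epsilon-prediction diffusion objective.
    lambda_img is an assumed weighting, not the paper's setting.
    """
    lm_loss = F.cross_entropy(text_logits, text_targets)
    diffusion_loss = F.mse_loss(eps_pred, eps_true)
    return lm_loss + lambda_img * diffusion_loss

# Toy shapes: 10 text tokens over a 100-word vocab, and 16 image patches
# of dim 32 (the summary above notes images compress to as few as 16 patches).
loss = transfusion_style_loss(
    torch.randn(10, 100), torch.randint(0, 100, (10,)),
    torch.randn(16, 32), torch.randn(16, 32),
)
print(loss.item())
```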

Felix Hill (@felixhill84)

Do you work in AI? Do you find things uniquely stressful right now, like never before? Have you ever suffered from a mental illness? Read my personal experience of those challenges here: docs.google.com/document/d/1aE…

Armen Aghajanyan (@armenagha)

Say hello to our new company, Perceptron AI. Foundation models transformed the digital realm; now it’s time for the physical world. We’re building the first foundational models designed for real-time, multi-modal intelligence across the real world. perceptron.inc

Junhong Shen (@junhongshen1)

Introducing Content-Adaptive Tokenizer (CAT) 🐈! An image tokenizer that adapts token count based on image complexity, offering flexible 8x, 16x, or 32x compression! Unlike fixed-length tokenizers, CAT optimizes both representation efficiency and quality. Importantly, we use just…

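The mechanism as described: score each image's complexity, then route it to one of three fixed compression ratios so that simple images spend fewer tokens. A toy router with a hypothetical complexity score and made-up thresholds (CAT's actual scoring and cutoffs differ):

```python
def cat_style_token_count(height: int, width: int, complexity: float) -> int:
    """Route an image to 8x, 16x, or 32x spatial compression by complexity.

    complexity is a hypothetical score in [0, 1]; the token count is the
    latent grid size after downsampling by the chosen factor.
    """
    if complexity > 0.66:
        factor = 8    # complex image: spend more tokens on it
    elif complexity > 0.33:
        factor = 16
    else:
        factor = 32   # simple image: compress aggressively
    return (height // factor) * (width // factor)

# A 512x512 image costs 4096 tokens at 8x, 1024 at 16x, 256 at 32x.
for c in (0.9, 0.5, 0.1):
    print(cat_style_token_count(512, 512, c))
```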
Will Held (@williambarrheld)

Balancing data across domains is key to training the best generalist LLMs! In my summer work @MetaAI, we introduce UtiliMax and MEDU, new methods to estimate data utility and optimize data mixes efficiently. HF Blog: huggingface.co/blog/WillHeld/… ArXiv: arxiv.org/abs/2501.11747
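
The framing here is two steps: estimate a utility score per domain, then turn those scores into sampling weights that favor useful data without collapsing onto one domain. A generic temperature-smoothed baseline for the second step, explicitly not UtiliMax or MEDU themselves:

```python
import numpy as np

def mix_weights(utilities: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Softmax over per-domain utility estimates.

    A common baseline for converting utilities into a data mix; the
    UtiliMax optimization (and MEDU's utility estimation) described in
    the paper are more sophisticated than this sketch.
    """
    names = list(utilities)
    u = np.array([utilities[n] for n in names]) / temperature
    w = np.exp(u - u.max())
    w /= w.sum()
    return {n: float(x) for n, x in zip(names, w)}

# Lower temperature concentrates the mix on the highest-utility domain.
print(mix_weights({"web": 0.2, "code": 0.8, "books": 0.5}, temperature=0.5))
```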

Vivek Ramanujan (@ramanujanvivek)

Happy to (belatedly) share our recent work introducing Causally Regularized Tokenization 📺, matching LlamaGen-3B generation performance with 0.5x the number of tokens/image (256 vs 576) and 0.25x the number of params (770M vs 3B) on ImageNet.

arxiv.org/pdf/2412.16326

1/n
Myra Cheng (@chengmyra1)

Do people actually like human-like LLMs? In our #ACL2025 paper HumT DumT, we find a kind of uncanny valley effect: users dislike LLM outputs that are *too human-like*. We thus develop methods to reduce human-likeness without sacrificing performance.
