Michi Yasunaga (@michiyasunaga) 's Twitter Profile
Michi Yasunaga

@michiyasunaga

Researcher @OpenAI

ID: 1182194132309012481

Link: http://michiyasunaga.github.io · Joined: 10-10-2019 07:23:34

308 Tweets

3.3K Followers

882 Following

Shirley Wu (@shirleyyxwu) 's Twitter Profile Photo

🔥 AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning 🔥 is at NeurIPS 2024! LLM agents are great, but they don't always make the best use of tools! AvaTaR is an automated framework that optimizes an LLM agent’s tool usage for any task. The "magic" lies

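The contrastive idea behind AvaTaR can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's API: compare tool-use attempts that succeeded against ones that failed, and turn the difference into a textual hint appended to the agent's prompt.

```python
# Hedged sketch of contrastive reasoning over tool-use attempts.
# All names (contrast, make_hint) are illustrative, not from the paper.

def contrast(positives, negatives):
    """Return tool names used in successful attempts but never in failed
    ones -- a crude signal for which tools the agent should favor."""
    used_in_pos = set(t for attempt in positives for t in attempt)
    used_in_neg = set(t for attempt in negatives for t in attempt)
    return sorted(used_in_pos - used_in_neg)

def make_hint(positives, negatives):
    """Turn the contrast into a prompt hint for the agent."""
    helpful = contrast(positives, negatives)
    if not helpful:
        return ""
    return "Prefer these tools: " + ", ".join(helpful)
```

A real system would contrast full reasoning traces, not just tool names, but the loop is the same: positive vs. negative attempts in, prompt update out.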
Joon Sung Park (@joon_s_pk) 's Twitter Profile Photo

Simulating human behavior with AI agents promises a testbed for policy and the social sciences. We interviewed 1,000 people for two hours each to create generative agents of them. These agents replicate their source individuals’ attitudes and behaviors. 🧵arxiv.org/abs/2411.10109

Akari Asai (@akariasai) 's Twitter Profile Photo

🚨 I’m on the job market this year! 🚨 I’m completing my Allen School Ph.D. (2025), where I identify and tackle key LLM limitations like hallucinations by developing new models—Retrieval-Augmented LMs—to build more reliable real-world AI systems. Learn more in the thread! 🧵

Marjan Ghazvininejad (@gh_marjan) 's Twitter Profile Photo

Everyone’s talking about synthetic data generation — but what’s the recipe for scaling it without model collapse? 🤔 Meet ALMA: Alignment with Minimal Annotation. We've developed a new technique for generating synthetic data and aligning LLMs that achieves performance close to

John Nguyen (@__johnnguyen__) 's Twitter Profile Photo

🥪New Paper! 🥪Introducing Byte Latent Transformer (BLT), a tokenizer-free model that scales better than BPE-based models, with better inference efficiency and robustness. 🧵

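A minimal sketch of the byte-level input representation a tokenizer-free model like BLT operates on (function names here are illustrative assumptions, not the paper's code): the model consumes raw UTF-8 bytes, so every string has a well-defined input sequence with no vocabulary, and a character-level typo perturbs the sequence only locally.

```python
# Illustrative sketch: byte-level inputs instead of BPE tokens.

def byte_sequence(text):
    """Input sequence for a byte-level model: raw UTF-8 bytes (0..255)."""
    return list(text.encode("utf-8"))

def bytes_changed(a, b):
    """Positions where two byte sequences differ -- a typo changes only a
    few bytes, one source of byte-level models' robustness to noise."""
    sa, sb = byte_sequence(a), byte_sequence(b)
    return sum(x != y for x, y in zip(sa, sb)) + abs(len(sa) - len(sb))
```

Note that non-ASCII characters simply expand to multiple bytes ("é" is two), so there is never an out-of-vocabulary symbol.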
Lili Yu (Neurips24) (@liliyu_lili) 's Twitter Profile Photo

We scaled up Megabyte and ended up with a BLT! A pure byte-level model, it has a steeper scaling law than BPE-based models. With up to 8B parameters, BLT matches Llama 3 on general NLP tasks, and it excels on long-tail data and can manipulate substrings more effectively. The

Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Introducing 𝐋𝐥𝐚𝐦𝐚𝐅𝐮𝐬𝐢𝐨𝐧: empowering Llama 🦙 with diffusion 🎨 to understand and generate text and images in arbitrary sequences. ✨ Building upon Transfusion, our recipe fully preserves Llama’s language performance while unlocking its multimodal understanding and

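The Transfusion-style recipe LlamaFusion builds on can be sketched at a very high level. All names below are assumptions for illustration, not the released code: one sequence interleaves text and image segments, and each position is routed to the matching objective, next-token prediction for text and a diffusion-style denoising loss for image patches.

```python
# Illustrative sketch: route an interleaved multimodal sequence to
# per-modality training objectives (names are assumptions).

def route_losses(sequence):
    """Partition an interleaved sequence into per-objective groups.
    Each element is a (modality, payload) pair."""
    text, image = [], []
    for modality, payload in sequence:
        if modality == "text":
            text.append(payload)      # -> language-modeling loss
        elif modality == "image":
            image.append(payload)     # -> diffusion denoising loss
        else:
            raise ValueError(f"unknown modality: {modality}")
    return text, image

seq = [("text", "A cat"), ("image", "patch_0"), ("image", "patch_1"),
       ("text", "sitting on a mat")]
```

Because both objectives are computed over one shared sequence, the same backbone learns to understand and generate text and images in arbitrary order.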
Junhong Shen (@junhongshen1) 's Twitter Profile Photo

Introducing Content-Adaptive Tokenizer (CAT) 🐈! An image tokenizer that adapts token count based on image complexity, offering flexible 8x, 16x, or 32x compression! Unlike fixed-length tokenizers, CAT optimizes both representation efficiency and quality. Importantly, we use just

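The core idea of content-adaptive tokenization can be sketched in a few lines. The complexity measure and thresholds below are toy assumptions, not CAT's actual scoring model: simple images get the aggressive 32x compression, complex ones keep more tokens at 8x.

```python
# Hedged sketch: pick one of CAT's three fixed compression ratios
# (8x, 16x, 32x) from a per-image complexity score. The score here is a
# toy proxy; the paper uses a learned/principled measure.

def complexity_score(pixels):
    """Toy complexity proxy: mean absolute difference between neighboring
    pixel values (flat 0..255 grayscale list)."""
    if len(pixels) < 2:
        return 0.0
    diffs = [abs(a - b) for a, b in zip(pixels, pixels[1:])]
    return sum(diffs) / len(diffs)

def pick_compression(pixels, low=5.0, high=20.0):
    """Map complexity to a compression ratio (thresholds are assumptions)."""
    s = complexity_score(pixels)
    if s < low:
        return 32   # simple image -> aggressive compression, few tokens
    elif s < high:
        return 16
    return 8        # complex image -> keep more tokens

def token_count(num_pixels, ratio):
    """Tokens produced at a given compression ratio."""
    return max(1, num_pixels // ratio)
```

A flat image thus costs 4x fewer tokens than a noisy one of the same size, which is the efficiency/quality trade-off the tweet describes.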
Marjan Ghazvininejad (@gh_marjan) 's Twitter Profile Photo

As Vision-Language Models (VLMs) grow more powerful, we need better reward models to align them with human intent. But how can we evaluate these reward models effectively? There are many aspects to evaluate: correctness, human preference, reasoning, safety, etc.

Zhaofeng Wu @ ICLR (@zhaofeng_wu) 's Twitter Profile Photo

Robust reward models are critical for alignment/inference-time algos, auto eval, etc. (e.g. to prevent reward hacking which could render alignment ineffective). ⚠️ But we found that SOTA RMs are brittle 🫧 and easily flip predictions when the inputs are slightly transformed 🍃 🧵

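The brittleness probe described above can be sketched as a simple flip test. The reward function and transformations below are toy assumptions standing in for a learned reward model and the paper's input transformations: check whether the model's pairwise preference flips when both inputs are lightly transformed in meaning-preserving ways.

```python
# Illustrative sketch: probe a reward model for brittleness by checking
# whether its pairwise preference flips under small input transformations.

def toy_reward(text):
    """Stand-in for a learned reward model: a brittle heuristic that
    rewards length and is accidentally sensitive to casing."""
    return len(text) + 5 * sum(c.isupper() for c in text)

def preference(rm, a, b):
    """Which response the reward model prefers."""
    return "a" if rm(a) > rm(b) else "b"

def flips_under(rm, a, b, transforms):
    """Count transformations that flip the original preference -- a robust
    reward model should score 0 on meaning-preserving changes."""
    base = preference(rm, a, b)
    return sum(1 for t in transforms
               if preference(rm, t(a), t(b)) != base)

transforms = [
    str.lower,            # normalize case
    lambda s: s + "  ",   # trailing whitespace
    lambda s: " " + s,    # leading whitespace
]
```

Any nonzero flip count on such trivial transformations is exactly the brittleness the thread warns can enable reward hacking.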