Zhiyuan Zeng (@zhiyuanzeng_) 's Twitter Profile
Zhiyuan Zeng

@zhiyuanzeng_

PhD-ing @uwnlp @uwcse | Prev. @Tsinghua_Uni @TsinghuaNLP @princeton_nlp

ID: 1650962310880714753

Link: http://zhiyuan-zeng.github.io · Joined: 25-04-2023 20:37:54

174 Tweets

417 Followers

216 Following

Yike Wang (@yikewang_) 's Twitter Profile Photo

LLMs are helpful for scientific research — but will they continuously be helpful?

Introducing 🔍ScienceMeter: current knowledge update methods enable 86% preservation of prior scientific knowledge, 72% acquisition of new, and 38%+ projection of future (arxiv.org/abs/2505.24302).
Jacqueline He (@jcqln_h) 's Twitter Profile Photo

LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generation of plausible, but unsupported content.

We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.
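The PIC task can be illustrated with a toy grounding checker. This is a hypothetical sketch only: the function name, the example claims, and the naive word-overlap scoring (standing in for a real entailment model) are all assumptions, not the paper's method.

```python
# Toy illustration of the Precise Information Control idea: every sentence
# the model emits should be grounded in one of the provided verifiable
# claims. Naive word overlap stands in for a real entailment model here.
def unsupported_sentences(output: str, claims: list[str], threshold: float = 0.6) -> list[str]:
    """Return output sentences whose content words are not largely covered
    by any single claim (a crude proxy for 'unsupported')."""
    flagged = []
    for sent in filter(None, (s.strip() for s in output.split("."))):
        words = set(sent.lower().split())
        support = max(len(words & set(c.lower().split())) / len(words) for c in claims)
        if support < threshold:
            flagged.append(sent)
    return flagged

claims = ["the model was trained on 2 trillion tokens", "training took 3 weeks"]
output = "The model was trained on 2 trillion tokens. It outperforms GPT-4."
print(unsupported_sentences(output, claims))  # ['It outperforms GPT-4']
```

The second sentence is flagged because no given claim supports it, even though it sounds plausible; this is exactly the intrinsic-hallucination failure mode the tweet describes.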
Diyi Yang (@diyi_yang) 's Twitter Profile Photo

AI agents are transforming the workforce! We mapped how AI agents could #automate vs. #augment jobs across the U.S. workforce, with a worker-first look at the future of work 👇🧵

Ai2 (@allen_ai) 's Twitter Profile Photo

We are #1 on the Hugging Face heatmap - this is what true openness looks like!🥇🎉
750+ models
230+ datasets
And counting... Come build with us huggingface.co/spaces/cfahlgr…

Rulin Shao (@rulinshao) 's Twitter Profile Photo

🎉Our Spurious Rewards is available on ArXiv! We added experiments on
- More prompts/steps/models/analysis...
- Spurious Prompts!
Surprisingly, we obtained 19.4% gains when replacing prompts with LaTeX placeholder text (\lipsum) 😶‍🌫️

Check out our 2nd blog: tinyurl.com/spurious-prompt
Stella Li (@stellalisy) 's Twitter Profile Photo

Spurious Rewards was not all‼️We now present spurious PROMPTS🤔 check out our latest findings and discussion on evaluation: tinyurl.com/spurious-prompt.

Who knew Lorem ipsum can bring 19.4% gains compared to default prompt👀

Also, arXiv is out🤩 arxiv.org/abs/2506.10947📄
Hanna Hajishirzi (@hannahajishirzi) 's Twitter Profile Photo

Yayyy!!! Best paper honorable mention at CVPR goes to our Molmo and PixMo work at Ai2! This is now becoming a trend :) Last year both OLMo and Dolma received best paper awards at ACL.

Sarah Wiegreffe (on faculty job market!) (@sarahwiegreffe) 's Twitter Profile Photo

A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of Maryland Department of Computer Science this August. I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)

Hao Xu (@xuhaoxh) 's Twitter Profile Photo

Wanna 🔎 inside Internet-scale LLM training data w/o spending 💰💰💰?
Introducing infini-gram mini, an exact-match search engine with 14x less storage req than the OG infini-gram 😎
We make 45.6 TB of text searchable. Read on to find our Web Interface, API, and more.
(1/n) ⬇️
Zhoujun (Jorge) Cheng (@chengzhoujun) 's Twitter Profile Photo

🤯What we know about RL for reasoning might not hold outside math and code?

We revisit established findings on RL for LLM reasoning on six domains (Math, Code, Science, Logic, Simulation, Tabular) and found that previous conclusions drawn on math and code are surprisingly
Chenghao Yang (@chrome1996) 's Twitter Profile Photo

Have you noticed…
🔍 Aligned LLM generations feel less diverse?
🎯 Base models are decoding-sensitive?
🤔 Generations get more predictable as they progress?
🌲 Tree search fails mid-generation (esp. for reasoning)?
We trace these mysteries to LLM probability concentration, and

Junhao Chen (@cumquaaa) 's Twitter Profile Photo

🚀 Training an image generation model and picking sides between autoregressive (AR) and diffusion? Why not both? Check out MADFormer with half of the model layers for AR and half for diffusion. AR gives a fast guess for the next patch prediction while diffusion helps refine the

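The layer split described in the tweet can be sketched as follows. This is a toy illustration of the idea only, not the actual MADFormer architecture: the class name, dimensions, and stubbed random-matrix "layers" are all assumptions.

```python
import numpy as np

class MADFormerSketch:
    """Hypothetical sketch of the MADFormer split: the first half of the
    layers act autoregressively to produce a fast coarse guess for the next
    patch; the second half refine it with iterative diffusion-style steps.
    Layer internals are stubbed out with random matrices."""
    def __init__(self, n_layers=8, dim=16, denoise_steps=4, seed=0):
        rng = np.random.default_rng(seed)
        self.ar_layers = [rng.normal(0, 0.1, (dim, dim)) for _ in range(n_layers // 2)]
        self.diff_layers = [rng.normal(0, 0.1, (dim, dim)) for _ in range(n_layers // 2)]
        self.denoise_steps = denoise_steps

    def forward(self, patch_context):
        h = patch_context
        for w in self.ar_layers:              # AR half: one cheap pass -> coarse guess
            h = np.tanh(h @ w)
        guess = h
        for _ in range(self.denoise_steps):   # diffusion half: iterative refinement
            for w in self.diff_layers:
                guess = guess + 0.1 * np.tanh(guess @ w)  # residual denoising update
        return guess

model = MADFormerSketch()
out = model.forward(np.ones(16))  # one coarse pass, then several refinement passes
```

The design trade-off the tweet hints at: the AR half is run once per patch (fast), while only the smaller diffusion half pays the cost of multiple refinement iterations.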
Joongwon Kim (@danieljwkim) 's Twitter Profile Photo

Can we improve Llama 3’s reasoning abilities through post-training only? Introducing ASTRO, our new framework that teaches LLMs to perform in-context search and generate long CoT to solve math problems, via SFT and RL. Work done at @aiatmeta. 📄 Paper: arxiv.org/abs/2507.00417

Valentina Pyatkin (@valentina__py) 's Twitter Profile Photo

💡Beyond math/code, instruction following with verifiable constraints is suitable to be learned with RLVR.
But the set of constraints and verifier functions is limited and most models overfit on IFEval.
We introduce IFBench to measure model generalization to unseen constraints.
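"Verifiable constraints" of the kind described above can be checked by deterministic functions that double as binary rewards. The two checkers below are hypothetical examples in that spirit — they are not IFBench's actual verifier functions, and the names and thresholds are made up.

```python
# Hypothetical examples of verifiable instruction constraints: deterministic
# checkers returning a binary reward, in the spirit of RLVR / IFEval-style
# evaluation (not the actual IFBench verifiers).
def verify_word_limit(response: str, max_words: int) -> float:
    """Reward 1.0 iff the response stays within the word budget."""
    return 1.0 if len(response.split()) <= max_words else 0.0

def verify_bullet_count(response: str, n_bullets: int) -> float:
    """Reward 1.0 iff the response has exactly n_bullets '- ' bullet lines."""
    bullets = [ln for ln in response.splitlines() if ln.lstrip().startswith("- ")]
    return 1.0 if len(bullets) == n_bullets else 0.0

resp = "Here are the steps:\n- collect data\n- train\n- evaluate"
print(verify_bullet_count(resp, 3), verify_word_limit(resp, 20))  # 1.0 1.0
```

Because such checkers are exact, a model can overfit to the finite set used in training or in IFEval — which is precisely the generalization gap a benchmark of unseen constraints is meant to measure.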
Victoria Graf (@victoriawgraf) 's Twitter Profile Photo

Worried about overfitting to IFEval? 🤔 Use ✨IFBench✨ our new, challenging instruction-following benchmark! Loved working w/ Valentina Pyatkin! Personal highlight: our multi-turn eval setting makes it possible to isolate constraint-following from the rest of the instruction 🔍

Zhiyuan Zeng (@zhiyuanzeng_) 's Twitter Profile Photo

EvalTree accepted to Conference on Language Modeling 2025 - my first PhD work and first COLM paper 🙌! What would you like to see next—extensions, applications, or other directions? Always open to ideas! 🧐

Allen School (@uwcse) 's Twitter Profile Photo

#UWAllen University of Washington & NVIDIA researchers earned a #MLSys2025 Best Paper Award for boosting #LLM performance with FlashInfer—and showed “what’s possible when academia, industry & the open-source community innovate together,” says Zihao Ye. #AI #UWdiscovers news.cs.washington.edu/2025/07/01/all…

Scott Geng (@scottgeng00) 's Twitter Profile Photo

🤔 How do we train AI models that surpass their teachers?

🚨 In #COLM2025: ✨Delta learning ✨makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯

The secret? Learn from the *differences* in weak data pairs!

📜 arxiv.org/abs/2507.06187

🧵 below
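One way to read "learn from the *differences*" is as a pairwise preference objective over weak data. The sketch below is a hypothetical illustration of that reading, not the paper's actual loss: the function name and scalar reward scores are assumptions.

```python
import numpy as np

# Hypothetical sketch of learning from differences in weak data pairs:
# given two weak responses where one is merely *better* than the other,
# train on the preference (the delta), not on either response's absolute
# quality. Shown as a logistic pairwise loss on scalar scores.
def delta_pair_loss(score_better: float, score_worse: float) -> float:
    """-log sigmoid(delta): pushes the better weak response above the
    worse one; absolute score levels cancel out."""
    delta = score_better - score_worse
    return float(-np.log(1.0 / (1.0 + np.exp(-delta))))

# Only the gap matters: a pair of weak responses with gap 0.5 yields the
# same loss as a pair of strong responses with the same gap.
print(delta_pair_loss(0.2, -0.3))
print(delta_pair_loss(5.2, 4.7))
```

This invariance to absolute quality is what makes training on weak pairs plausible: the supervision signal survives even when both responses are below the student's own level.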
Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Can data owners & LM developers collaborate to build a strong shared model while each retaining data control?
Introducing FlexOlmo💪, a mixture-of-experts LM enabling:
• Flexible training on your local data without sharing it
• Flexible inference to opt in/out your data
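A minimal sketch of the opt-in/opt-out inference idea (hypothetical, not FlexOlmo's actual routing): mask out an owner's expert at inference time and renormalize the router weights over the remaining experts, so the masked expert contributes nothing.

```python
import numpy as np

# Hypothetical mixture-of-experts layer where each expert was trained by a
# different data owner; an owner can opt out by masking their expert, and
# the router weights are renormalized over the remaining active experts.
def moe_forward(x, experts, router_logits, active):
    """x: (dim,) input; experts: list of (dim, dim) weight matrices;
    router_logits: (n_experts,) floats; active: boolean opt-in mask."""
    logits = np.where(active, router_logits, -np.inf)  # drop opted-out experts
    w = np.exp(logits - logits.max())
    w = w / w.sum()                                    # softmax over active set only
    return sum(w[i] * np.tanh(x @ experts[i])
               for i in range(len(experts)) if active[i])

rng = np.random.default_rng(0)
experts = [rng.normal(size=(4, 4)) for _ in range(3)]
x, logits = np.ones(4), np.array([0.5, 1.0, -0.2])
full = moe_forward(x, experts, logits, np.array([True, True, True]))
partial = moe_forward(x, experts, logits, np.array([True, False, True]))
```

Because an opted-out expert's weight is exactly zero, its parameters never touch the activation path — which is the property that lets owners retain control over their data's contribution at inference time.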

Akari Asai (@akariasai) 's Twitter Profile Photo

Some updates 🚨
I finished my Ph.D. at the Allen School in June 2025!
After a year at AI2 as a Research Scientist, I am joining the CMU Language Technologies Institute & Machine Learning Dept. (courtesy) as an Assistant Professor in Fall 2026.
The journey, acknowledgments & recruiting in 🧵
