Yu-Min Tseng (@ym_tseng) 's Twitter Profile
Yu-Min Tseng

@ym_tseng

Incoming Ph.D. Student @VT_CS. Master's Student @NTU_TW. Visiting Graduate Student @UVA.

ID: 1700055849711091712

Website: http://ymtseng.com | Joined: 08-09-2023 07:58:22

28 Tweets

298 Followers

362 Following

Zhepei Wei ✈️ ICLR 2025 (@weizhepei) 's Twitter Profile Photo

⚠️ New #ICML2025 paper!
Want fast and accurate LLM decoding? Check out AdaDecode! 🚀

⚙️ Adaptive token prediction at intermediate layers w/o full forward pass!
🎯 Identical output to standard decoding!
🧩 No draft model — just a lightweight LM head (0.2% model size)!

🧵[1/n]
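A minimal sketch of the early-exit idea described in this tweet, using toy numpy stand-ins rather than a real transformer; the layer sizes, the `intermediate_lm_head` name, and the confidence threshold are my assumptions, and the verification step that keeps outputs identical to standard decoding is only noted in a comment.

```python
# Toy sketch of confidence-gated early-exit decoding in the spirit of
# AdaDecode (my illustration, not the authors' code). A lightweight LM head
# attached at an intermediate layer predicts the token when it is confident;
# otherwise the remaining layers run and the standard head decides.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, LAYERS = 50, 16, 8

layer_weights = [rng.normal(size=(HIDDEN, HIDDEN)) for _ in range(LAYERS)]
full_lm_head = rng.normal(size=(HIDDEN, VOCAB))          # standard top-layer head
intermediate_lm_head = rng.normal(size=(HIDDEN, VOCAB))  # lightweight early-exit head (hypothetical)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def decode_one_token(h, exit_layer=4, threshold=0.9):
    """Run the layer stack; predict early at `exit_layer` if confident enough."""
    for i, w in enumerate(layer_weights):
        h = np.tanh(h @ w)
        if i + 1 == exit_layer:
            probs = softmax(h @ intermediate_lm_head)
            if probs.max() >= threshold:
                # In AdaDecode the skipped layers are later completed and the
                # early token is verified, which is what keeps the final output
                # identical to standard decoding.
                return int(probs.argmax()), "early"
    probs = softmax(h @ full_lm_head)
    return int(probs.argmax()), "full"

token, path = decode_one_token(rng.normal(size=HIDDEN))
print(token, path)
```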
Tu Vu (@tuvllms) 's Twitter Profile Photo

Excited to share that our paper on model merging at scale has been accepted to Transactions on Machine Learning Research (TMLR). Huge congrats to my intern Prateek Yadav and our awesome co-authors Jonathan Lai, Alexandra Chronopoulou, Manaal Faruqui, Mohit Bansal, and Tsendsuren 🎉!!

Thinh (@thinhphp_vt) 's Twitter Profile Photo

🔥 SEAL-0 Leaderboard 📈

Our results on SEAL-0 show a large room for improvement in LLMs' ability to reason over conflicting evidence. 🤯

👉Check out our paper: arxiv.org/abs/2506.01062
👉Dataset: huggingface.co/datasets/vtllm…
Tu Vu (@tuvllms) 's Twitter Profile Photo

Our independent evaluation on reasoning over conflicting evidence with SEAL-0 shows that Grok 4 is a strong model, though its performance gaps with other frontier models like Gemini-2.5-Pro and o3-pro are small.

Zhepei Wei ✈️ ICLR 2025 (@weizhepei) 's Twitter Profile Photo

Thrilled to present three works at #ICML2025! 🥳

🚀 AdaDecode — Wed 7/16, East Exhibition Hall A-B (#E-2605)
🔢 Negative Reinforcement for Reasoning — Fri 7/18, AI for Math Workshop
🤖 WebAgent-R1 — Sat 7/19, Workshop on Computer Use Agents

Feel free to stop by and chat about #LLMs!

Yung-Sung Chuang (@yungsungchuang) 's Twitter Profile Photo

Scaling CLIP on English-only data is outdated now…

🌍We built a CLIP data curation pipeline for 300+ languages
🇬🇧We train MetaCLIP 2 without compromising English-task performance (it actually improves!)
🥳It’s time to drop the language filter!

📝arxiv.org/abs/2507.22062

[1/5]

🧵
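For intuition about what a multilingual curation pipeline might involve, here is a tiny, heavily simplified balancing step (my own illustration, not MetaCLIP 2's actual pipeline): it caps how many image-text pairs any single (language, concept) pair can contribute, so frequent concepts in high-resource languages do not dominate. The `balance_pairs` name and the cap value are made up.

```python
# Illustrative, simplified curation step (not the MetaCLIP 2 pipeline):
# keep at most `max_per_concept` examples per (language, concept) pair.
from collections import defaultdict

def balance_pairs(pairs, max_per_concept=2):
    """pairs: list of (language, concept, example_id) tuples."""
    kept, counts = [], defaultdict(int)
    for lang, concept, ex in pairs:
        key = (lang, concept)
        if counts[key] < max_per_concept:
            counts[key] += 1
            kept.append((lang, concept, ex))
    return kept

pairs = [("en", "dog", 1), ("en", "dog", 2), ("en", "dog", 3),
         ("zh", "狗", 4), ("fr", "chien", 5)]
print(balance_pairs(pairs))  # the third English "dog" example is dropped
```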
Justin Chih-Yao Chen (@cyjustinchen) 's Twitter Profile Photo

Excited to share that MAgICoRe has been accepted to #EMNLP2025 main! 🎉

Our work identifies 3 key challenges in LLM refinement for reasoning:
1) Over-correction on easy problems
2) Failure to localize and fix its own errors
3) Too few refinement iterations for harder problems
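To make the three challenges concrete, here is a hypothetical difficulty-aware refinement loop (my sketch, not the MAgICoRe method): easy problems are answered directly to avoid over-correction, hard ones get localized feedback and a larger iteration budget. `solve`, `score`, and `critique` are made-up stand-ins for model calls.

```python
# Hypothetical difficulty-aware refinement loop illustrating the challenges
# above (not the MAgICoRe algorithm). Easy problems skip refinement entirely;
# hard ones get targeted error feedback and more iterations.
def refine(problem, solve, score, critique, easy_threshold=0.9, max_iters=5):
    answer = solve(problem)
    if score(problem, answer) >= easy_threshold:
        return answer                              # challenge 1: don't over-correct easy cases
    for _ in range(max_iters):                     # challenge 3: enough iterations for hard cases
        feedback = critique(problem, answer)       # challenge 2: localize the error explicitly
        answer = solve(problem, feedback=feedback)
        if score(problem, answer) >= easy_threshold:
            break
    return answer

# toy usage with trivial stand-ins
print(refine("2+2?",
             solve=lambda p, feedback=None: "4" if feedback else "5",
             score=lambda p, a: 1.0 if a == "4" else 0.0,
             critique=lambda p, a: "arithmetic slip in the final step"))
```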

Cheng Han Chiang (姜成翰) (@dcml0714) 's Twitter Profile Photo

🎉 Excited to share that our paper on audio-LLM-as-a-judge has been accepted to EMNLP 2025 Findings!
🔗 arxiv.org/abs/2506.05984…
🗝️ Highlights:
🧑‍⚖️ Agreement between humans and the audio-LLM judge can be as high as human-human agreement
👑 Gemini-2.5-pro outperforms GPT-4o-audio as a
Valentin Hofmann (@vjhofmann) 's Twitter Profile Photo

📢 New #COLM2025 paper 📢

Standard benchmarks give every LLM the same questions. This is like testing 5th graders and college seniors with *one* exam! 🥴

Meet Fluid Benchmarking, a capability-adaptive eval method delivering lower variance, higher validity, and reduced cost.

🧵
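For a rough sense of what capability-adaptive evaluation can look like, here is a small self-contained sketch (my illustration under a 1-parameter IRT assumption, not the paper's implementation): each step picks the unseen item whose difficulty is closest to the current ability estimate, then nudges the estimate toward the observed response. The item pool, the step size, and the simulated model are all made up.

```python
# Minimal sketch of capability-adaptive evaluation in the spirit of
# Fluid Benchmarking (my illustration, not the paper's method).
import math, random

random.seed(0)
items = [{"id": i, "difficulty": random.uniform(-3, 3)} for i in range(100)]

def p_correct(ability, difficulty):
    """1-parameter logistic (Rasch) response model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def simulate_model(difficulty, true_ability=1.0):
    """Stand-in for actually querying an LLM on the item."""
    return random.random() < p_correct(true_ability, difficulty)

ability, asked = 0.0, set()
for step in range(20):
    # Most informative item under the 1PL model: difficulty nearest ability.
    item = min((it for it in items if it["id"] not in asked),
               key=lambda it: abs(it["difficulty"] - ability))
    asked.add(item["id"])
    correct = simulate_model(item["difficulty"])
    # Crude online ability update (gradient step on the log-likelihood).
    ability += 0.5 * ((1.0 if correct else 0.0) - p_correct(ability, item["difficulty"]))

print(f"estimated ability after {len(asked)} items: {ability:.2f}")
```

Because each model only sees items near its own capability, far fewer questions are needed per model, which is where the lower variance and reduced cost come from.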
Zhepei Wei ✈️ ICLR 2025 (@weizhepei) 's Twitter Profile Photo

🤔Ever wondered why your post-training methods (SFT/RL) make LLMs reluctant to say “I don't know?”

🤩Introducing TruthRL — a truthfulness-driven RL method that significantly reduces hallucinations while achieving accuracy and proper abstention!

📃arxiv.org/abs/2509.25760
🧵[1/n]
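As a rough illustration of what a truthfulness-driven reward can look like (my sketch; not claimed to be the paper's exact formulation), a ternary reward already separates the three behaviors the tweet mentions: correct answers are rewarded, abstentions are neutral, and wrong answers are penalized. The abstention phrases and string matching here are simplifications.

```python
# Illustrative reward shaping in the spirit of TruthRL (my sketch, not the
# paper's exact reward): reward correct answers, keep abstentions neutral,
# and penalize wrong answers so the policy learns when to say "I don't know"
# instead of hallucinating.
ABSTAIN_PHRASES = ("i don't know", "i do not know", "cannot answer")

def truthfulness_reward(response: str, gold: str) -> float:
    r = response.strip().lower()
    if any(p in r for p in ABSTAIN_PHRASES):
        return 0.0          # abstention: no reward, but no penalty
    if gold.strip().lower() in r:
        return 1.0          # correct answer
    return -1.0             # wrong / hallucinated answer

print(truthfulness_reward("The capital is Paris.", "Paris"))   # 1.0
print(truthfulness_reward("I don't know.", "Paris"))           # 0.0
print(truthfulness_reward("The capital is Berlin.", "Paris"))  # -1.0
```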
Justin Chih-Yao Chen (@cyjustinchen) 's Twitter Profile Photo

🚨 NuRL: Nudging the Boundaries of LLM Reasoning

GRPO improves LLM reasoning, but often within the model's "comfort zone": hard samples (w/ 0% pass rate) remain unsolvable and contribute zero learning signals. In NuRL, we show that "nudging" the LLM with self-generated hints
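A minimal sketch of the "nudging" idea (my illustration, not the NuRL implementation): when an entire group of rollouts fails, i.e. the sample has a 0% pass rate and would contribute zero advantage in GRPO, a self-generated hint is prepended to the prompt and the group is regenerated. `generate`, `check`, and `make_hint` are hypothetical stand-ins for model and verifier calls.

```python
# Hypothetical sketch of hint-based nudging for hard samples (not the NuRL
# code). If every rollout for a problem fails, regenerate the group with a
# self-generated hint prepended to the prompt.
def make_hint(problem: str) -> str:
    # In NuRL-style training the model itself would produce this hint;
    # here it is a fixed placeholder.
    return "Hint: break the problem into smaller cases first."

def rollout_group(problem, generate, check, n=8):
    answers = [generate(problem) for _ in range(n)]
    if not any(check(a) for a in answers):           # 0% pass rate: zero learning signal
        nudged = f"{make_hint(problem)}\n{problem}"  # nudge with a hint and retry
        answers = [generate(nudged) for _ in range(n)]
    return answers

# toy usage with stand-in functions
answers = rollout_group("2+2?",
                        generate=lambda p: "4" if p.startswith("Hint") else "5",
                        check=lambda a: a == "4")
print(answers)
```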
Dan Hendrycks (@danhendrycks) 's Twitter Profile Photo

The term “AGI” is currently a vague, moving goalpost.

To ground the discussion, we propose a comprehensive, testable definition of AGI.
Using it, we can quantify progress:
GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%.

Here’s how we define and measure it: 🧵