Yu-Min Tseng (@ym_tseng) 's Twitter Profile
Yu-Min Tseng

@ym_tseng

Incoming Ph.D. Student @VT_CS. Master's Student @NTU_TW. Visiting Graduate Student @UVA.

ID: 1700055849711091712

Website: http://ymtseng.com | Joined: 08-09-2023 07:58:22

28 Tweets

298 Followers

362 Following

Zhepei Wei ✈️ ICLR 2025 (@weizhepei) 's Twitter Profile Photo

⚠️ New #ICML2025 paper!
Want fast and accurate LLM decoding? Check out AdaDecode! 🚀

⚙️ Adaptive token prediction at intermediate layers w/o full forward pass!
🎯 Identical output to standard decoding!
🧩 No draft model — just a lightweight LM head (0.2% model size)!

🧵[1/n]
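A minimal sketch of the early-exit idea described in this tweet, using toy numpy stand-ins rather than a real transformer; the layer sizes, the `intermediate_lm_head` name, and the confidence threshold are my assumptions, and the verification step that keeps outputs identical to standard decoding is only noted in a comment.

```python
# Toy sketch of confidence-gated early-exit decoding in the spirit of
# AdaDecode (my illustration, not the authors' code). A lightweight LM head
# attached at an intermediate layer predicts the token when it is confident;
# otherwise the remaining layers run and the standard head decides.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, LAYERS = 50, 16, 8

layer_weights = [rng.normal(size=(HIDDEN, HIDDEN)) for _ in range(LAYERS)]
full_lm_head = rng.normal(size=(HIDDEN, VOCAB))          # standard top-layer head
intermediate_lm_head = rng.normal(size=(HIDDEN, VOCAB))  # lightweight early-exit head (hypothetical)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def decode_one_token(h, exit_layer=4, threshold=0.9):
    """Run the layer stack; predict early at `exit_layer` if confident enough."""
    for i, w in enumerate(layer_weights):
        h = np.tanh(h @ w)
        if i + 1 == exit_layer:
            probs = softmax(h @ intermediate_lm_head)
            if probs.max() >= threshold:
                # In AdaDecode the skipped layers are later completed and the
                # early token is verified, which is what keeps the final output
                # identical to standard decoding.
                return int(probs.argmax()), "early"
    probs = softmax(h @ full_lm_head)
    return int(probs.argmax()), "full"

token, path = decode_one_token(rng.normal(size=HIDDEN))
print(token, path)
```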
Tu Vu (@tuvllms) 's Twitter Profile Photo

Excited to share that our paper on model merging at scale has been accepted to Transactions on Machine Learning Research (TMLR). Huge congrats to my intern Prateek Yadav and our awesome co-authors Jonathan Lai, Alexandra Chronopoulou, Manaal Faruqui, Mohit Bansal, and Tsendsuren 🎉!!

Thinh (@thinhphp_vt) 's Twitter Profile Photo

🔥 SEAL-0 Leaderboard 📈

Our results on SEAL-0 show a large room for improvement in LLMs' ability to reason over conflicting evidence. 🤯

👉Check out our paper: arxiv.org/abs/2506.01062
👉Dataset: huggingface.co/datasets/vtllm…
Tu Vu (@tuvllms) 's Twitter Profile Photo

Our independent evaluation on reasoning over conflicting evidence with SEAL-0 shows that Grok 4 is a strong model, though its performance gaps with other frontier models like Gemini-2.5-Pro and o3-pro are small.

Zhepei Wei ✈️ ICLR 2025 (@weizhepei) 's Twitter Profile Photo

Thrilled to present three works at #ICML2025! 🥳

🚀 AdaDecode — Wed 7/16, East Exhibition Hall A-B (#E-2605)
🔢 Negative Reinforcement for Reasoning — Fri 7/18, AI for Math Workshop
🤖 WebAgent-R1 — Sat 7/19, Workshop on Computer Use Agents

Feel free to stop by and chat about #LLMs!

Yung-Sung Chuang (@yungsungchuang) 's Twitter Profile Photo

Scaling CLIP on English-only data is outdated now…

🌍We built a CLIP data curation pipeline for 300+ languages
🇬🇧We train MetaCLIP 2 without compromising English-task performance (it actually improves!)
🥳It’s time to drop the language filter!

📝arxiv.org/abs/2507.22062

[1/5]

🧵
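For intuition about what a multilingual curation pipeline might involve, here is a tiny, heavily simplified balancing step (my own illustration, not MetaCLIP 2's actual pipeline): it caps how many image-text pairs any single (language, concept) pair can contribute, so frequent concepts in high-resource languages do not dominate. The `balance_pairs` name and the cap value are made up.

```python
# Illustrative, simplified curation step (not the MetaCLIP 2 pipeline):
# keep at most `max_per_concept` examples per (language, concept) pair.
from collections import defaultdict

def balance_pairs(pairs, max_per_concept=2):
    """pairs: list of (language, concept, example_id) tuples."""
    kept, counts = [], defaultdict(int)
    for lang, concept, ex in pairs:
        key = (lang, concept)
        if counts[key] < max_per_concept:
            counts[key] += 1
            kept.append((lang, concept, ex))
    return kept

pairs = [("en", "dog", 1), ("en", "dog", 2), ("en", "dog", 3),
         ("zh", "狗", 4), ("fr", "chien", 5)]
print(balance_pairs(pairs))  # the third English "dog" example is dropped
```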
Justin Chih-Yao Chen (@cyjustinchen) 's Twitter Profile Photo

Excited to share that MAgICoRe has been accepted to #EMNLP2025 main! 🎉

Our work identifies 3 key challenges in LLM refinement for reasoning:
1) Over-correction on easy problems
2) Failure to localize and fix its own errors
3) Too few refinement iterations for harder problems
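To make the three challenges concrete, here is a hypothetical difficulty-aware refinement loop (my sketch, not the MAgICoRe method): easy problems are answered directly to avoid over-correction, hard ones get localized feedback and a larger iteration budget. `solve`, `score`, and `critique` are made-up stand-ins for model calls.

```python
# Hypothetical difficulty-aware refinement loop illustrating the challenges
# above (not the MAgICoRe algorithm). Easy problems skip refinement entirely;
# hard ones get targeted error feedback and more iterations.
def refine(problem, solve, score, critique, easy_threshold=0.9, max_iters=5):
    answer = solve(problem)
    if score(problem, answer) >= easy_threshold:
        return answer                              # challenge 1: don't over-correct easy cases
    for _ in range(max_iters):                     # challenge 3: enough iterations for hard cases
        feedback = critique(problem, answer)       # challenge 2: localize the error explicitly
        answer = solve(problem, feedback=feedback)
        if score(problem, answer) >= easy_threshold:
            break
    return answer

# toy usage with trivial stand-ins
print(refine("2+2?",
             solve=lambda p, feedback=None: "4" if feedback else "5",
             score=lambda p, a: 1.0 if a == "4" else 0.0,
             critique=lambda p, a: "arithmetic slip in the final step"))
```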

Cheng Han Chiang (姜成翰) (@dcml0714) 's Twitter Profile Photo

🎉 Excited to share that our paper on audio-LLM-as-a-judge has been accepted to EMNLP 2025 Findings!
🔗 arxiv.org/abs/2506.05984…
🗝️ Highlights:
🧑‍⚖️ Agreement between humans and the audio-LLM judge can be as high as human-human agreement
👑 Gemini-2.5-pro outperforms GPT-4o-audio as a
Valentin Hofmann (@vjhofmann) 's Twitter Profile Photo

📢 New #COLM2025 paper 📢

Standard benchmarks give every LLM the same questions. This is like testing 5th graders and college seniors with *one* exam! 🥴

Meet Fluid Benchmarking, a capability-adaptive eval method delivering lower variance, higher validity, and reduced cost.

🧵
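For a rough sense of what capability-adaptive evaluation can look like, here is a small self-contained sketch (my illustration under a 1-parameter IRT assumption, not the paper's implementation): each step picks the unseen item whose difficulty is closest to the current ability estimate, then nudges the estimate toward the observed response. The item pool, the step size, and the simulated model are all made up.

```python
# Minimal sketch of capability-adaptive evaluation in the spirit of
# Fluid Benchmarking (my illustration, not the paper's method).
import math, random

random.seed(0)
items = [{"id": i, "difficulty": random.uniform(-3, 3)} for i in range(100)]

def p_correct(ability, difficulty):
    """1-parameter logistic (Rasch) response model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def simulate_model(difficulty, true_ability=1.0):
    """Stand-in for actually querying an LLM on the item."""
    return random.random() < p_correct(true_ability, difficulty)

ability, asked = 0.0, set()
for step in range(20):
    # Most informative item under the 1PL model: difficulty nearest ability.
    item = min((it for it in items if it["id"] not in asked),
               key=lambda it: abs(it["difficulty"] - ability))
    asked.add(item["id"])
    correct = simulate_model(item["difficulty"])
    # Crude online ability update (gradient step on the log-likelihood).
    ability += 0.5 * ((1.0 if correct else 0.0) - p_correct(ability, item["difficulty"]))

print(f"estimated ability after {len(asked)} items: {ability:.2f}")
```

Because each model only sees items near its own capability, far fewer questions are needed per model, which is where the lower variance and reduced cost come from.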
Zhepei Wei ✈️ ICLR 2025 (@weizhepei) 's Twitter Profile Photo

🤔Ever wondered why your post-training methods (SFT/RL) make LLMs reluctant to say “I don't know?”

🤩Introducing TruthRL — a truthfulness-driven RL method that significantly reduces hallucinations while achieving accuracy and proper abstention!

📃arxiv.org/abs/2509.25760
🧵[1/n]
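As a rough illustration of what a truthfulness-driven reward can look like (my sketch; not claimed to be the paper's exact formulation), a ternary reward already separates the three behaviors the tweet mentions: correct answers are rewarded, abstentions are neutral, and wrong answers are penalized. The abstention phrases and string matching here are simplifications.

```python
# Illustrative reward shaping in the spirit of TruthRL (my sketch, not the
# paper's exact reward): reward correct answers, keep abstentions neutral,
# and penalize wrong answers so the policy learns when to say "I don't know"
# instead of hallucinating.
ABSTAIN_PHRASES = ("i don't know", "i do not know", "cannot answer")

def truthfulness_reward(response: str, gold: str) -> float:
    r = response.strip().lower()
    if any(p in r for p in ABSTAIN_PHRASES):
        return 0.0          # abstention: no reward, but no penalty
    if gold.strip().lower() in r:
        return 1.0          # correct answer
    return -1.0             # wrong / hallucinated answer

print(truthfulness_reward("The capital is Paris.", "Paris"))   # 1.0
print(truthfulness_reward("I don't know.", "Paris"))           # 0.0
print(truthfulness_reward("The capital is Berlin.", "Paris"))  # -1.0
```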
Justin Chih-Yao Chen (@cyjustinchen) 's Twitter Profile Photo

🚨 NuRL: Nudging the Boundaries of LLM Reasoning

GRPO improves LLM reasoning, but often within the model's "comfort zone": hard samples (w/ 0% pass rate) remain unsolvable and contribute zero learning signals. In NuRL, we show that "nudging" the LLM with self-generated hints
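A minimal sketch of the "nudging" idea (my illustration, not the NuRL implementation): when an entire group of rollouts fails, i.e. the sample has a 0% pass rate and would contribute zero advantage in GRPO, a self-generated hint is prepended to the prompt and the group is regenerated. `generate`, `check`, and `make_hint` are hypothetical stand-ins for model and verifier calls.

```python
# Hypothetical sketch of hint-based nudging for hard samples (not the NuRL
# code). If every rollout for a problem fails, regenerate the group with a
# self-generated hint prepended to the prompt.
def make_hint(problem: str) -> str:
    # In NuRL-style training the model itself would produce this hint;
    # here it is a fixed placeholder.
    return "Hint: break the problem into smaller cases first."

def rollout_group(problem, generate, check, n=8):
    answers = [generate(problem) for _ in range(n)]
    if not any(check(a) for a in answers):           # 0% pass rate: zero learning signal
        nudged = f"{make_hint(problem)}\n{problem}"  # nudge with a hint and retry
        answers = [generate(nudged) for _ in range(n)]
    return answers

# toy usage with stand-in functions
answers = rollout_group("2+2?",
                        generate=lambda p: "4" if p.startswith("Hint") else "5",
                        check=lambda a: a == "4")
print(answers)
```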
Dan Hendrycks (@danhendrycks) 's Twitter Profile Photo

The term “AGI” is currently a vague, moving goalpost.

To ground the discussion, we propose a comprehensive, testable definition of AGI.
Using it, we can quantify progress:
GPT-4 (2023) was 27% of the way to AGI. GPT-5 (2025) is 58%.

Here’s how we define and measure it: 🧵