Changho Shin @ ICLR 2025 (@changho_shin_) 's Twitter Profile

@changho_shin_

Ph.D. student at @WisconsinCS @UWMadison

ID: 1499060204570394625

Website: http://ch-shin.github.io | Joined: 02-03-2022 16:33:17

227 Tweets

454 Followers

890 Following

Yoonho Lee (@yoonholeee) 's Twitter Profile Photo

The standard way to improve reasoning in LLMs is to train on long chains of thought.

But these traces are often brute-force and shallow.

Introducing RLAD, where models instead learn _reasoning abstractions_: concise textual strategies that guide structured exploration. 
1/N🧵
Fred Sala (@fredsala) 's Twitter Profile Photo

Super excited to present our new work on hybrid architecture models—getting the best of Transformers and SSMs like Mamba—at #COLM2025! Come chat with Nicholas Roberts at poster session 2 on Tuesday. Thread below! (1)
Albert Ge (@albert_ge_95) 's Twitter Profile Photo

🔭 Towards Extending Open dLLMs to 131k Tokens
dLLMs behave differently from AutoRegressive models—they lack attention sinks, making long-context extension tricky.
A few simple tweaks go a long way!!
✍️blog albertge.notion.site/longdllm
💻code github.com/lbertge/longdl…
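"Attention sinks" here refers to the well-documented tendency of autoregressive Transformers to park a large share of attention mass on the first token. A minimal sketch of how one might quantify that, with a function and toy matrices of my own (not from the linked blog or code):

```python
import numpy as np

def attention_sink_mass(attn: np.ndarray) -> float:
    """Fraction of attention mass that queries place on the first (BOS)
    token, a common proxy for an 'attention sink'. `attn` is a
    (num_queries, num_keys) row-stochastic attention matrix."""
    return float(attn[:, 0].mean())

rng = np.random.default_rng(0)
sinky = rng.random((8, 8))
sinky[:, 0] += 10.0                        # exaggerate mass on token 0
sinky /= sinky.sum(axis=1, keepdims=True)  # renormalize rows to sum to 1

uniform = np.full((8, 8), 1.0 / 8)         # no sink: flat attention

print(attention_sink_mass(sinky), attention_sink_mass(uniform))
```

On this toy example the "sinky" matrix concentrates most of its mass in column 0, while the uniform matrix scores exactly 1/8; the tweet's claim is that diffusion LLMs look more like the latter, which changes which long-context tricks apply.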
Fred Sala (@fredsala) 's Twitter Profile Photo

The coolest trend for AI is shifting from conversation to action—less talking and more doing. This is also a great opportunity for evals: we need benchmarks that measure utility, including in an economic sense. terminalbench is my favorite effort of this type!

Sungmin Cha (@_sungmin_cha) 's Twitter Profile Photo

How can we be sure a generative model (LLMs, Diffusion) has truly unlearned something? What if existing evaluation metrics are misleading us?

In our new paper, we introduce FADE, a new metric that assesses genuine unlearning by measuring distributional alignment, moving beyond
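The tweet is cut off before FADE's definition, so the following is only a generic illustration of "measuring distributional alignment", not FADE itself: compare a supposedly unlearned model's output distribution against a retrained-from-scratch reference using KL divergence (all distributions here are toy):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two categorical distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

reference = [0.70, 0.20, 0.10]   # model retrained without the forget set
unlearned = [0.65, 0.22, 0.13]   # candidate "unlearned" model
memorized = [0.05, 0.05, 0.90]   # model still spiking on forgotten data

print(kl_divergence(reference, unlearned) < kl_divergence(reference, memorized))  # True
```

The point of a distributional view is visible even in this toy: the "memorized" model may pass a simple accuracy-style check yet sit far from the retrained reference in distribution.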
Jiayu (Mila) Wang (@jiayuwang111) 's Twitter Profile Photo

Excited to share our work on deep research! In this work, we argue that four task design principles are essential for fair comparison in deep research: (1) user-centric, (2) dynamic, (3) unambiguous, and (4) multi-faceted & search-intensive, and LiveResearchBench is guided

Albert Ge (@albert_ge_95) 's Twitter Profile Photo

new state of the art UW School of Computer, Data & Information Sciences building fosters state of the art discussions 😃

excited to kickstart our new ml reading seminar! today we had Nicholas E. Corrado give a talk on his latest work on data mixing for llm alignment!

our reading seminar sites.google.com/view/madml
Aniket Rege (@wregss) 's Twitter Profile Photo

Lots of disagreement on the TL about the definition of AGI 🤔

Meanwhile on the way to #ICCV2025, they’re selling AGI at O’Hare!
Aniket Rege (@wregss) 's Twitter Profile Photo

Also presenting today:
1. MMFM4 in Ex Hall 2, #190, 9:30–10:30 AM
2. CEGIS in Ex Hall 2, somewhere between #132 and #145
Drop by to chat about T2I model bias and how to approximate human judgments of cultural faithfulness!

Lester Mackey (@lestermackey) 's Twitter Profile Photo

If you're a PhD student interested in interning with me or one of my amazing colleagues at Microsoft Research New England this summer, please apply here jobs.careers.microsoft.com/global/en/job/… (If you'd like to work with me, please include my name in your cover letter!)

Brenden Lake (@lakebrenden) 's Twitter Profile Photo

There are still open desks in our new Human & Machine Intelligence lab at Princeton. Express your interest in joining us: lake-lab.github.io/apply/

fly51fly (@fly51fly) 's Twitter Profile Photo

[LG] Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
R Wu, A Samanta, A Jain, S Fujimoto... [Meta AI] (2025)
arxiv.org/abs/2510.19178
Snorkel AI (@snorkelai) 's Twitter Profile Photo

New benchmark drop 🚀
SnorkelSpatial tests how well LLMs can think in space, following text-based moves and rotations in a 2D world.
Harit Vishwakarma (@harit_v) 's Twitter Profile Photo

Introducing SnorkelSpatial: A New Benchmark for Evaluating Spatial Reasoning in LLMs
Spatial reasoning is everywhere, from navigating city maps to understanding molecular interactions. But how well do LLMs handle tasks that require tracking objects moving through space?

Harit Vishwakarma (@harit_v) 's Twitter Profile Photo

We built SnorkelSpatial to answer this question. It's a procedurally generated benchmark that tests LLMs on spatial reasoning through a 2D grid world where particles and boards move and rotate through sequences of actions.
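A procedurally generated grid-world task of the kind described can be pictured with a minimal sketch; the action names, rotation convention, and grid bounds below are my own illustrative guesses, not the benchmark's actual spec:

```python
import random

# Illustrative sketch: track a particle on a 2D grid through a sequence
# of moves and 90-degree rotations about the origin, and record the
# ground-truth final position as the task's answer.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def step(pos, action):
    if action == "rotate90":  # counter-clockwise rotation about the origin
        x, y = pos
        return (-y, x)
    dx, dy = MOVES[action]
    return (pos[0] + dx, pos[1] + dy)

def final_position(start, actions):
    pos = start
    for a in actions:
        pos = step(pos, a)
    return pos

def make_task(rng, n_actions=5):
    """Procedurally generate (start, action sequence, ground-truth answer)."""
    actions = [rng.choice(list(MOVES) + ["rotate90"]) for _ in range(n_actions)]
    start = (rng.randrange(-3, 4), rng.randrange(-3, 4))
    return start, actions, final_position(start, actions)

start, actions, answer = make_task(random.Random(0))
print(start, actions, answer)
```

Because the simulator computes the ground truth, tasks can be generated at arbitrary scale and difficulty (longer action sequences, more objects) with no human labeling, which is the appeal of the procedural design.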
Snorkel AI (@snorkelai) 's Twitter Profile Photo

Evaluating how models reason about space and motion is key to building grounded, trustworthy AI. SnorkelSpatial offers a data-centric benchmark for measuring just that. Explore the research👇

Alex Ratner (@ajratner) 's Twitter Profile Photo

Static benchmarks as the gold standard of measurement will increasingly be a thing of the past. The future is dynamic benchmarks - regularly updated in response to evolving failure modes, error analyses, and objectives. Excited to see Snorkel AI Research leading the way here!

Snorkel AI (@snorkelai) 's Twitter Profile Photo

Excited for this release -- can't wait to see how agents handle the Snorkel-contributed tasks! We'll be at the event too -- see you there!

Jaden Park (@_jadenpark) 's Twitter Profile Photo

Me: memorize past exams 📚💯
Also me: fail on a slight tweak 🤦‍♂️🤦‍♂️

Turns out, we can use the same method to 𝗱𝗲𝘁𝗲𝗰𝘁 𝗰𝗼𝗻𝘁𝗮𝗺𝗶𝗻𝗮𝘁𝗲𝗱 𝗩𝗟𝗠𝘀! 🧵(1/10)

- Project Page: mm-semantic-perturbation.github.io
Jon Saad-Falcon (@jonsaadfalcon) 's Twitter Profile Photo

Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands?

The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW):
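The thread is truncated right at the definition, so the following is only a plausible reading of "intelligence per watt": some capability score divided by average power draw. The function name and all numbers are made up for illustration:

```python
# Hypothetical sketch of an "intelligence per watt" style metric:
# capability on some eval divided by average power draw during inference.
def intelligence_per_watt(capability_score: float, avg_power_watts: float) -> float:
    if avg_power_watts <= 0:
        raise ValueError("power must be positive")
    return capability_score / avg_power_watts

# A small local model vs a data-center model on the same eval (toy numbers):
local = intelligence_per_watt(capability_score=62.0, avg_power_watts=35.0)
cloud = intelligence_per_watt(capability_score=78.0, avg_power_watts=700.0)
print(local > cloud)  # True
```

Under this reading, a local model can win on efficiency even while losing on raw score, which is exactly the trade-off the tweet is gesturing at.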