Ayoung Lee (@o_cube01)'s Twitter Profile
Ayoung Lee

@o_cube01

CSE Ph.D. at UMich | Interested in Language Reasoning

ID: 1757296794810437632

Joined: 13-02-2024 06:53:13

17 Tweets

46 Followers

78 Following

Xinliang (Frederick) Zhang (@frederickxzhang)

Heard of the Alaska-Hawaii merger?🤔Wonder if LLMs know it’s pending government approval before it can happen? They stumble, but we’ve got a fix⚒️!
Dive into my #EMNLP2024 work 𝐍𝐚𝐫𝐫𝐚𝐭𝐢𝐯𝐞-𝐨𝐟-𝐓𝐡𝐨𝐮𝐠𝐡𝐭—a special prompting technique to unlock LLMs’ temporal reasoning
Yunxiang Zhang (@yunxiangzhang4)

🚨 New Benchmark Drop!
Can LLMs actually do ML research? Not toy problems, not Kaggle tweaks—but real, unsolved ML conference research competitions?
We built MLRC-BENCH to find out.
Paper: arxiv.org/abs/2504.09702
Leaderboard: huggingface.co/spaces/launch/…
Code: github.com/yunx-z/MLRC-Be…
Muhammad Khalifa (@mkhalifaaaa)

🚨Announcing SCALR @ COLM 2025 — Call for Papers!🚨

The 1st Workshop on Test-Time Scaling and Reasoning Models (SCALR) is coming to the Conference on Language Modeling (COLM) in Montreal this October!

This is the first workshop dedicated to this growing research area.

🌐 scalr-workshop.github.io
Yeda Song (@__runamu__)

🔥 GUI agents struggle with real-world mobile tasks.
We present MONDAY—a diverse, large-scale dataset built via an automatic pipeline that transforms internet videos into GUI agent data.
✅ VLMs trained on MONDAY show strong generalization
✅ Open data (313K steps) (1/7) 🧵
#CVPR
Jie Ruan (@jieruan75)

🔍LLMs now give medical diagnoses, legal advice, and even tackle scientific problems.
❓Your LLM sounds smart. But what if it’s just good at faking expertise?
🚀We built ExpertLongBench to find out.
📉And the results? They revealed several concerns.👇
🔗 huggingface.co/spaces/launch/…
Kai Zou (@zkjzou)

🔥 Excited to introduce ManyICLBench (ACL 2025)
🧐 Do many-shot ICL tasks evaluate LCLMs' ability to retrieve the most similar examples or learn from many examples? We carefully analyzed numerous tasks and categorized them.
📄 Paper: arxiv.org/abs/2411.07130
#ACL2025

Xinliang (Frederick) Zhang (@frederickxzhang)

How do LLMs really navigate the thinking space? Straight off to a final answer OR follow a wiggly path? Definitely commit OR get stuck to “infinite” self-doubting?
In our latest study, we unravel (over-)thinking through the lens of sub-thoughts: rb.gy/viud7z
more in 🧵
Ayoung Lee (@o_cube01)

I will be at NeurIPS from Dec 2nd to Dec 5th. I am interested in reasoning and alignment, and also looking for 2026 summer internships 👀 Feel free to DM me if you would like to chat or grab coffee ☕️! Excited to reconnect with old friends and make new ones 😆