Sungmin Cha (@_sungmin_cha)'s Twitter Profile
Sungmin Cha

@_sungmin_cha

Faculty Fellow @nyuniversity | PhD @SeoulNatlUni

ID: 1146448298795917313

Website: https://sites.google.com/view/sungmin-cha/ · Joined: 03-07-2019 15:59:09

174 Tweets

259 Followers

221 Following

Akshay 🚀 (@akshay_pachaar)'s Twitter Profile Photo

Google just dropped "Attention is all you need (V2)"

This paper could solve AI's biggest problem:

Catastrophic forgetting.

When AI models learn something new, they tend to forget what they previously learned. Humans don't work this way, and now Google Research has a solution.
Andrew Ng (@andrewyng)'s Twitter Profile Photo

Releasing a new "Agentic Reviewer" for research papers. I started coding this as a weekend project, and Yixing Jiang made it much better.

I was inspired by a student who had a paper rejected 6 times over 3 years. Their feedback loop -- waiting ~6 months for feedback each time -- was
Nathan Lambert (@natolambert)'s Twitter Profile Photo

Love to see more fully open post-training recipes (this one is multimodal reasoning). It's surprising how rare open post-training data is, because the opportunity for impact is huge. Lots of people will try it, and simple data methods can still improve on SOTA.

Cameron R. Wolfe, Ph.D. (@cwolferesearch)'s Twitter Profile Photo

The original PPO-based RLHF pipeline had 4 model copies:

1. Policy
2. Reference
3. Critic
4. Reward Model

Recent GRPO-based RLVR pipelines have eliminated all of these models except for the policy.

- The critic is no longer needed because values are estimated from group
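The group-based value estimate mentioned above can be sketched in a few lines. This is an illustrative sketch, not any paper's reference implementation; the function and variable names are my own. The idea: instead of a learned critic, each sampled response's advantage is its reward normalized against the group of responses drawn for the same prompt.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage estimation (illustrative sketch).

    `rewards` holds the scalar rewards for a group of responses
    sampled for the same prompt. Each response's advantage is its
    reward, centered and scaled by the group's statistics, so no
    separate critic network is needed to estimate a baseline.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]
```

With binary verifiable rewards (the RLVR setting), e.g. `[1.0, 0.0, 1.0, 0.0]`, correct responses get a positive advantage and incorrect ones a negative advantage, without any value model in the loop.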
Changho Shin @ ICLR 2025 (@changho_shin_)'s Twitter Profile Photo

A few updates:

* Had an amazing summer at MSR NE -- big thanks to David Alvarez Melis
* Defending my PhD this semester and, assuming all goes well (😂), will join Brenden Lake's group at Princeton as a postdoc next year!
* I’ll be at NeurIPS 12/1~12/7 -- happy to catch up or meet new folks!

Paul Vicol (@paulvicol)'s Twitter Profile Photo

🚀 Introducing TMLR Beyond PDF!

🎬 This is a new HTML-based submission format for TMLR that supports interactive figures and videos, along with the usual LaTeX and images.

🎉 Thanks to TMLR Editors-in-Chief Hugo Larochelle, Gautam Kamath, Naila Murray, Nihar B. Shah, and Laurent Charlin!

Kangwook Lee (@kangwook_lee)'s Twitter Profile Photo

LLM as a judge has become a dominant way to evaluate how good a model is at solving a task, since it works without a test set and handles cases where answers are not unique.

But despite how widely this is used, almost all reported results are highly biased.

Excited to share our
Sungmin Cha (@_sungmin_cha)'s Twitter Profile Photo

I will be at #NeurIPS2025 in San Diego next week!

Please stop by my poster if you are interested in the underlying mechanisms of #KnowledgeDistillation in generative models!

- Paper: Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation
- Wed 3
Robert Youssef (@rryssf_)'s Twitter Profile Photo

This paper shocked me 🤯

Everyone on X keeps bragging about “LLM-as-a-judge” like it’s some magical truth oracle.

But this paper shows something insane:

Most LLM evaluations you’ve seen are biased by design, not because models are bad, but because the judge itself quietly
Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

The paper shows that models that judge answers well usually also give good answers themselves.

Chatbots are tested by humans or a strong model choosing better answers for prompts, which costs money.

This work studies how closely 2 skills move together, writing good answers and
Lei Yang @ ICLR (@diyerxx)'s Twitter Profile Photo

Got burned by an Apple ICLR paper — it was withdrawn after my Public Comment.

So here’s what happened. Earlier this month, a colleague shared an Apple paper on arXiv with me — it was also under review for ICLR 2026.
The benchmark they proposed was perfectly aligned with a
alphaXiv (@askalphaxiv)'s Twitter Profile Photo

How to Properly do LLM-as-a-Judge

Raw LLM-as-a-Judge scores are inherently biased because LLMs often make mistakes.

This paper proposes a simple statistical method to correct the scores and calculate valid confidence intervals via a human-verified calibration set
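A correction of this kind can be illustrated with a standard prevalence-adjustment formula. This is a hedged sketch, not the paper's actual estimator (which, along with its confidence-interval construction, may differ); `sensitivity` and `specificity` are assumed to be measured on the small human-verified calibration set.

```python
def corrected_pass_rate(judge_rate, sensitivity, specificity):
    """Correct a raw LLM-judge pass rate for judge error (illustrative sketch).

    judge_rate:  fraction of answers the LLM judge labeled "pass"
    sensitivity: P(judge says pass | answer truly passes), from calibration data
    specificity: P(judge says fail | answer truly fails), from calibration data

    Uses the classic Rogan-Gladen prevalence correction:
    observed = p * sensitivity + (1 - p) * (1 - specificity), solved for p.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("judge must perform better than chance")
    return (judge_rate + specificity - 1.0) / denom
```

For example, a judge with 90% sensitivity and 80% specificity evaluating answers whose true pass rate is 70% reports a raw rate of 0.69; plugging that raw rate back in recovers 0.70.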
Harvey Hu (@harveyhucal)'s Twitter Profile Photo

AI agents can't learn from experience. Until now.

Earlier this year, one of our customers asked us: "If your web agent uses my website once, will it be easier the next time it visits?"

The question highlighted a fundamental gap between human intelligence and AI agents: Humans

Sungmin Cha (@_sungmin_cha)'s Twitter Profile Photo

I just arrived in San Diego and received my badge for NeurIPS 2025! I will be attending the conference on 12/1 - 12/7. If you'd like to discuss continual learning, machine unlearning, knowledge distillation, and related topics with me, please contact me! :)

Shubhra Mishra (@shubhramishra_)'s Twitter Profile Photo

Why can humans learn continually, but LLMs can’t?

In our oral presentation at the NeurIPS CCFM workshop, through the lens of a human curriculum, we explore how continual learning impacts LLMs.

We also release our 23.4B-token dataset, models, and data + training pipeline. 🧵

Will Bryk (@williambryk)'s Twitter Profile Photo

We embedded all 5000+ NeurIPS papers! exa.ai/neurips

Cool queries:
- "new retrieval techniques"
- "the paper that elon would love most"
- "intersection of coding agents and biology, poster session 5"

It uses our in-house model trained for precise semantic retrieval 😌