Minsu Kim (@minsuuukim)'s Twitter Profile
Minsu Kim

@minsuuukim

AI Researcher at Mila and KAIST

ID: 1897933692162772992

Link: https://minsuukim.github.io/
Joined: 07-03-2025 08:53:59

8 Tweets

26 Followers

96 Following

Siddarth Venkatraman (@siddarthv66)'s Twitter Profile Photo

Trajectory-level objectives (RLOO, GRPO) seem better than value-function-based methods (Q-learning, PPO) for LLM training. Quite unfortunate: trajectory-level RL can't do skill stitching, which makes it extremely sample-inefficient for learning compositional reasoning (such as multi-turn tool use).
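For context on the distinction being drawn: group-relative methods like GRPO score whole completions without a learned critic, normalizing each sampled trajectory's reward against the other samples for the same prompt. A minimal sketch of that group-normalized advantage (an illustration only; `group_advantages` is a hypothetical helper, and real implementations add KL penalties, clipping, and token-level credit assignment):

```python
# Sketch of a GRPO-style group-normalized advantage: one scalar
# advantage per trajectory, with no value function involved.
from statistics import mean, stdev

def group_advantages(rewards, eps=1e-8):
    """Normalize each trajectory's reward against the group
    of samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for one prompt, scored 0/1 by a verifier
advs = group_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the advantage is a single number per trajectory, every token in a completion receives the same credit, which is exactly why skill stitching across sub-trajectories is hard.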

Taiwei Shi (@taiwei_shi)'s Twitter Profile Photo

Reinforcement finetuning (RFT) boosts LLM reasoning — but at what cost? We found that standard RFT drastically increases hallucination rates in LLMs. We term this the 𝐇𝐚𝐥𝐥𝐮𝐜𝐢𝐧𝐚𝐭𝐢𝐨𝐧 𝐓𝐚𝐱 𝐨𝐟 𝐑𝐅𝐓 and propose a simple, effective strategy to mitigate it. 👇

Siddarth Venkatraman (@siddarthv66)'s Twitter Profile Photo

Is there a universal strategy to turn any generative model—GANs, VAEs, diffusion models, or flows—into a conditional sampler, or fine-tune it to optimize a reward function? Yes! Outsourced Diffusion Sampling (ODS), accepted to ICML, does exactly that!

LawZero - LoiZéro (@lawzero_)'s Twitter Profile Photo

Every frontier AI system should be grounded in a core commitment: to protect human joy and endeavour. Today, we launch LawZero - LoiZéro, a nonprofit dedicated to advancing safe-by-design AI. lawzero.org

Yoshua Bengio (@yoshua_bengio)'s Twitter Profile Photo

Today marks a big milestone for me. I'm launching LawZero - LoiZéro, a nonprofit focusing on a new safe-by-design approach to AI that could both accelerate scientific discovery and provide a safeguard against the dangers of agentic AI.

Alex Hernandez-Garcia (@alexhdezgcia)'s Twitter Profile Photo

📣 Call for a postdoc! We are looking for a postdoctoral researcher to work at the intersection of machine learning and materials science. Find all details here: assets-v2.circle.so/4htxv8ljrkzrxv…

fly51fly (@fly51fly)'s Twitter Profile Photo

[LG] Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning J Yoon, H Cho, Y Bengio, S Ahn [KAIST & Mila – Quebec AI Institute] (2025) arxiv.org/abs/2506.09498

Seohong Park (@seohong_park)'s Twitter Profile Photo

Q-learning is not yet scalable

seohong.me/blog/q-learnin…

I wrote a blog post about my thoughts on scalable RL algorithms.

To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
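For readers outside RL: the core of Q-learning is a bootstrapped one-step TD target, the component whose bias is commonly argued to compound over long horizons. A minimal tabular sketch (a toy illustration, not code from the post; `td_update` is a hypothetical helper):

```python
# Toy tabular Q-learning update. The max over next-state actions
# below is the bootstrapped target: the estimate is regressed
# toward another estimate, which is where bias can accumulate.
from collections import defaultdict

def td_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.99):
    """One-step TD update: move Q(s, a) toward r + gamma * max_b Q(s', b)."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)  # Q-values default to 0.0
q = td_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
```

Because the update is off-policy (it uses the max over actions rather than the action actually taken), it can in principle reuse any data, which is the property the post remains optimistic about.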
Kirill Neklyudov (@k_neklyudov)'s Twitter Profile Photo

(1/n) Sampling from the Boltzmann density better than Molecular Dynamics (MD)? It is possible with PITA 🫓 Progressive Inference Time Annealing! A spotlight at the GenBio Workshop @ ICML 2025!

PITA learns from "hot," easy-to-explore molecular states 🔥 and then cleverly "cools"

Sungjin Ahn (@sungjinahn_)'s Twitter Profile Photo

⚡️ New breakthrough in Monte Carlo Tree Diffusion (MCTD) for System 2 Planning — powered by the KAIST–Mila collaboration!

“Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning”
📄 arxiv.org/abs/2506.09498

The biggest bottleneck of MCTD was speed. We
Kirill Neklyudov (@k_neklyudov)'s Twitter Profile Photo

1/ Where do Probabilistic Models, Sampling, Deep Learning, and Natural Sciences meet? 🤔 The workshop we’re organizing at #NeurIPS2025!

📢 FPI@NeurIPS 2025: Frontiers in Probabilistic Inference – Learning meets Sampling

Learn more and submit → fpiworkshop.org

Seohong Park (@seohong_park)'s Twitter Profile Photo

Flow Q-learning (FQL) is a simple method to train/fine-tune an expressive flow policy with RL.

Come visit our poster at 4:30p-7p this Wed (evening session, 2nd day)!
yunhuijang (@yunhuijang_)'s Twitter Profile Photo

🎉 3 papers accepted to #EMNLP2025! Huge thanks to my co-authors Sungsoo Ahn, Jaehyung, Hyomin Kim, and Hyosoon.
1️⃣ CLEANMOL: SMILES parsing for pre-training (or for RL?) LLMs
2️⃣ MT-Mol: an agent for molecular optimization
3️⃣ CORE-PO: RL to prefer high-confidence reasoning paths
Sungjin Ahn (@sungjinahn_)'s Twitter Profile Photo

🚀 Introducing CrafterDojo!

Crafter has been a popular testbed for open-ended agent learning—but progress has been limited without foundation models like VPT, CLIP, and STEVE.

With CrafterDojo, we provide these models + toolkits so the community can easily prototype