Minsu Kim (@minsuuukim)'s Twitter Profile
Minsu Kim

@minsuuukim

AI Researcher at Mila and KAIST

ID: 1897933692162772992

Link: https://minsuukim.github.io/
Joined: 07-03-2025 08:53:59

8 Tweets

26 Followers

96 Following

Siddarth Venkatraman (@siddarthv66)'s Twitter Profile Photo

Trajectory-level objectives (RLOO, GRPO) seem better than value-function-based methods (Q-learning, PPO) for LLM training. Quite unfortunate: trajectory-level RL can't do skill stitching, which makes it extremely sample-inefficient for learning compositional reasoning (such as multi-turn tool use).
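For context on the distinction being drawn: group-relative methods like GRPO score whole completions without a learned critic, normalizing each sampled trajectory's reward against the other samples for the same prompt. A minimal sketch of that group-normalized advantage (an illustration only; `group_advantages` is a hypothetical helper, and real implementations add KL penalties, clipping, and token-level credit assignment):

```python
# Sketch of a GRPO-style group-normalized advantage: one scalar
# advantage per trajectory, with no value function involved.
from statistics import mean, stdev

def group_advantages(rewards, eps=1e-8):
    """Normalize each trajectory's reward against the group
    of samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for one prompt, scored 0/1 by a verifier
advs = group_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the advantage is a single number per trajectory, every token in a completion receives the same credit, which is exactly why skill stitching across sub-trajectories is hard.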

Taiwei Shi (@taiwei_shi)'s Twitter Profile Photo

Reinforcement finetuning (RFT) boosts LLM reasoning — but at what cost? We found that standard RFT drastically increases hallucination rates in LLMs. We term this the 𝐇𝐚𝐥𝐥𝐮𝐜𝐢𝐧𝐚𝐭𝐢𝐨𝐧 𝐓𝐚𝐱 𝐨𝐟 𝐑𝐅𝐓 and propose a simple, effective strategy to mitigate it. 👇

Siddarth Venkatraman (@siddarthv66)'s Twitter Profile Photo

Is there a universal strategy to turn any generative model—GANs, VAEs, diffusion models, or flows—into a conditional sampler, or fine-tune it to optimize a reward function? Yes! Outsourced Diffusion Sampling (ODS), accepted to ICML, does exactly that!

LawZero - LoiZéro (@lawzero_)'s Twitter Profile Photo

Every frontier AI system should be grounded in a core commitment: to protect human joy and endeavour. Today, we launch LawZero - LoiZéro, a nonprofit dedicated to advancing safe-by-design AI. lawzero.org

Yoshua Bengio (@yoshua_bengio)'s Twitter Profile Photo

Today marks a big milestone for me. I'm launching LawZero - LoiZéro, a nonprofit focusing on a new safe-by-design approach to AI that could both accelerate scientific discovery and provide a safeguard against the dangers of agentic AI.

Alex Hernandez-Garcia (@alexhdezgcia)'s Twitter Profile Photo

📣 Call for a postdoc! We are looking for a postdoctoral researcher to work at the intersection of machine learning and materials science. Find all details here: assets-v2.circle.so/4htxv8ljrkzrxv…

fly51fly (@fly51fly)'s Twitter Profile Photo

[LG] Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning J Yoon, H Cho, Y Bengio, S Ahn [KAIST & Mila – Quebec AI Institute] (2025) arxiv.org/abs/2506.09498

Seohong Park (@seohong_park)'s Twitter Profile Photo

Q-learning is not yet scalable

seohong.me/blog/q-learnin…

I wrote a blog post about my thoughts on scalable RL algorithms.

To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
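For readers outside RL: the core of Q-learning is a bootstrapped one-step TD target, the component whose bias is commonly argued to compound over long horizons. A minimal tabular sketch (a toy illustration, not code from the post; `td_update` is a hypothetical helper):

```python
# Toy tabular Q-learning update. The max over next-state actions
# below is the bootstrapped target: the estimate is regressed
# toward another estimate, which is where bias can accumulate.
from collections import defaultdict

def td_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.99):
    """One-step TD update: move Q(s, a) toward r + gamma * max_b Q(s', b)."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)  # Q-values default to 0.0
q = td_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
```

Because the update is off-policy (it uses the max over actions rather than the action actually taken), it can in principle reuse any data, which is the property the post remains optimistic about.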
Kirill Neklyudov (@k_neklyudov)'s Twitter Profile Photo

(1/n) Sampling from the Boltzmann density better than Molecular Dynamics (MD)? It is possible with PITA 🫓 Progressive Inference Time Annealing! A spotlight at the GenBio Workshop @ ICML 2025!

PITA learns from "hot," easy-to-explore molecular states 🔥 and then cleverly "cools"

Sungjin Ahn (@sungjinahn_)'s Twitter Profile Photo

⚡️ New breakthrough in Monte Carlo Tree Diffusion (MCTD) for System 2 Planning — powered by the KAIST–Mila collaboration!

“Fast Monte Carlo Tree Diffusion: 100x Speedup via Parallel Sparse Planning”
📄 arxiv.org/abs/2506.09498

The biggest bottleneck of MCTD was speed. We
Kirill Neklyudov (@k_neklyudov)'s Twitter Profile Photo

1/ Where do Probabilistic Models, Sampling, Deep Learning, and Natural Sciences meet? 🤔 The workshop we’re organizing at #NeurIPS2025!

📢 FPI@NeurIPS 2025: Frontiers in Probabilistic Inference – Learning meets Sampling

Learn more and submit → fpiworkshop.org

Seohong Park (@seohong_park)'s Twitter Profile Photo

Flow Q-learning (FQL) is a simple method to train/fine-tune an expressive flow policy with RL.

Come visit our poster at 4:30p-7p this Wed (evening session, 2nd day)!
yunhuijang (@yunhuijang_)'s Twitter Profile Photo

🎉 3 papers accepted to #EMNLP2025! Huge thanks to my co-authors Sungsoo Ahn, Jaehyung, Hyomin Kim, and Hyosoon.
1️⃣ CLEANMOL: SMILES parsing for pre-training (or for RL?) LLMs
2️⃣ MT-Mol: an agent for molecular optimization
3️⃣ CORE-PO: RL to prefer high-confidence reasoning paths
Sungjin Ahn (@sungjinahn_)'s Twitter Profile Photo

🚀 Introducing CrafterDojo!

Crafter has been a popular testbed for open-ended agent learning—but progress has been limited without foundation models like VPT, CLIP, and STEVE.

With CrafterDojo, we provide these models + toolkits so the community can easily prototype