Hao Sun - IRLxLLM (@holarissun)'s Twitter Profile
Hao Sun - IRLxLLM

@holarissun

4th year PhD Student at @Cambridge_Uni. IRL x LLMs. Superhuman Intelligence needs RL, and LLMs help us to learn from it. On job market 2025.

ID: 1583111063712763912

Link: https://holarissun.github.io/

Joined: 20-10-2022 15:01:28

142 Tweets

768 Followers

849 Following

Hao Sun - IRLxLLM (@holarissun):

Heading to 🇸🇬ICLR next week! Can’t wait to catch up with old friends and meet new ones — let’s chat about RL, reward models, alignment, reasoning, and agents! Also, fun fact🤓: Yunyi won’t be there physically, but his digital twin will be attending instead. Stay tuned!

Hao Sun - IRLxLLM (@holarissun):

The oral sessions and poster sessions are happening at the same time, so it actually feels like the oral speakers are just talking to each other🤣

Jean-François Ton (@jeanfrancois287):

Happy to share that our paper on "Active Reward Modeling" has been accepted to ICML 2025! #ICML2025 The part I like the most about the project is its simplicity! Huge thanks to my amazing co-authors Yunyi Shen/申云逸 🐺 and Hao Sun - RL. More to come! For a more detailed 🧵, see 👇

Jean-François Ton (@jeanfrancois287):

📢New Paper on Process Reward Modelling 📢 Ever wondered about the pathologies of existing PRMs and how they could be remedied? In our latest paper, we investigate this through the lens of Information theory! #icml2025 Here’s a 🧵on how it works 👇 arxiv.org/abs/2411.11984

Hao Sun - IRLxLLM (@holarissun):

"Knowledge belongs to humanity, and is the torch which illuminates the world." — Louis Pasteur Especially for those contributed by the community.
Hao Sun - IRLxLLM (@holarissun):

Now with Qwen’s RL-fine-tuning results, are we witnessing a quiet return of prompt optimization/engineering? Now we have a 2-player game: users become “lazy prompters”, but the system prompts (e.g. thinking patterns) need to be highly optimized. Next: Bi-level optimization?

Hao Sun - IRLxLLM (@holarissun):

🚀 RL is powering breakthroughs in LLM alignment, reasoning, and agentic apps. Are you ready to dive into the RL x LLM frontier? Join us at the ACL’25 tutorial "Inverse RL Meets LLM Alignment" this Sunday in Vienna🇦🇹 (Jul 27th, 9am). 📄 Preprint at huggingface.co/papers/2507.13…