Hao Sun - IRLxLLM (@holarissun) Twitter Tweets • TwiCopy

Hao Sun - IRLxLLM

@holarissun

4th year PhD Student at @Cambridge_Uni. IRL x LLMs. Superhuman Intelligence needs RL, and LLMs help us to learn from it. On job market 2025.

ID: 1583111063712763912

Link: https://holarissun.github.io/

Joined: 20-10-2022 15:01:28

142 Tweets

768 Followers

849 Following

Hao Sun - IRLxLLM

@holarissun

5 months ago

Heading to 🇸🇬ICLR next week! Can’t wait to catch up with old friends and meet new ones — let’s chat about RL, reward models, alignment, reasoning, and agents! Also, fun fact🤓: Yunyi won’t be there physically, but his digital twin will be attending instead. Stay tuned!

18 Likes

0 Replies

2 Retweets

Hao Sun - IRLxLLM

@holarissun

4 months ago

The oral sessions and poster sessions are happening at the same time, so it actually feels like the oral speakers are just talking to each other🤣

6 Likes

0 Replies

0 Retweets

Yunyi Shen/申云逸 🐺

@shenraphael

4 months ago

Glad to be there with Hao Sun - RL presenting our work openreview.net/forum?id=rfdbl…

44 Likes

1 Reply

7 Retweets

Hao Sun - IRLxLLM

@holarissun

4 months ago

ICLR wrapped! Eggie and Toastie said it was the BEST🥰

50 Likes

2 Replies

2 Retweets

Hao Sun - IRLxLLM

@holarissun

4 months ago

OpenReview Justice!

5 Likes

0 Replies

0 Retweets

Jean-François Ton

@jeanfrancois287

4 months ago

Happy to share that our paper on "Active Reward Modeling" has been accepted to ICML 2025! #ICML2025 The part I like the most about the project is its simplicity! Huge thanks to my amazing co-authors Yunyi Shen/申云逸 🐺 and Hao Sun - RL. More to come! For a more detailed 🧵, see 👇

12 Likes

0 Replies

3 Retweets

Jean-François Ton

@jeanfrancois287

4 months ago

📢New Paper on Process Reward Modelling 📢 Ever wondered about the pathologies of existing PRMs and how they could be remedied? In our latest paper, we investigate this through the lens of Information theory! #icml2025 Here’s a 🧵on how it works 👇 arxiv.org/abs/2411.11984

307 Likes

5 Replies

74 Retweets

Hao Sun - IRLxLLM

@holarissun

4 months ago

AI cannot feel time, then how can it really understand humans?

2 Likes

0 Replies

0 Retweets

Hao Sun - IRLxLLM

@holarissun

3 months ago

"Knowledge belongs to humanity, and is the torch which illuminates the world." — Louis Pasteur Especially for those contributed by the community.

7 Likes

0 Replies

0 Retweets

Hao Sun - IRLxLLM

@holarissun

3 months ago

Now with Qwen’s RL-fine-tuning results, are we witnessing a quiet return of prompt optimization/engineering? Now we have a 2-player game: users become “lazy prompters”, but the system prompts (e.g. thinking patterns) need to be highly optimized. Next: Bi-level optimization?

3 Likes

0 Replies

0 Retweets

Hao Sun - IRLxLLM

@holarissun

2 months ago

This is SCIENCE🚀!!!

8 Likes

2 Replies

0 Retweets

Hao Sun - IRLxLLM

@holarissun

a month ago

🚀 RL is powering breakthroughs in LLM alignment, reasoning, and agentic apps. Are you ready to dive into the RL x LLM frontier? Join us at the ACL 2025 tutorial "Inverse RL Meets LLM Alignment" this Sunday in Vienna🇦🇹 (Jul 27th, 9am). 📄 Preprint at huggingface.co/papers/2507.13…

67 Likes

0 Replies

12 Retweets