Yue Wu (@frankyuewu1) 's Twitter Profile
Yue Wu

@frankyuewu1

Post-training @xAI | Prev. Postdoc @Princeton, CS PhD @UCLA. BSc @PKU1898.

ID: 1187500436678266880

Website: http://yuewu.us · Joined: 24-10-2019 22:45:47

32 Tweets

569 Followers

433 Following

Quanquan Gu (@quanquangu) 's Twitter Profile Photo

We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀
 ⭐ code: github.com/uclaml/SPPO
🤗models: huggingface.co/collections/UC…
Yifan Zhang (@yifan_zhang_) 's Twitter Profile Photo

1/8 ⭐ General Preference Modeling with Preference Representations for Aligning Language Models ⭐
📄 arxiv.org/abs/2410.02197
On Hugging Face Daily Papers: huggingface.co/papers/2410.02…

We just dropped our latest research on General Preference Modeling (GPM)! 🚀

Quanquan Gu (@quanquangu) 's Twitter Profile Photo

1/n 🚀 Introducing General Preference representation Model (GPM) and General Preference Optimization (GPO) for RLHF! 🎯

Reward modeling plays a central role in RLHF. Most existing reward models are based on the classical Bradley-Terry (BT) reward model. However, the BT model has
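For context, a minimal sketch of the Bradley-Terry model the thread refers to (the function name is illustrative, not from the paper): each response gets a scalar reward, and the preference probability is the sigmoid of the reward gap.

```python
import math

def bt_preference_prob(r_chosen: float, r_rejected: float) -> float:
    # Bradley-Terry: P(chosen beats rejected) = sigmoid(r_chosen - r_rejected)
    return 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))

# Equal rewards give a 50/50 preference.
print(bt_preference_prob(1.0, 1.0))  # 0.5
```

Because each response is summarized by one scalar, a BT model always induces a transitive ranking; the inability to represent cyclic preferences is one commonly cited limitation, which appears to be the kind of gap GPM targets.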
Tianle Cai @ ICLR 2025🇸🇬 (@tianle_cai) 's Twitter Profile Photo

Excited to see Meta's new paper! Been pondering this exact idea for the past 6 months but never had the resources to verify it properly - and now it's confirmed! Here are some thoughts and ideas I've been exploring that weren't covered in the paper:
The key insight, from my
Leqi Liu (@leqi_liu) 's Twitter Profile Photo

Why are there synchronized ups and downs for chosen and rejected log-probs during DPO (and *POs: IPO, SimPO, CPO, R-DPO, DPOP, RRHF, SLiC-HF) training? Why do chosen logps decrease, and rejected logps sometimes increase?

Our answer: Gradient Entanglement! arxiv.org/abs/2410.13828
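A hand-derived sketch of where the coupling comes from in plain DPO (my derivation, not code from the paper): the loss is −log σ(β(Δc − Δr)), so the gradients with respect to the chosen and rejected log-ratios share a single scalar factor and differ only in sign.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_margin_grads(delta_chosen: float, delta_rejected: float, beta: float = 0.1):
    """Gradients of the DPO loss -log sigmoid(beta * (dc - dr))
    with respect to the chosen and rejected log-ratios dc, dr."""
    z = beta * (delta_chosen - delta_rejected)
    shared = sigmoid(-z)  # one scalar multiplies both gradients
    return -beta * shared, beta * shared

g_chosen, g_rejected = dpo_margin_grads(0.2, -0.1)
print(g_chosen + g_rejected)  # 0.0: equal magnitude, opposite sign
```

Roughly the intuition: the update pushing the chosen log-prob up is the exact mirror of the one pushing the rejected log-prob down, so once the two responses share tokens or parameters, their log-probs move together rather than independently.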
Jonathan @ICLR (@jonathanmlight) 's Twitter Profile Photo

Want to take your LLM inference scaling to the next level? Our ICLR paper explores how optimization methods such as ACO and multi-start initialization can help enhance LLM generations. Check it out!

📜 arXiv: arxiv.org/abs/2411.05010
🖇️ Website: codespace-optimization.github.io

🧵1/n
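As a toy illustration of the multi-start idea mentioned above (not the paper's actual method; the objective and proposal function here are made up): launch several independent random searches and keep the overall best candidate, which helps escape poor local optima.

```python
import random

def multi_start_search(objective, propose, n_starts=8, n_steps=50, seed=0):
    # Run n_starts independent random searches and keep the overall best.
    rng = random.Random(seed)
    best_x, best_score = None, float("-inf")
    for _ in range(n_starts):
        x = propose(rng)
        for _ in range(n_steps):
            cand = propose(rng)
            if objective(cand) > objective(x):
                x = cand
        if objective(x) > best_score:
            best_x, best_score = x, objective(x)
    return best_x, best_score

# Toy objective with its global peak at x = 3.
f = lambda x: -(x - 3.0) ** 2
x_best, score = multi_start_search(f, lambda rng: rng.uniform(-10.0, 10.0))
```

The same skeleton applies when candidates are LLM generations scored by a verifier instead of points on a line; ACO would replace the independent proposal with a pheromone-weighted one.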
Zhiqing Sun (@edwardsun0909) 's Twitter Profile Photo

We’re rolling out Deep Research to Plus users today! Deep Research was the biggest “Feel The AGI” moment I’ve ever had since ChatGPT. And I’m glad more people will experience their first AGI moment! The team also worked super hard to make more tools including image citations /

Zhiqing Sun (@edwardsun0909) 's Twitter Profile Photo

Excited to present with Isa Fulford tonight at the OpenAI Forum, introducing the research behind Deep Research!

Join us at 6pm PT to explore how this new agentic capability in ChatGPT works. Register here:
Kaixuan Huang (@kaixuanhuang1) 's Twitter Profile Photo

Just tested Llama4-Scout on our MATH-Perturb benchmark. There is a surprising 18% gap between Original and MATH-P-Simple, making it unique among the 20+ models that came out after 2024. 😂😂 

🔗Leaderboard available at math-perturb.github.io. 

x.com/KaixuanHuang1/…