Yue Wu (@frankyuewu1) 's Twitter Profile
Yue Wu

@frankyuewu1

Post-training @xAI | Prev. Postdoc @Princeton, CS PhD @UCLA. BSc @PKU1898.

ID: 1187500436678266880

Website: http://yuewu.us · Joined: 24-10-2019 22:45:47

32 Tweets

569 Followers

433 Following

Quanquan Gu (@quanquangu) 's Twitter Profile Photo

We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀
 ⭐ code: github.com/uclaml/SPPO
🤗models: huggingface.co/collections/UC…
Yifan Zhang (@yifan_zhang_) 's Twitter Profile Photo

1/8 ⭐ General Preference Modeling with Preference Representations for Aligning Language Models ⭐
📄 arxiv.org/abs/2410.02197
On Hugging Face Daily Papers: huggingface.co/papers/2410.02…

We just dropped our latest research on General Preference Modeling (GPM)! 🚀

Quanquan Gu (@quanquangu) 's Twitter Profile Photo

1/n 🚀 Introducing General Preference representation Model (GPM) and General Preference Optimization (GPO) for RLHF! 🎯

Reward modeling plays a central role in RLHF. Most existing reward models are based on the classical Bradley-Terry (BT) reward model. However, the BT model has
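For context, a minimal sketch of the Bradley-Terry model the thread refers to (the function name is illustrative, not from the paper): each response gets a scalar reward, and the preference probability is the sigmoid of the reward gap.

```python
import math

def bt_preference_prob(r_chosen: float, r_rejected: float) -> float:
    # Bradley-Terry: P(chosen beats rejected) = sigmoid(r_chosen - r_rejected)
    return 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))

# Equal rewards give a 50/50 preference.
print(bt_preference_prob(1.0, 1.0))  # 0.5
```

Because each response is summarized by one scalar, a BT model always induces a transitive ranking; the inability to represent cyclic preferences is one commonly cited limitation, which appears to be the kind of gap GPM targets.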
Tianle Cai @ ICLR 2025🇸🇬 (@tianle_cai) 's Twitter Profile Photo

Excited to see Meta's new paper! Been pondering this exact idea for the past 6 months but never had the resources to verify it properly - and now it's confirmed! Here are some thoughts and ideas I've been exploring that weren't covered in the paper:
The key insight, from my
Leqi Liu (@leqi_liu) 's Twitter Profile Photo

Why are there synchronized ups and downs for chosen and rejected log-probs during DPO (and *POs: IPO, SimPO, CPO, R-DPO, DPOP, RRHF, SLiC-HF) training? Why do chosen logps decrease, and rejected logps sometimes increase?

Our answer: Gradient Entanglement! arxiv.org/abs/2410.13828
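A hand-derived sketch of where the coupling comes from in plain DPO (my derivation, not code from the paper): the loss is −log σ(β(Δc − Δr)), so the gradients with respect to the chosen and rejected log-ratios share a single scalar factor and differ only in sign.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_margin_grads(delta_chosen: float, delta_rejected: float, beta: float = 0.1):
    """Gradients of the DPO loss -log sigmoid(beta * (dc - dr))
    with respect to the chosen and rejected log-ratios dc, dr."""
    z = beta * (delta_chosen - delta_rejected)
    shared = sigmoid(-z)  # one scalar multiplies both gradients
    return -beta * shared, beta * shared

g_chosen, g_rejected = dpo_margin_grads(0.2, -0.1)
print(g_chosen + g_rejected)  # 0.0: equal magnitude, opposite sign
```

Roughly the intuition: the update pushing the chosen log-prob up is the exact mirror of the one pushing the rejected log-prob down, so once the two responses share tokens or parameters, their log-probs move together rather than independently.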
Jonathan @ICLR (@jonathanmlight) 's Twitter Profile Photo

Want to take your LLM inference scaling to the next level? Our ICLR paper explores how optimization methods such as ACO and multi-start initialization can help enhance LLM generations. Check it out!

📜 arXiv: arxiv.org/abs/2411.05010
🖇️ Website: codespace-optimization.github.io

🧵1/n
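As a toy illustration of the multi-start idea mentioned above (not the paper's actual method; the objective and proposal function here are made up): launch several independent random searches and keep the overall best candidate, which helps escape poor local optima.

```python
import random

def multi_start_search(objective, propose, n_starts=8, n_steps=50, seed=0):
    # Run n_starts independent random searches and keep the overall best.
    rng = random.Random(seed)
    best_x, best_score = None, float("-inf")
    for _ in range(n_starts):
        x = propose(rng)
        for _ in range(n_steps):
            cand = propose(rng)
            if objective(cand) > objective(x):
                x = cand
        if objective(x) > best_score:
            best_x, best_score = x, objective(x)
    return best_x, best_score

# Toy objective with its global peak at x = 3.
f = lambda x: -(x - 3.0) ** 2
x_best, score = multi_start_search(f, lambda rng: rng.uniform(-10.0, 10.0))
```

The same skeleton applies when candidates are LLM generations scored by a verifier instead of points on a line; ACO would replace the independent proposal with a pheromone-weighted one.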
Zhiqing Sun (@edwardsun0909) 's Twitter Profile Photo

We’re rolling out Deep Research to Plus users today! Deep Research was the biggest “Feel The AGI” moment I’ve ever had since ChatGPT. And I’m glad more people will experience their first AGI moment! The team also worked super hard to make more tools including image citations /

Zhiqing Sun (@edwardsun0909) 's Twitter Profile Photo

Excited to present with Isa Fulford tonight at the OpenAI Forum, introducing the research behind Deep Research!

Join us at 6pm PT to explore how this new agentic capability in ChatGPT works. Register here:
Kaixuan Huang (@kaixuanhuang1) 's Twitter Profile Photo

Just tested Llama4-Scout on our MATH-Perturb benchmark. There is a surprising 18% gap between Original and MATH-P-Simple, making it unique among the 20+ models that came out after 2024. 😂😂 

🔗Leaderboard available at math-perturb.github.io. 

x.com/KaixuanHuang1/…