Mohit (@mohit_r9a)'s Twitter Profile
Mohit

@mohit_r9a

brb looking at data

ID: 1459294994

Joined: 26-05-2013 10:17:04

16 Tweets

68 Followers

228 Following

Anisha Gunjal (@anisha_gunjal)

🤔 How do we train LLMs on real-world tasks where it’s hard to define a single verifiable answer? Our work at Scale AI introduces Rubrics as Rewards (RaR) — a framework for on-policy post-training that uses structured, checklist-style rubrics as interpretable reward signals. 🧵

Mohit (@mohit_r9a)

Unfortunately, I had to miss out on attending in person in Vienna, but I'm glad to see the recognition. We need more research on understanding data and post-training of LLMs. Always a pleasure working with Alan and Junmo Kang.