Mohit
@mohit_r9a
brb looking at data
ID: 1459294994
26-05-2013 10:17:04
16 Tweet
68 Followers
228 Following
🤔 How do we train LLMs on real-world tasks where it’s hard to define a single verifiable answer? Our work at Scale AI introduces Rubrics as Rewards (RaR) — a framework for on-policy post-training that uses structured, checklist-style rubrics as interpretable reward signals. 🧵
Our team: MohammadHossein Rezaei (MohammadHossein Rezaei), Robert Vacareanu (Robert Vacareanu), Zihao Wang (Zihao Wang), Clinton Wang (Clinton Wang), Yunzhong He (Yunzhong), Feyza Akyürek (Afra Feyza Akyürek) Paper: arxiv.org/pdf/2510.07284