Mohit (@mohit_r9a) Twitter Tweets • TwiCopy

Mohit

@mohit_r9a

brb looking at data

ID: 1459294994

26-05-2013 10:17:04

16 Tweets

68 Followers

228 Following

Anisha Gunjal

@anisha_gunjal

5 months ago

🤔 How do we train LLMs on real-world tasks where it’s hard to define a single verifiable answer? Our work at Scale AI introduces Rubrics as Rewards (RaR) — a framework for on-policy post-training that uses structured, checklist-style rubrics as interpretable reward signals. 🧵

199 Likes

5 Replies

34 Retweets

Mohit

@mohit_r9a

4 months ago

Unfortunately, I had to miss attending in person in Vienna, but I'm glad to see the recognition. We need more research on understanding data and post-training of LLMs. Always a pleasure working with Alan and Junmo Kang.

6 Likes

0 Replies

1 Retweet

Bing Liu

@vbingliu

2 months ago

Our team: MohammadHossein Rezaei, Robert Vacareanu, Zihao Wang, Clinton Wang, Yunzhong He, Afra Feyza Akyürek. Paper: arxiv.org/pdf/2510.07284

4 Likes

0 Replies

1 Retweet