Victoria Graf (@victoriawgraf) Twitter Tweets • TwiCopy

Victoria Graf

a year ago

Had a wonderful time at #NAACL2024 this week! Thanks to everyone who came to my oral presentation on defending LLMs against backdoor attacks!

9 likes · 0 replies · 0 reposts

Meet Tülu 3 -- a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms. We invented new methods for fine-tuning language models with RL and built upon best practices in the community to scale synthetic instruction and preference data.

532 likes · 12 replies · 130 reposts

Victoria Graf

@victoriawgraf

a year ago

Super excited to release Tülu 3, a family of fully-open state-of-the-art post-trained models, including its data, eval, code, and training recipes in a comprehensive guide for post-training techniques! allenai.org/papers/tulu-3-…

8 likes · 0 replies · 1 repost

Ai2

@allen_ai

5 months ago

Introducing IFBench, a benchmark to measure how well AI models follow new, challenging, and diverse verifiable instructions. Top models like Gemini 2.5 Pro or Claude 4 Sonnet are only able to score up to 50%, presenting an open frontier for post-training. 🧵

314 likes · 3 replies · 49 reposts

Nathan Lambert

@natolambert

5 months ago

This new benchmark created by Valentina Pyatkin should be the new default replacing IFEval. Some of the best frontier models get <50% and it comes with separate training prompts so people don’t effectively train on test. Wild gap from o3 to Gemini 2.5 pro of like 30 points.

198 likes · 10 replies · 22 reposts

Victoria Graf

@victoriawgraf

5 months ago

Worried about overfitting to IFEval? 🤔 Use ✨IFBench✨ our new, challenging instruction-following benchmark! Loved working w/ Valentina Pyatkin! Personal highlight: our multi-turn eval setting makes it possible to isolate constraint-following from the rest of the instruction 🔍

48 likes · 2 replies · 13 reposts

Scott Geng

@scottgeng00

5 months ago

🤔 How do we train AI models that surpass their teachers? 🚨 In #COLM2025: ✨Delta learning ✨makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯 The secret? Learn from the *differences* in weak data pairs! 📜 arxiv.org/abs/2507.06187 🧵 below

159 likes · 7 replies · 46 reposts

Victoria Graf

@victoriawgraf

5 months ago

A game-changer for post-training!

6 likes · 0 replies · 0 reposts