Ai2 (@allen_ai)'s Twitter Profile
Ai2

@allen_ai

Breakthrough AI to solve the world's biggest problems.

› Join us: allenai.org/careers
› Newsletter: tinyurl.com/3vc2r2m8

ID: 3442793834

Link: http://allenai.org
Joined: 04-09-2015 01:21:25

2.2K Tweets

69.69K Followers

394 Following

Introducing IFBench, a benchmark to measure how well AI models follow new, challenging, and diverse verifiable instructions. Top models like Gemini 2.5 Pro or Claude 4 Sonnet are only able to score up to 50%, presenting an open frontier for post-training. 🧵