Aref Wardak (@arefwardak)'s Twitter Profile
Aref Wardak

@arefwardak

When I'm not battling my 3 boys at bedtime, I lead in-house Legal and People teams (WSGR, Mulesoft, Applied Intuition). Tweets are my own, ill-advised, opinions.

ID: 10033122

07-11-2007 14:30:38

21 Tweets

160 Followers

1.1K Following

Brendan (can/do) (@brendanfoody)

GPT 5.4 is the best model we’ve ever tested on APEX-Agents. It’s also the first model to pass 50% mean score.

A year ago, frontier models couldn’t even edit an Excel sheet and scored less than 5%. Now, in less than 3 months GPT 5.4 has improved by 15.7%.

ChatGPT will imminently