(((ل()(ل() 'yoav))))👾 (@yoavgo)'s Twitter Profile
(((ل()(ل() 'yoav))))👾

@yoavgo

ID: 39547749

Link: http://www.cs.biu.ac.il/~yogo/
Joined: 12-05-2009 17:26:34

133.133K Tweets

59.59K Followers

2.2K Following

(a) how did MMLU become the de facto standard benchmark every LLM is trying to beat? (b) it is estimated that 9% of its questions are ones human experts think are wrong. do we know if humans and models agree on which ones belong in this 9%?