@_saurabh: More than 50% of the reported reasoning abilities of LLMs might not be true reasoning. How do we evaluate models trained on the entire internet? I.e., what novel questions can we ask of something that has seen all written knowledge? Below: new eval, results, code, and paper.

Saurabh Srivastava

@_saurabh

Building the next stage of AI @ Essential AI
Previously: 2x YC (W15, S18); PhD + Postdoc in Program Synthesis

ID: 17696643

Joined: 28-11-2008 02:32:24

171 Tweets

929 Followers

721 Following

Saurabh Srivastava

@_saurabh

2 years ago

More than 50% of the reported reasoning abilities of LLMs might not be true reasoning. How do we evaluate models trained on the entire internet? I.e., what novel questions can we ask of something that has seen all written knowledge? Below: new eval, results, code, and paper.

1.1K likes

45 replies

228 retweets