david rein (@idavidrein)'s Twitter Profile
david rein

@idavidrein

Sentio ergo sum. AI alignment research, early employee @cohere

ID: 1019763768631447552

Joined: 19-07-2018 02:00:03

4.4K Tweets

2.2K Followers

1.1K Following

Is GPQA garbage? A couple weeks ago, typedfemale pointed out some mistakes in a GPQA question, so I figured this would be a good opportunity to discuss how we interpret benchmark scores, and what our goals should be when creating benchmarks.
