david rein
@idavidrein
Sentio ergo sum. AI alignment research, early employee @cohere
ID: 1019763768631447552
19-07-2018 02:00:03
4.4K Tweets
2.2K Followers
1.1K Following
Is GPQA garbage? A couple weeks ago, typedfemale pointed out some mistakes in a GPQA question, so I figured this would be a good opportunity to discuss how we interpret benchmark scores, and what our goals should be when creating benchmarks.