Miles Turpin
@milesaturpin
LLM safety research, SEAL team @scale_AI. Previously alignment research @nyuniversity, early employee @cohere
http://milesturp.in/about 19-05-2017 16:44:09
Is GPQA garbage? A couple weeks ago, typedfemale pointed out some mistakes in a GPQA question, so I figured this would be a good opportunity to discuss how we interpret benchmark scores, and what our goals should be when creating benchmarks.