
Brian Huang ✈️ ICLR
@brianryhuang
@windsurf_ai
prev research at MIT madrylab & @haizelabs
ID: 1584559309828014080
http://briteroses.github.io
24-10-2022 14:56:06
2.2K Tweets
2.2K Followers
1.1K Following

Kaiwan Turel, awzf, and I were researching long-horizon reasoning (with Jacob Andreas). We found that the hard problems in existing benchmarks often featured tricky puzzles rather than tests of system understanding. So we made Breakpoint: a SWE benchmark designed to disambiguate this capability.