Zhanhui Zhou (@asapzzhou)'s Twitter Profile
Zhanhui Zhou

@asapzzhou

CS PhD @berkeley_ai

ID: 1961300478870130691

Link: https://github.com/ZHZisZZ
Joined: 29-08-2025 05:30:53

32 Tweets

554 Followers

46 Following

Hao Wang (@mogiciantony)

Benchmarks are often easier to game than they look.
We build BenchJack to audit benchmarks for hidden shortcuts and reward hacks — before they evaluate your agent.

Now in preview. Fully open source, with support for auditing your own benchmarks too.

github.com/benchjack/benc…

Zhe Ye (@0xlf_)

1/ 🧵Introducing: VeriSpecGen 🚀

Formal verification is a principled way to guarantee code correctness, but writing high-quality specifications remains expensive and expertise-intensive. What if LLMs could reliably synthesize intent-aligned formal specs directly from natural