Kevin Wei (he/they) (@kevinlwei)'s Twitter Profile
Kevin Wei (he/they)

@kevinlwei

Science of AI evaluations + U.S. AI policy @RANDCorporation | @Harvard_Law '26, @SchwarzmanOrg '23, @GTOMSCS '22 | Views mine only 🏳️‍🌈 🎉

ID: 1561550569

Link: http://kevinlwei.com | Joined: 01-07-2013 21:36:36

41 Tweets

958 Followers

1.1K Following

Institute for Law & AI (@law_ai_)

📢 Last Call for Applications! Apply by May 31 to join one of our three in-person events this summer: 📆 Summer Institute on Law and AI: July 11-15, Washington, DC-Area 📆 Workshop on Law-Following AI: August 6-8, Cambridge University, UK 📆 Cambridge Forum on Law and AI: August

Daniel Kang (@daniel_d_kang)

As AI agents near real-world use, how do we know what they can actually do? Reliable benchmarks are critical but agentic benchmarks are broken! Example: WebArena marks "45+8 minutes" on a duration calculation task as correct (real answer: "63 minutes"). Other benchmarks

EvalEval Coalition (@evaluatingevals)

🚨 AI Evals Crisis: Officially kicking off the Eval Science Workstream 🚨 We’re building a shared scientific foundation for evaluating AI systems, one that’s rigorous, open, and grounded in real-world & cross-disciplinary best practices👇 (1/2) evalevalai.com/research/2025/…

Michael Aird (@michael__aird)

🚀Come join my team at RAND! We’re looking for research leads, researchers, & project managers for our compute, US AI policy, Europe, & talent management teams. All teams have urgent, important work to do & broad options for the future. Some roles close July 27⏰

Janet Egan (@janet_e_egan)

Selling H20 (and potentially Blackwell?) chips to China gives up valuable leverage. Lennart Heim and I argue there's a smarter approach: let China access these chips remotely via the cloud. 1/

Kevin Wei (he/they) (@kevinlwei)

We wrote a paper last year about all the ways industry orgs could influence policy. tl;dr: unsurprisingly, there are lots of places you could spend money to influence policy, and industry is massively outspending civil society orgs on AI arxiv.org/abs/2410.13042

Cozmin Ududec (@cududec)

Very excited that this systematic analysis is out! We found a bunch of failure modes, as well as interesting and surprising behaviours. There's a lot more insight we can get from looking carefully at how models are solving evaluation tasks!