Sam Bowman (@sleepinyourhat) 's Twitter Profile
Sam Bowman

@sleepinyourhat

AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.

ID: 338526004

https://sleepinyourhat.github.io/ 19-07-2011 18:19:52

2.2K Tweets

36.36K Followers

3.3K Following

METR (@metr_evals) 's Twitter Profile Photo

How well can LLM agents complete diverse tasks compared to skilled humans? Our preliminary results indicate that our baseline agents based on several public models (Claude 3.5 Sonnet and GPT-4o) complete a proportion of tasks similar to what humans can do in ~30 minutes. 🧡

Kelsey Piper (@kelseytuoc) 's Twitter Profile Photo

"I’m not sold that superhuman systems will do the right thing without better supervision than we can currently provide....There’s a low chance the current paradigm gets all the way there. The chance is still higher than I’m comfortable with." The most reasonable take imo.

Zhijing Jin (@zhijingjin) 's Twitter Profile Photo

Happy to announce that I'm joining as an Asst. Prof. in CS at UToronto (U of T Department of Computer Science + Vector Institute) in Fall '25, working on #NLProc, Causality, and AI Safety! I want to sincerely thank my dear mentors, friends, collabs & many who mean a lot to me. Welcome #PhDs/Research MSc to apply!

Ethan Perez (@ethanjperez) 's Twitter Profile Photo

My team built a system we think might be pretty jailbreak resistant, enough to offer up to $15k for a novel jailbreak. Come prove us wrong!

NYU Data Science (@nyudatascience) 's Twitter Profile Photo

CDS welcomes Eunsol Choi as an Assistant Professor of Computer Science (NYU Courant) and Data Science! Her research focuses on advancing how computers interpret human language in real-world contexts. nyudatascience.medium.com/meet-the-facul…

Saffron Huang (@saffronhuang) 's Twitter Profile Photo

Life update! I'm joining Anthropic's Societal Impacts team as a research scientist in September. I'll be shifting to a part-time role at Collective Intelligence Project, with the amazing Zarinah Agnew taking over as research director.

andy jones (@andy_l_jones) 's Twitter Profile Photo

Despite working on LLMs for going on four years now, Zed & Sonnet 3.5 is the first time I've found myself using a model all day every day for my work. There's some rubicon it crosses of 'smart enough model' and 'good enough UX' that everything I tried previously fell short on.

Christopher Potts (@chrisgpotts) 's Twitter Profile Photo

A short story of fast progress: NVIDIA released an ≈8B parameter model they called Megatron in 2019, and five years later they have released an ≈8B model they call Minitron. (I did round off an entire BERT-large for the 2019 model.)

Rob Wiblin (@robertwiblin) 's Twitter Profile Photo

I interview Anthropic co-founder Nicholas Joseph about the policy Anthropic uses to ensure their AI models never go rogue or cause a catastrophe, and whether it's good enough. Nick sees 3 big virtues to their 'responsible scaling policy' approach: 1. It allows us to set aside

Anthropic (@anthropicai) 's Twitter Profile Photo

Today, we're making Artifacts available for all Claude users. You can now also create and view Artifacts on the Claude iOS and Android apps. Since launching in preview in June, tens of millions of Artifacts have been created. But where did it all begin? Here's how we built it.

Jack Clark (@jackclarksf) 's Twitter Profile Photo

Looking forward to doing a pre-deployment test on our next model with the US AISI! Third-party testing is a really important part of the AI ecosystem and it's been amazing to see governments stand up safety institutes to facilitate this. nist.gov/news-events/ne…

Sasha Rush (@srush_nlp) 's Twitter Profile Photo

There are still a few tickets remaining for COLM-1 next month in Philly. Paper list is pretty incredible, and student tickets are only $300. We'd love to see you there. colmweb.org

Allan Dafoe (@allandafoe) 's Twitter Profile Photo

We are hiring! Google DeepMind's Frontier Safety and Governance team is dedicated to mitigating frontier AI risks; we work closely with technical safety, policy, responsibility, security, and GDM leadership. Please encourage great people to apply! 1/ boards.greenhouse.io/deepmind/jobs/…

Jide (@jide_alaga) 's Twitter Profile Photo

Really loved this quote on RSPs from Sam Bowman's recent blog post. Highly recommend reading the whole post! sleepinyourhat.github.io/checklist/

mrinank 💗 (@mrinanksharma) 's Twitter Profile Photo

come and help us improve adversarial robustness of frontier LLMs at Anthropic. as LLMs become more capable, robustness issues will pose larger misuse risks, but as carlini says, the academic community has made "limited progress" so far

Sam Bowman (@sleepinyourhat) 's Twitter Profile Photo

I'm honored to have been part of this and thrilled with how it turned out. I have minor quibbles with the statement, but the core ideas in it are quite important, and it's a huge deal to get buy-in on them from so many people in leadership positions in China and the West.