Sam Bowman (@sleepinyourhat) Twitter Tweets • TwiCopy

Sam Bowman

@sleepinyourhat

AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.

ID: 338526004

Website: https://sleepinyourhat.github.io/

Joined: 19-07-2011 18:19:52

2.2K Tweets

36.36K Followers

3.3K Following

METR

2 months ago

How well can LLM agents complete diverse tasks compared to skilled humans? Our preliminary results indicate that our baseline agents based on several public models (Claude 3.5 Sonnet and GPT-4o) complete a proportion of tasks similar to what humans can do in ~30 minutes. 🧵

435 likes, 10 replies

Sam Bowman

@sleepinyourhat

a month ago

👇 I think these anti-jailbreak measures will be quite strong. I'd love it for you to try proving me wrong!

38 likes, 3 replies

Kelsey Piper

a month ago

"I’m not sold that superhuman systems will do the right thing without better supervision than we can currently provide....There’s a low chance the current paradigm gets all the way there. The chance is still higher than I’m comfortable with." The most reasonable take imo.

57 likes, 0 replies

Zhijing Jin

a month ago

Happy to announce that I'm joining as an Asst. Prof. in CS at UToronto U of T Department of Computer Science+Vector Institute in Fall '25, working on #NLProc, Causality, and AI Safety! I want to sincerely thank my dear mentors, friends, collabs & many who mean a lot to me. Welcome #PhDs/Research MSc to apply!

654 likes, 56 replies

Ethan Perez

a month ago

My team built a system we think might be pretty jailbreak resistant, enough to offer up to $15k for a novel jailbreak. Come prove us wrong!

257 likes, 20 replies

NYU Data Science

@nyudatascience

a month ago

CDS welcomes Eunsol Choi as an Assistant Professor of Computer Science (NYU Courant) and Data Science! Her research focuses on advancing how computers interpret human language in real-world contexts. nyudatascience.medium.com/meet-the-facul…

184 likes, 0 replies

Saffron Huang

a month ago

Life update! I'm joining Anthropic's Societal Impacts team as a research scientist in September. I'll be shifting to a part-time role at Collective Intelligence Project, with the amazing Zarinah Agnew taking over as research director.

502 likes, 31 replies

andy jones

a month ago

Despite working on LLMs for going on four years now, Zed & Sonnet 3.5 is the first time I've found myself using a model all day every day for my work. There's some rubicon it crosses of 'smart enough model' and 'good enough UX' that everything I tried previously fell short on.

337 likes, 11 replies

Christopher Potts

a month ago

A short story of fast progress: NVIDIA released an ≈8B parameter model they called Megatron in 2019, and five years later they have released an ≈8B model they call Minitron. (I did round off an entire BERT-large for the 2019 model.)

85 likes, 1 reply

Rob Wiblin

a month ago

I interview Anthropic co-founder Nicholas Joseph about the policy Anthropic uses to ensure their AI models never go rogue or cause a catastrophe, and whether it's good enough. Nick sees 3 big virtues to their 'responsible scaling policy' approach: 1. It allows us to set aside

52 likes, 2 replies

Anthropic

a month ago

Today, we're making Artifacts available for all Claude users. You can now also create and view Artifacts on the Claude iOS and Android apps. Since launching in preview in June, tens of millions of Artifacts have been created. But where did it all begin? Here's how we built it.

2.2K likes, 122 replies

Jack Clark

24 days ago

Looking forward to doing a pre-deployment test on our next model with the US AISI! Third-party testing is a really important part of the AI ecosystem and it's been amazing to see governments stand up safety institutes to facilitate this. nist.gov/news-events/ne…

158 likes, 7 replies

Sasha Rush

24 days ago

There are still a few tickets remaining for COLM-1 next month in Philly. Paper list is pretty incredible, and student tickets are only $300. We'd love to see you there. colmweb.org

94 likes, 3 replies

Allan Dafoe

19 days ago

We are hiring! Google DeepMind's Frontier Safety and Governance team is dedicated to mitigating frontier AI risks; we work closely with technical safety, policy, responsibility, security, and GDM leadership. Please encourage great people to apply! 1/ boards.greenhouse.io/deepmind/jobs/…

131 likes, 2 replies

Julian

19 days ago

if my name was sam bowman I would say (no relation) any time I introduce myself, in any circumstance

47 likes, 4 replies

Jide 🔍

17 days ago

Really loved this quote on RSPs from Sam Bowman's recent blog post. Highly recommend reading the whole post! sleepinyourhat.github.io/checklist/

11 likes, 1 reply

mrinank 💗

5 days ago

come and help us improve adversarial robustness of frontier LLMs at Anthropic. as LLMs become more capable, robustness issues will pose larger misuse risks, but as carlini says, the academic community has made "limited progress" so far

48 likes, 2 replies

Sam Bowman

@sleepinyourhat

3 days ago

I'm honored to have been part of this and thrilled with how it turned out. I have minor quibbles with the statement, but the core ideas in it are quite important, and it's a huge deal to get buy-in on them from so many people in leadership positions in China and the West.

24 likes, 2 replies