Sam Bowman (@sleepinyourhat)'s Twitter Profile
Sam Bowman

@sleepinyourhat

AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.

ID: 338526004

https://cims.nyu.edu/~sbowman/ · Joined 19-07-2011 18:19:52

2.2K Tweets

35.2K Followers

3.1K Following

Sam Bowman (@sleepinyourhat)

I made a bet internally that we wouldn't have a million people engage with tweets about Claude being a bridge, but I'm pretty happy to be on track to lose that bet.

Human-aligned AI Summer School (@humanalignedai)

Join us in Prague on July 17-20, 2024 for the 4th Human-aligned AI Summer School! We'll bring together researchers, students, and practitioners for four intensive days focused on the latest approaches to aligning AI systems with human values. You can apply now at humanaligned.ai!

Sasha de Marigny (@sashadem)

The first ever detailed look inside a modern, production-grade large language model (in this case, Claude 3 Sonnet).

METR (@METR_Evals)

Over the last few months, we’ve increased our focus on developing evaluations for automated AI research and development, because we think this capability could be extraordinarily destabilizing if realized.

We are looking for ML engineers and researchers to help drive AI R&D

METR (@METR_Evals)

We were very excited to see the publication of the Frontier Safety Framework from Google DeepMind! More companies sharing their concrete proposals for preparing for transformative capabilities from AI systems is great: it increases the concreteness of the options available to the

andy jones (@andy_l_jones)

this is extremely cool
* activations & activation steering
* multi-tenant to keep costs for users down
* pip install-able
* actually taking a swing at public engineering infra 😍😍😍

Tomek Korbak (@tomekkorbak)

If you're at the conference, come see our poster (#129) tomorrow (Tuesday) at 10:45am to learn about the role human preferences play in making LLMs more sycophantic!

Jason Wei (@_jasonwei)

Enjoyed this paper that plots emergent abilities with pretraining loss on the x-axis, which is actually a suggestion that Oriol Vinyals also made a few years back: arxiv.org/abs/2403.15796

The paper uses intermediate checkpoints to plot a variety of pretraining losses. For some

Ajeya Cotra (@ajeya_cotra)

We just clarified eligibility criteria for our agent benchmarks RFP (openphilanthropy.org/rfp-llm-benchm…) and included a link to METR's task development resources (metr.github.io/autonomy-evals…) which many applicants may find helpful

Sam Bowman (@sleepinyourhat)

This result is pretty clearly specific to the style of backdoor we're working with, and doesn't support broad claims like 'interpretability solves misalignment', but it's still surprisingly strong. Worth a look!

david rein (@idavidrein)

I very distinctly remember, while I was in the thick of making GPQA, telling Robert Long that “I knew the project was going to be ambitious/hard, but I didn’t appreciate what that actually meant”

In retrospect I probably still would’ve done it, but we basically had to restart the

Owain Evans (@OwainEvans_UK)

Full lecture slides and reading list for Roger Grosse's class on AI Alignment are up:
alignment-w2024.notion.site

David Krueger (@DavidSKrueger)

I’m super excited to release our 100+ page collaborative agenda - led by Usman Anwar - on “Foundational Challenges In Assuring Alignment and Safety of LLMs” alongside 35+ co-authors from NLP, ML, and AI Safety communities!

Some highlights below...

Sasha Rush (@srush_nlp)

I like to think of myself as a researcher, but almost certainly the most valuable use of my time is writing US Visa letters.

Cem Anil (@cem__anil)

One of our most crisp findings was that in-context learning usually follows simple power laws as a function of number of demonstrations.

We were surprised we didn’t find this stated explicitly in the literature.

Soliciting pointers: have we missed anything?
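A minimal sketch of the relationship the tweet describes, assuming the power-law form loss(n) ≈ a · n^(−α) in the number of demonstrations n; the data points and fitting code below are illustrative placeholders, not results from the thread:

# Illustrative sketch: fit a power law, loss(n) = a * n**(-alpha),
# to in-context-learning loss as a function of demonstration count n.
# The data points are synthetic placeholders, not real measurements.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha):
    return a * n ** (-alpha)

shots = np.array([1, 2, 4, 8, 16, 32, 64])                # number of demonstrations
loss = np.array([2.1, 1.7, 1.4, 1.15, 0.95, 0.8, 0.67])   # hypothetical ICL loss

(a, alpha), _ = curve_fit(power_law, shots, loss)
print(f"fit: loss(n) ~ {a:.2f} * n^(-{alpha:.2f})")

On a log-log plot such a fit appears as a straight line with slope −α, which is one quick way to check whether a given model's in-context learning curve matches the claimed power-law behavior.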
