Ansh Radhakrishnan (@anshrad) 's Twitter Profile
Ansh Radhakrishnan

@anshrad

Researcher @AnthropicAI

ID: 1494503784004800517

Joined: 18-02-2022 02:49:44

44 Tweets

551 Followers

2.2K Following

david rein (@idavidrein) 's Twitter Profile Photo

đŸ§”Announcing GPQA, a graduate-level “Google-proof” Q&A benchmark designed for scalable oversight! w/ Julian Michael, Sam Bowman GPQA is a dataset of *really hard* questions that PhDs with full access to Google can’t answer. Paper: arxiv.org/abs/2311.12022

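For a concrete sense of how a GPQA item gets scored, here is a minimal multiple-choice evaluation sketch. The record layout (one correct answer plus three expert-written distractors) follows the paper's setup, but the field names and the `query_model` stub are illustrative, not the authors' actual harness:

```python
import random

# Illustrative GPQA-style record: one correct answer plus three
# expert-written distractors, as in the dataset (field names assumed).
record = {
    "question": "Which particles mediate the weak nuclear force?",
    "correct": "The W and Z bosons",
    "incorrect": ["The photon", "The gluon", "The graviton"],
}

def query_model(prompt: str) -> str:
    """Stand-in for an LLM call; replace with a real API request."""
    return "A"

def score_item(record: dict, rng: random.Random) -> bool:
    # Shuffle the four options so the correct answer's letter is random.
    options = [record["correct"], *record["incorrect"]]
    rng.shuffle(options)
    letters = "ABCD"
    prompt = record["question"] + "\n" + "\n".join(
        f"{letters[i]}. {opt}" for i, opt in enumerate(options)
    ) + "\nAnswer with a single letter."
    answer = query_model(prompt).strip().upper()[:1]
    return answer == letters[options.index(record["correct"])]

print(score_item(record, random.Random(0)))
```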
Sam Bowman (@sleepinyourhat) 's Twitter Profile Photo

🚹New dataset for LLM/scalable oversight evaluations! 🚹 This has been one of the big central efforts of my NYU lab over the last year, and I’m really excited to start using it.

Sam Bowman (@sleepinyourhat) 's Twitter Profile Photo

If you'll be at #NeurIPS2023 and you're interested in chatting with someone at Anthropic about research or roles, there'll be a few of us around. Expression of interest form here: docs.google.com/forms/d/e/1FAI

Buck Shlegeris (@bshlgrs) 's Twitter Profile Photo

New paper! We design and test safety techniques that prevent models from causing bad outcomes even if the models collude to subvert them. We think that this approach is the most promising available strategy for minimizing risk from deceptively aligned models. đŸ§”

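One simple protocol in this family is trusted monitoring: a capable but untrusted model does the work, a weaker trusted model scores how suspicious each output looks, and only the most suspicious fraction goes to human audit. Below is a schematic sketch of that loop; the helper functions and the 2% audit budget are illustrative placeholders, not the paper's implementation:

```python
from dataclasses import dataclass

AUDIT_BUDGET = 0.02  # fraction of outputs humans can afford to audit (illustrative)

@dataclass
class Submission:
    task_id: int
    solution: str
    suspicion: float

def untrusted_solve(task: str) -> str:
    # Placeholder: a powerful, possibly scheming model writes the solution.
    return f"solution for: {task}"

def trusted_monitor(solution: str) -> float:
    # Placeholder: a weaker trusted model rates how suspicious the solution looks (0-1).
    return 0.0

def human_audit(solution: str) -> bool:
    # Placeholder: a human auditor checks the solution for an intentional backdoor.
    return False

def run_protocol(tasks: list[str]) -> list[Submission]:
    subs = []
    for i, task in enumerate(tasks):
        solution = untrusted_solve(task)
        subs.append(Submission(i, solution, trusted_monitor(solution)))
    # Send only the most suspicious outputs to human audit, up to the budget.
    n_audits = max(1, int(AUDIT_BUDGET * len(subs)))
    for sub in sorted(subs, key=lambda s: s.suspicion, reverse=True)[:n_audits]:
        if human_audit(sub.solution):
            raise RuntimeError("Audit caught a backdoor: stop deferring to the untrusted model.")
    return subs

accepted = run_protocol([f"task {i}" for i in range(100)])
```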
Anthropic (@anthropicai) 's Twitter Profile Photo

Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.

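For reference, here is a minimal sketch of calling one of these models through the Anthropic Python SDK's Messages API; the model string and prompt are examples, so check the current docs for the exact model names available to you:

```python
import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-opus-20240229",  # example Claude 3 model name
    max_tokens=256,
    messages=[{"role": "user", "content": "Summarize the Claude 3 model family in two sentences."}],
)
print(message.content[0].text)
```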
Jesse Mu (@jayelmnop) 's Twitter Profile Photo

We’re hiring for the adversarial robustness team at Anthropic! As an Alignment subteam, we're making a big effort on red-teaming, test-time monitoring, and adversarial training. If you’re interested in these areas, let us know! (emails in đŸ§”)

Ethan Perez (@ethanjperez) 's Twitter Profile Photo

Come join our team! We're trying to make LLMs unjailbreakable, or clearly demonstrate it's not possible. More in this đŸ§” on what we're up to

Tristan Hume (@trishume) 's Twitter Profile Photo

Here's Claude 3 Haiku running at >200 tokens/s (>2x as fast as prod)! We've been working on capacity optimizations, but we can have fun testing them as speed optimizations by running at an overly costly low batch size. Come work with me at Anthropic on things like this; more info in the thread đŸ§”
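As a rough illustration, a naive client-side tokens/s measurement might look like the sketch below (Anthropic SDK, example Haiku model name); note that it folds network and queueing overhead into the timing, so it understates what the serving stack itself can do:

```python
import time
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

start = time.perf_counter()
message = client.messages.create(
    model="claude-3-haiku-20240307",  # example Haiku model name
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a 300-word story about a lighthouse."}],
)
elapsed = time.perf_counter() - start

# Naive client-side estimate: generated tokens divided by wall-clock time.
print(f"{message.usage.output_tokens / elapsed:.1f} output tokens/s")
```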

Jan Leike (@janleike) 's Twitter Profile Photo

I'm excited to join Anthropic to continue the superalignment mission! My new team will work on scalable oversight, weak-to-strong generalization, and automated alignment research. If you're interested in joining, my dms are open.

Sam Bowman (@sleepinyourhat) 's Twitter Profile Photo

✹đŸȘ© Woo! đŸȘ©âœš Jan's led some seminally important work on technical AI safety and I'm thrilled to be working with him! We'll be leading twin teams aimed at different parts of the problem of aligning AI systems at human level and beyond.

Ethan Perez (@ethanjperez) 's Twitter Profile Photo

Welcome!! My team and I will be joining Jan's new, larger team to help spin up a new push on these areas of alignment. Come join us!

Buck Shlegeris (@bshlgrs) 's Twitter Profile Photo

ARC-AGI’s been hyped over the last week as a benchmark that LLMs can’t solve. This claim triggered my dear coworker Ryan Greenblatt so he spent the last week trying to solve it with LLMs. Ryan gets 71% accuracy on a set of examples where humans get 85%; this is SOTA.

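As I understand the write-up, the approach is sample-and-filter program synthesis: draw many candidate Python transformation programs from an LLM, keep only those that reproduce every training pair, and apply a surviving program to the test input. A stripped-down sketch of that selection loop, with the LLM sampling step left as a dummy placeholder:

```python
Grid = list[list[int]]

def sample_candidate_programs(task_description: str, n: int) -> list[str]:
    # Placeholder for sampling n candidate `transform(grid)` programs from an LLM.
    return ["def transform(grid):\n    return grid"] * n  # identity function as a dummy

def fits_training_pairs(program: str, train: list[tuple[Grid, Grid]]) -> bool:
    namespace: dict = {}
    try:
        exec(program, namespace)  # sketch only; sandbox untrusted code in practice
        return all(namespace["transform"](inp) == out for inp, out in train)
    except Exception:
        return False

def solve(task_description: str, train: list[tuple[Grid, Grid]], test_input: Grid, n: int = 1000):
    for program in sample_candidate_programs(task_description, n):
        if fits_training_pairs(program, train):
            namespace: dict = {}
            exec(program, namespace)
            return namespace["transform"](test_input)
    return None  # no sampled program matched all training examples
```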
Jan Leike (@janleike) 's Twitter Profile Photo

Very exciting that this is out now (from my time at OpenAI): We trained an LLM critic to find bugs in code, and this helps humans find flaws on real-world production tasks that they would have missed otherwise. A promising sign for scalable oversight! openai.com/index/finding-
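The trained critic itself isn't publicly callable, but the general pattern (ask a model to enumerate likely bugs in a diff so a human reviewer can check them) can be sketched with an off-the-shelf model; the model name and prompt below are illustrative only:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

diff = """\
-    if user.is_admin:
+    if user.is_admin or True:   # temporary debug change
         grant_access(user)
"""

response = client.chat.completions.create(
    model="gpt-4o",  # example model; not the trained critic from the paper
    messages=[
        {"role": "system", "content": "You are a code reviewer. List concrete bugs or security issues in the diff, one per line."},
        {"role": "user", "content": diff},
    ],
)
print(response.choices[0].message.content)
```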

Nat McAleese (@__nmca__) 's Twitter Profile Photo

As AI improves, humans will need more and more help to monitor and control it. So my team at OpenAI has trained an AI that helps humans to evaluate AI! (1/5)

Sam Bowman (@sleepinyourhat) 's Twitter Profile Photo

A big part of my job these days is to think about what technical work Anthropic needs to do to make things go well with the development of very powerful AI. I digested my thinking on this, plus some of the Anthropic zeitgeist around it, into this piece: sleepinyourhat.github.io/checklist/
