Ansh Radhakrishnan (@anshrad) 's Twitter Profile
Ansh Radhakrishnan

@anshrad

Researcher @AnthropicAI

ID: 1494503784004800517

Joined: 18-02-2022 02:49:44

44 Tweets

551 Followers

2.2K Following

david rein (@idavidrein) 's Twitter Profile Photo

đŸ§”Announcing GPQA, a graduate-level “Google-proof” Q&A benchmark designed for scalable oversight! w/ Julian Michael, Sam Bowman GPQA is a dataset of *really hard* questions that PhDs with full access to Google can’t answer. Paper: arxiv.org/abs/2311.12022

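For a concrete sense of how a GPQA item gets scored, here is a minimal multiple-choice evaluation sketch. The record layout (one correct answer plus three expert-written distractors) follows the paper's setup, but the field names and the `query_model` stub are illustrative, not the authors' actual harness:

```python
import random

# Illustrative GPQA-style record: one correct answer plus three
# expert-written distractors, as in the dataset (field names assumed).
record = {
    "question": "Which particles mediate the weak nuclear force?",
    "correct": "The W and Z bosons",
    "incorrect": ["The photon", "The gluon", "The graviton"],
}

def query_model(prompt: str) -> str:
    """Stand-in for an LLM call; replace with a real API request."""
    return "A"

def score_item(record: dict, rng: random.Random) -> bool:
    # Shuffle the four options so the correct answer's letter is random.
    options = [record["correct"], *record["incorrect"]]
    rng.shuffle(options)
    letters = "ABCD"
    prompt = record["question"] + "\n" + "\n".join(
        f"{letters[i]}. {opt}" for i, opt in enumerate(options)
    ) + "\nAnswer with a single letter."
    answer = query_model(prompt).strip().upper()[:1]
    return answer == letters[options.index(record["correct"])]

print(score_item(record, random.Random(0)))
```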
Sam Bowman (@sleepinyourhat) 's Twitter Profile Photo

🚹New dataset for LLM/scalable oversight evaluations! 🚹 This has been one of the big central efforts of my NYU lab over the last year, and I’m really excited to start using it.

Sam Bowman (@sleepinyourhat) 's Twitter Profile Photo

If you'll be at #NeurIPS2023 and you're interested in chatting with someone at Anthropic about research or roles, there'll be a few of us around. Expression of interest form here: docs.google.com/forms/d/e/1FAI

Buck Shlegeris (@bshlgrs) 's Twitter Profile Photo

New paper! We design and test safety techniques that prevent models from causing bad outcomes even if the models collude to subvert them. We think that this approach is the most promising available strategy for minimizing risk from deceptively aligned models. đŸ§”

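One simple protocol in this family is trusted monitoring: a capable but untrusted model does the work, a weaker trusted model scores how suspicious each output looks, and only the most suspicious fraction goes to human audit. Below is a schematic sketch of that loop; the helper functions and the 2% audit budget are illustrative placeholders, not the paper's implementation:

```python
from dataclasses import dataclass

AUDIT_BUDGET = 0.02  # fraction of outputs humans can afford to audit (illustrative)

@dataclass
class Submission:
    task_id: int
    solution: str
    suspicion: float

def untrusted_solve(task: str) -> str:
    # Placeholder: a powerful, possibly scheming model writes the solution.
    return f"solution for: {task}"

def trusted_monitor(solution: str) -> float:
    # Placeholder: a weaker trusted model rates how suspicious the solution looks (0-1).
    return 0.0

def human_audit(solution: str) -> bool:
    # Placeholder: a human auditor checks the solution for an intentional backdoor.
    return False

def run_protocol(tasks: list[str]) -> list[Submission]:
    subs = []
    for i, task in enumerate(tasks):
        solution = untrusted_solve(task)
        subs.append(Submission(i, solution, trusted_monitor(solution)))
    # Send only the most suspicious outputs to human audit, up to the budget.
    n_audits = max(1, int(AUDIT_BUDGET * len(subs)))
    for sub in sorted(subs, key=lambda s: s.suspicion, reverse=True)[:n_audits]:
        if human_audit(sub.solution):
            raise RuntimeError("Audit caught a backdoor: stop deferring to the untrusted model.")
    return subs

accepted = run_protocol([f"task {i}" for i in range(100)])
```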
Anthropic (@anthropicai) 's Twitter Profile Photo

Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.

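For reference, here is a minimal sketch of calling one of these models through the Anthropic Python SDK's Messages API; the model string and prompt are examples, so check the current docs for the exact model names available to you:

```python
import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-opus-20240229",  # example Claude 3 model name
    max_tokens=256,
    messages=[{"role": "user", "content": "Summarize the Claude 3 model family in two sentences."}],
)
print(message.content[0].text)
```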
Jesse Mu (@jayelmnop) 's Twitter Profile Photo

We’re hiring for the adversarial robustness team at Anthropic! As an Alignment subteam, we're making a big effort on red-teaming, test-time monitoring, and adversarial training. If you’re interested in these areas, let us know! (emails in đŸ§”)

Ethan Perez (@ethanjperez) 's Twitter Profile Photo

Come join our team! We're trying to make LLMs unjailbreakable, or clearly demonstrate it's not possible. More in this đŸ§” on what we're up to

Tristan Hume (@trishume) 's Twitter Profile Photo

Here's Claude 3 Haiku running at >200 tokens/s (>2x as fast as prod)! We've been working on capacity optimizations, but we can have fun testing them as speed optimizations by running at an overly costly low batch size. Come work with me at Anthropic on things like this; more info in the thread đŸ§”
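As a rough illustration, a naive client-side tokens/s measurement might look like the sketch below (Anthropic SDK, example Haiku model name); note that it folds network and queueing overhead into the timing, so it understates what the serving stack itself can do:

```python
import time
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

start = time.perf_counter()
message = client.messages.create(
    model="claude-3-haiku-20240307",  # example Haiku model name
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a 300-word story about a lighthouse."}],
)
elapsed = time.perf_counter() - start

# Naive client-side estimate: generated tokens divided by wall-clock time.
print(f"{message.usage.output_tokens / elapsed:.1f} output tokens/s")
```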

Jan Leike (@janleike) 's Twitter Profile Photo

I'm excited to join Anthropic to continue the superalignment mission! My new team will work on scalable oversight, weak-to-strong generalization, and automated alignment research. If you're interested in joining, my dms are open.

Sam Bowman (@sleepinyourhat) 's Twitter Profile Photo

✹đŸȘ© Woo! đŸȘ©âœš Jan's led some seminally important work on technical AI safety and I'm thrilled to be working with him! We'll be leading twin teams aimed at different parts of the problem of aligning AI systems at human level and beyond.

Ethan Perez (@ethanjperez) 's Twitter Profile Photo

Welcome!! My team and I will be joining Jan's new, larger team to help spin up a new push on these areas of alignment. Come join us!

Buck Shlegeris (@bshlgrs) 's Twitter Profile Photo

ARC-AGI’s been hyped over the last week as a benchmark that LLMs can’t solve. This claim triggered my dear coworker Ryan Greenblatt so he spent the last week trying to solve it with LLMs. Ryan gets 71% accuracy on a set of examples where humans get 85%; this is SOTA.

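As I understand the write-up, the approach is sample-and-filter program synthesis: draw many candidate Python transformation programs from an LLM, keep only those that reproduce every training pair, and apply a surviving program to the test input. A stripped-down sketch of that selection loop, with the LLM sampling step left as a dummy placeholder:

```python
Grid = list[list[int]]

def sample_candidate_programs(task_description: str, n: int) -> list[str]:
    # Placeholder for sampling n candidate `transform(grid)` programs from an LLM.
    return ["def transform(grid):\n    return grid"] * n  # identity function as a dummy

def fits_training_pairs(program: str, train: list[tuple[Grid, Grid]]) -> bool:
    namespace: dict = {}
    try:
        exec(program, namespace)  # sketch only; sandbox untrusted code in practice
        return all(namespace["transform"](inp) == out for inp, out in train)
    except Exception:
        return False

def solve(task_description: str, train: list[tuple[Grid, Grid]], test_input: Grid, n: int = 1000):
    for program in sample_candidate_programs(task_description, n):
        if fits_training_pairs(program, train):
            namespace: dict = {}
            exec(program, namespace)
            return namespace["transform"](test_input)
    return None  # no sampled program matched all training examples
```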
Jan Leike (@janleike) 's Twitter Profile Photo

Very exciting that this is out now (from my time at OpenAI): We trained an LLM critic to find bugs in code, and this helps humans find flaws on real-world production tasks that they would have missed otherwise. A promising sign for scalable oversight! openai.com/index/finding-
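The trained critic itself isn't publicly callable, but the general pattern (ask a model to enumerate likely bugs in a diff so a human reviewer can check them) can be sketched with an off-the-shelf model; the model name and prompt below are illustrative only:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

diff = """\
-    if user.is_admin:
+    if user.is_admin or True:   # temporary debug change
         grant_access(user)
"""

response = client.chat.completions.create(
    model="gpt-4o",  # example model; not the trained critic from the paper
    messages=[
        {"role": "system", "content": "You are a code reviewer. List concrete bugs or security issues in the diff, one per line."},
        {"role": "user", "content": diff},
    ],
)
print(response.choices[0].message.content)
```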

Nat McAleese (@__nmca__) 's Twitter Profile Photo

As AI improves, humans will need more and more help to monitor and control it. So my team at OpenAI has trained an AI that helps humans to evaluate AI! (1/5)

Sam Bowman (@sleepinyourhat) 's Twitter Profile Photo

A big part of my job these days is to think about what technical work Anthropic needs to do to make things go well with the development of very powerful AI. I digested my thinking on this, plus some of the Anthropic zeitgeist around it, into this piece: sleepinyourhat.github.io/checklist/
