Kshitij Sachan (@sachankshitij)'s Twitter Profile
Kshitij Sachan

@sachankshitij

beep boop at @AnthropicAI

ID: 3225684992

Joined: 24-05-2015 23:48:01

133 Tweets

307 Followers

543 Following

Anthropic (@anthropicai)'s Twitter Profile Photo

We all need to join in a race for AI safety.  In the coming weeks, Anthropic will share more specific plans concerning cybersecurity, red teaming, and responsible scaling, and we hope others will move forward swiftly as well. whitehouse.gov/briefing-room/…

Fabien Roger (@fabiendroger)'s Twitter Profile Photo

If powerful AIs could mess with our perceptions and lead us to provide incorrect training signals, we could lose control over the training process without even realizing it. We made the first empirical benchmark for this problem (arxiv.org/abs/2308.15605). 🧵 (1/6)

Kshitij Sachan (@sachankshitij)'s Twitter Profile Photo

This is surprisingly general! For example, you could design networks that literally “write” arbitrary info about their training data to memory (i.e., weights)

Kshitij Sachan (@sachankshitij)'s Twitter Profile Photo

I think AI control will be a crucial part of safely deploying ASL-4 level models, and I am excited about people doing follow-up research in this area!

akbir. (@akbirkhan)'s Twitter Profile Photo

How can we check LLM outputs in domains where we are not experts? We find that non-expert humans answer questions better after reading debates between expert LLMs. Moreover, human judges are more accurate as experts get more persuasive. 📈 github.com/ucl-dark/llm_d…

Emmanuel Ameisen (@mlpowered)'s Twitter Profile Photo

Claude 3 Opus is great at following multiple complex instructions. To test it, Erik Schluntz and I had it take on Andrej Karpathy's challenge to transform his 2h13m tokenizer video into a blog post, in ONE prompt, and it just... did it. Here are some details:

Jesse Mu (@jayelmnop)'s Twitter Profile Photo

We’re hiring for the adversarial robustness team at Anthropic! As an Alignment subteam, we're making a big effort on red-teaming, test-time monitoring, and adversarial training. If you’re interested in these areas, let us know! (emails in 🧵)

Logan Graham (@logangraham)'s Twitter Profile Photo

I’m hiring ambitious Research Scientists at Anthropic to measure and prepare for models acting autonomously in the world. This is one of the most novel and difficult capabilities to measure, and critical for safety. Join the Frontier Red Team at Anthropic:

Will Cathcart (@wcathcart)'s Twitter Profile Photo

Many have said this already, but worth repeating: this is not correct. We take security seriously and that's why we end-to-end encrypt your messages. They don't get sent to us every night or exported to us. If you do want to back up your messages, you can use your cloud provider

Orowa Sikder (@orowasikder)'s Twitter Profile Photo

Enjoying Claude Artifacts? Want to build the next generation of Human-AI Interfaces? We're hiring an ML Lead for Artifacts. This is a unique full-stack role where you'll help co-develop new user interfaces along with the model capabilities which support them. You'll work as

Jack Clark (@jackclarksf)'s Twitter Profile Photo

Here's a letter we sent to Governor Newsom about SB 1047. This isn't an endorsement, but rather a view of the costs and benefits of the bill. cdn.sanity.io/files/4zrzovbb…

Robert Heaton (@robjheaton)'s Twitter Profile Photo

My team at Anthropic is hiring research engineers and scientists. We find out whether AI models possess critical, advanced capabilities and then help the world to prepare. We'd love to hear from you! robertheaton.com/anthropic/