Kshitij Sachan (@sachankshitij)'s Twitter Profile
Kshitij Sachan

@sachankshitij

beep boop at @AnthropicAI

ID: 3225684992

Joined: 24-05-2015 23:48:01

133 Tweets

307 Followers

543 Following

Anthropic (@anthropicai)'s Twitter Profile Photo

We all need to join in a race for AI safety.  In the coming weeks, Anthropic will share more specific plans concerning cybersecurity, red teaming, and responsible scaling, and we hope others will move forward swiftly as well. whitehouse.gov/briefing-room/…

Fabien Roger (@fabiendroger)'s Twitter Profile Photo

If powerful AIs could mess with our perceptions and lead us to provide incorrect training signals, we could lose control over the training process without even realizing it. We made the first empirical benchmark for this problem (arxiv.org/abs/2308.15605). 🧵 (1/6)

Kshitij Sachan (@sachankshitij)'s Twitter Profile Photo

This is surprisingly general! For example, you could design networks that literally “write” arbitrary info about their training data to memory (i.e., weights)

Kshitij Sachan (@sachankshitij)'s Twitter Profile Photo

I think AI control will be a crucial part of safely deploying ASL-4 level models, and I am excited about people doing follow-up research in this area!

akbir. (@akbirkhan)'s Twitter Profile Photo

How can we check LLM outputs in domains where we are not experts? We find that non-expert humans answer questions better after reading debates between expert LLMs. Moreover, human judges are more accurate as experts get more persuasive. 📈 github.com/ucl-dark/llm_d…

Emmanuel Ameisen (@mlpowered)'s Twitter Profile Photo

Claude 3 Opus is great at following multiple complex instructions. To test it, Erik Schluntz and I had it take on Andrej Karpathy's challenge to transform his 2h13m tokenizer video into a blog post, in ONE prompt, and it just... did it. Here are some details:

Jesse Mu (@jayelmnop)'s Twitter Profile Photo

We’re hiring for the adversarial robustness team at Anthropic! As an Alignment subteam, we're making a big effort on red-teaming, test-time monitoring, and adversarial training. If you’re interested in these areas, let us know! (emails in 🧵)

Logan Graham (@logangraham)'s Twitter Profile Photo

I’m hiring ambitious Research Scientists at Anthropic to measure and prepare for models acting autonomously in the world. This is one of the most novel and difficult capabilities to measure, and critical for safety. Join the Frontier Red Team at Anthropic:

Will Cathcart (@wcathcart)'s Twitter Profile Photo

Many have said this already, but worth repeating: this is not correct. We take security seriously and that's why we end-to-end encrypt your messages. They don't get sent to us every night or exported to us. If you do want to back up your messages, you can use your cloud provider

Orowa Sikder (@orowasikder)'s Twitter Profile Photo

Enjoying Claude Artifacts? Want to build the next generation of Human-AI Interfaces? We're hiring an ML Lead for Artifacts. This is a unique full-stack role where you'll help co-develop new user interfaces along with the model capabilities which support them. You'll work as

Jack Clark (@jackclarksf)'s Twitter Profile Photo

Here's a letter we sent to Governor Newsom about SB 1047. This isn't an endorsement, but rather a view of the costs and benefits of the bill. cdn.sanity.io/files/4zrzovbb…

Robert Heaton (@robjheaton)'s Twitter Profile Photo

My team at Anthropic is hiring research engineers and scientists. We find out whether AI models possess critical, advanced capabilities and then help the world to prepare. We'd love to hear from you! robertheaton.com/anthropic/