Roger Grosse (@rogergrosse)'s Twitter Profile
Roger Grosse

@rogergrosse

ID: 3301643341

Joined: 30-07-2015 14:30:37

1.1K Tweets

11.11K Followers

798 Following

Schwartz Reisman Institute (@torontosri)

What is "safe" AI? Why is it difficult to achieve? Can LLMs be hacked? Are the existential risks of advanced AI exaggerated—or justified? Join us next week on Sept. 10 to hear from AI experts Karina Vold,Roger Grosse,Sedef Akinli Kocak, and Sheila McIlraith. 🔗 uoft.me/aLB

What is "safe" AI? Why is it difficult to achieve? Can LLMs be hacked? Are the existential risks of advanced AI exaggerated—or justified?  

Join us next week on Sept. 10 to hear from AI experts Karina Vold, Roger Grosse, Sedef Akinli Kocak, and Sheila McIlraith.

🔗 uoft.me/aLB
Roger Grosse (@rogergrosse)

Is there a good history of how academic AI research came to be so focused on hill-climbing on benchmarks? Any classic essays championing this as a good way to make progress?

Roger Grosse (@rogergrosse)

AI benchmarks are most useful for judging the aggregate progress of a subfield, and hence for evaluating its methodologies. And if there's one methodological lesson they've taught us, it's that the progress is made by people who aren't hill-climbing on the benchmarks!

Anthropic (@anthropicai)

New Anthropic research: Sabotage evaluations for frontier models

How well could AI models mislead us, or secretly sabotage tasks, if they were trying to?

Read our paper and blog post here: anthropic.com/research/sabot…
Transluce (@transluceai)

Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…

Marius Hobbhahn (@mariushobbhahn)

xAI is hiring AI safety engineers: boards.greenhouse.io/xai/jobs/45317… Their safety agenda isn't public, so I can't judge it. However, joining as a fairly early employee could be highly impactful.

Colin Raffel (@colinraffel)

Relu (yes, that ReLU), the superstar sysadmin who built the GPU servers behind AlexNet and many subsequent years of deep learning advances at U of Toronto, has left 😭 We are hiring his replacement; if you feel you could fill (or grow to fill) his big shoes, please apply!

Jan Leike (@janleike)

Apply to join the Anthropic Fellows Program! This is an exceptional opportunity to work on AI safety research, collaborating with leading researchers on one of the world's most pressing problems. 👇 alignment.anthropic.com/2024/anthropic…

Marius Hobbhahn (@mariushobbhahn)

Oh man :( We tried really hard to neither over- nor underclaim the results in our communication, but, predictably, some people drastically overclaimed them, and then based on that, others concluded that there was nothing to be seen here (see examples in thread). So, let me try

Ryan Greenblatt (@ryanpgreenblatt)

New Redwood Research paper in collaboration with Anthropic: We demonstrate cases where Claude fakes alignment when it strongly dislikes what it is being trained to do. (Thread)

Owain Evans (@owainevans_uk)

New paper:
We train LLMs on a particular behavior, e.g. always choosing risky options in economic decisions.
They can *describe* their new behavior, despite no explicit mentions in the training data.
So LLMs have a form of intuitive self-awareness 🧵