Roger Grosse (@rogergrosse)'s Twitter Profile
Roger Grosse

@rogergrosse

ID: 3301643341

Joined: 30-07-2015 14:30:37

1.1K Tweets

11.11K Followers

798 Following

Schwartz Reisman Institute (@torontosri)

What is "safe" AI? Why is it difficult to achieve? Can LLMs be hacked? Are the existential risks of advanced AI exaggerated—or justified? Join us next week on Sept. 10 to hear from AI experts Karina Vold,Roger Grosse,Sedef Akinli Kocak, and Sheila McIlraith. 🔗 uoft.me/aLB

What is "safe" AI? Why is it difficult to achieve? Can LLMs be hacked? Are the existential risks of advanced AI exaggerated—or justified?  

Join us next week on Sept. 10 to hear from AI experts Karina Vold, Roger Grosse, Sedef Akinli Kocak, and Sheila McIlraith.

🔗 uoft.me/aLB
Roger Grosse (@rogergrosse)

Is there a good history of how academic AI research came to be so focused on hill-climbing on benchmarks? Any classic essays championing this as a good way to make progress?

Roger Grosse (@rogergrosse)

AI benchmarks are most useful for judging the aggregate progress of a subfield, and hence for evaluating its methodologies. And if there's one methodological lesson they've taught us, it's that the progress is made by people who aren't hill-climbing on the benchmarks!

Anthropic (@anthropicai)

New Anthropic research: Sabotage evaluations for frontier models

How well could AI models mislead us, or secretly sabotage tasks, if they were trying to?

Read our paper and blog post here: anthropic.com/research/sabot…
Transluce (@transluceai)

Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…

Marius Hobbhahn (@mariushobbhahn)

xAI is hiring AI safety engineers: boards.greenhouse.io/xai/jobs/45317… Their safety agenda isn't public, so I can't judge it. However, joining as a fairly early employee could be highly impactful.

Colin Raffel (@colinraffel)

Relu (yes, that ReLU), the superstar sysadmin who built the GPU servers behind AlexNet and many subsequent years of deep learning advances at U of Toronto, has left 😭 We are hiring his replacement; if you feel you could fill (or grow to fill) his big shoes, please apply!

Jan Leike (@janleike)

Apply to join the Anthropic Fellows Program! This is an exceptional opportunity to work on AI safety research, collaborating with leading researchers on one of the world's most pressing problems. 👇 alignment.anthropic.com/2024/anthropic…

Marius Hobbhahn (@mariushobbhahn)

Oh man :( We tried really hard to neither over- nor underclaim the results in our communication, but, predictably, some people drastically overclaimed them, and then based on that, others concluded that there was nothing to be seen here (see examples in thread). So, let me try

Ryan Greenblatt (@ryanpgreenblatt)

New Redwood Research paper in collaboration with Anthropic: We demonstrate cases where Claude fakes alignment when it strongly dislikes what it is being trained to do. (Thread)

Owain Evans (@owainevans_uk)

New paper:
We train LLMs on a particular behavior, e.g. always choosing risky options in economic decisions.
They can *describe* their new behavior, despite no explicit mentions in the training data.
So LLMs have a form of intuitive self-awareness 🧵