Jide (@jide_alaga) 's Twitter Profile
Jide

@jide_alaga

AI Governance @METR_Evals | Rooting for the better angels of our nature..

ID: 744242732223213568

Joined: 18-06-2016 18:57:49

500 Tweets

575 Followers

539 Following

Ajeya Cotra (@ajeya_cotra) 's Twitter Profile Photo

Impressions from talking to ML researchers and engineers about how they use AI, focusing on weaknesses and frictions (strengths are better covered by benchmarks) 🧵

Yo Shavit (@yonashav) 's Twitter Profile Photo

3) If you get greedy and decide to directly train the CoT not to think about reward hacking, it seems to work for a bit, but then models eventually still learn to reward-hack… except they hide misaligned reasoning so it doesn't show up in their CoT!

Marius Hobbhahn (@mariushobbhahn) 's Twitter Profile Photo

I think this paper is really important! 1. It shows that current models already have the capabilities and propensities to do surprisingly clever reward hacks. 2. It shows the utility of CoT monitoring in the regime where the CoT is legible and faithful. 3. IMO, the most

Jide (@jide_alaga) 's Twitter Profile Photo

I genuinely believe I have a happier life than a lot of famous people and probably most A-listers. I don't understand why people want to be famous so badly..

Jide (@jide_alaga) 's Twitter Profile Photo

Shower thought: I would love to see something like a memetic dashboard, showing the most powerful memes in the world, where they are growing/declining, and describing how (and how strongly) they tend to motivate behaviour.

Chris Painter (@chrispainteryup) 's Twitter Profile Photo

When should AI companies publish system cards? I want to make the case that the ideal system would involve something closer to quarterly reporting, rather than focusing so much on deployment. Sharing here to get pushback and debate 🧵

Jide (@jide_alaga) 's Twitter Profile Photo

I think these kinds of company-evaluator collaborations provide much better public assurances for safety than the current status quo, and I think it's incredibly exciting! We need more of this, kudos to Amazon!

OpenAI (@openai) 's Twitter Profile Photo

Introducing the Safety Evaluations Hub: a resource to explore safety results for our models. While system cards share safety metrics at launch, the Hub will be updated periodically as part of our efforts to communicate proactively about safety. openai.com/safety/evaluat…

Joel Becker (@joel_bkr) 's Twitter Profile Photo

wicked preliminary result from Thomas Akira Kwa. AI time horizon, and doubling time of time horizon, seems to vary a lot by domain -- and METR's HCAST task suite is in the middle for both

Rob Wiblin (@robertwiblin) 's Twitter Profile Photo

AI models currently have a 50% chance of doing something that takes a human expert one hour. This doubles every 7 months. In 2 years? They could automate full workdays. In 4 years? A full month. I discuss the most important graph in AI today with Beth Barnes, the CEO of METR,