bilal 🇵🇸 (@bilalchughtai_)'s Twitter Profile
bilal 🇵🇸

@bilalchughtai_

interpretability @ google deepmind | ai safety | cambridge mmath

ID: 3297675443

Link: https://bilalchughtai.co.uk/
Joined: 25-05-2015 10:59:27

229 Tweets

777 Followers

660 Following

L Rudolf L (@lrudl_)

If you're at NeurIPS, come see Kaivu Hariharan present our LLM situational awareness benchmark, the SAD paper, on Friday, 4:30-7:30pm, West Ballroom A-D #5101

bilal 🇵🇸 (@bilalchughtai_)

SAD to announce i won't be at neurips this year, but Kaivu Hariharan will be presenting our work on situational awareness on friday from 4:30-7:30pm in west ballroom a-d, poster #5101 - go check it out!

bilal 🇵🇸 (@bilalchughtai_)

nosetgauge.substack.com/p/capital-agi-… Rudolf has a good new blog post on the importance of capital and default decline of relevance of human labour post-AGI.

bilal 🇵🇸 (@bilalchughtai_)

new paper! we discuss open problems in
- methods and foundations of mech interp
- applications of mech interp towards scientific and engineering goals
- sociotechnical aspects of mech interp

bilal 🇵🇸 (@bilalchughtai_)

improving the public discourse surrounding AI development and its impacts seems incredibly important to me, yet very few people are working on it! i've been impressed with tarbell's work so far and would encourage applications!

Neel Nanda (@neelnanda5)

Apps are open for my MATS stream, where I try to teach how to do great mech interp research. Due Feb 28! I love mentoring and have had 40+ mentees, who’ve made valuable contributions to the field, incl 10 top conference papers! You don’t need to be at a big lab to do mech interp

Max Nadeau (@maxnadeau_)

🧵 Announcing Open Philanthropy's Technical AI Safety RFP! We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.

bilal 🇵🇸 (@bilalchughtai_)

another new paper! we build and evaluate the efficacy of simple probes, trained on internal model activations, in detecting instances of models acting strategically deceptive when placed in semi-realistic agentic scenarios.

Cas (Stephen Casper) (@stephenlcasper)

Imagine if the 2015 Paris Climate Summit was renamed the "Energy Action Summit," invited leaders from across the fossil fuel industry, raised millions for fossil fuels, ignored IPCC reports, and produced an agreement that didn't even mention climate change. #AIActionSummit 🤦

L Rudolf L (@lrudl_)

for years tech's had a meme: being a lawyer/doctor/engineer is the unambitious normie thing. but the AIs will shortly do all the coding. what's left? human legitimacy, human care, physically twisting the damn screws. full revenge of the normie career. checkmate, techies

Luke Drago (@luke_drago_)

Everyone’s trying to build AGI, loosely defined as systems that could outperform humans at all work. What happens to you when it exists? Let’s talk about how AGI will take your (white collar) job. Allow me to introduce you to pyramid replacement:

L Rudolf L (@lrudl_)

everyone says transformative AI is coming. but what might such a world actually look like when it comes to the most important questions: will Demis Hassabis win another Nobel? what does North Korea do? what's the future of academia? my insanely detailed scenario has the answers: