Krueger AI Safety Lab (@kasl_ai)'s Twitter Profile
Krueger AI Safety Lab

@kasl_ai

We are a research group at the University of Cambridge led by @DavidSKrueger, focused on avoiding catastrophic risks from AI

ID: 1711727229212794880

Link: https://www.kasl.ai/ · Joined: 10-10-2023 12:57:09

26 Tweets

356 Followers

133 Following

David Krueger (@davidskrueger)

I’m super excited to release our 100+ page collaborative agenda - led by Usman Anwar - on “Foundational Challenges In Assuring Alignment and Safety of LLMs” alongside 35+ co-authors from NLP, ML, and AI Safety communities! Some highlights below...

Seán Ó hÉigeartaigh (@s_oheigeartaigh)

I'm delighted to have contributed to this new Agenda Paper on AI Safety. Governance of LLMs can be a very powerful tool in helping assure their safety and alignment: it could complement, and even *substitute* for, technical interventions. But LLM governance is currently challenging! 🧵⬇️

Gabriel Recchia (@mesotronium)

Super proud to have been able to make my little contribution to this monumental work. Huge credit to Usman Anwar for recognizing the need for this paper and pulling everything together to make it happen.

Usman Anwar (@usmananwar391)

We released this new agenda on LLM safety yesterday. It is VERY comprehensive, covering 18 different challenges. My co-authors have posted tweets for each of these challenges, and I am going to collect them all here! P.S. This is also now on arXiv: arxiv.org/abs/2404.09932

Department for Science, Innovation and Technology (@scitechgovuk)

The #AISeoulSummit is just a month away 🇬🇧 🇰🇷

Jointly hosted by the UK & the Republic of Korea, the summit will focus on:

🤝 international agreements on AI safety 
🛡️ responsible development of AI by companies 
💡 showcasing the benefits of safe AI

Krueger AI Safety Lab (@kasl_ai)

Congrats to our affiliate Fazl Barez 🔜 @NeurIPS, whose paper won best poster at the Tokyo Technical AI Safety Conference @tais_2024. We have had the pleasure of working with Fazl since February.

Krueger AI Safety Lab (@kasl_ai)

We will be at ICLR again this year! 🎉 Catch our poster next week at ICLR 2024 in Vienna. We’ll be in Hall B, booth #228, on Wed 8 May from 4:30–6:30 PM.

Micah Carroll (@micahcarroll)

Working to make RL agents safer and more aligned? Using RL methods to engineer safer AI? Developing audits or governance mechanisms for RL agents? Share your work with us at the RL Safety workshop at @RL_Conference 2024!

‼️ Updated deadline ‼️ ➡️ 24th of May AoE

Jan Brauner (@janmbrauner)

Out in Science today:
In our paper, we describe extreme AI risks and concrete actions to manage them, including tech R&D and governance.
“For AI to be a boon, we must reorient; pushing AI capabilities alone is not enough.”

David Krueger (@davidskrueger)

It's great that governments and researchers are finally waking up to the extreme risks posed by AI. But we're still not doing nearly enough! Our short-but-sweet Science paper, with an all-star author list, argues for concrete steps that urgently need to be taken.

Fazl Barez (@fazlbarez)

Super proud to have contributed to Anthropic's new paper. We explore whether AI could learn to hack its own reward system through generalization from training. Important implications as AI systems become more capable.

ai@cam (@ai_cam_mission)

Could you help us build Cambridge University's #AI research community? We are looking for a Programme Manager who can deliver key programmes, scope new opportunities & ensure that our mission embeds agile project management.

📅 Deadline: 8 July
Read more ⬇️ ai.cam.ac.uk/opportunities/…

David Krueger (@davidskrueger)

"hot take" (((shouldn't in fact be a hot take, but in the context of current AI policy discussions anything other than "do some evals" is a hot take, sadly....)))