Krueger AI Safety Lab (@kasl_ai)'s Twitter Profile
Krueger AI Safety Lab

@kasl_ai

We are a research group at the University of Cambridge led by @DavidSKrueger, focused on avoiding catastrophic risks from AI

ID: 1711727229212794880

Link: https://www.kasl.ai/ · Joined: 10-10-2023 12:57:09

26 Tweets

356 Followers

133 Following

David Krueger (@davidskrueger)

I’m super excited to release our 100+ page collaborative agenda - led by Usman Anwar - on “Foundational Challenges In Assuring Alignment and Safety of LLMs” alongside 35+ co-authors from NLP, ML, and AI Safety communities! Some highlights below...

Seán Ó hÉigeartaigh (@s_oheigeartaigh)

I'm delighted to have contributed to this new Agenda Paper on AI Safety. Governance of LLMs can be a very powerful tool in helping assure their safety and alignment: it could complement, and even *substitute* for, technical interventions. But LLM governance is currently challenging! 🧵⬇️

Gabriel Recchia (@mesotronium)

Super proud to have been able to make my little contribution to this monumental work. Huge credit to Usman Anwar for recognizing the need for this paper and pulling everything together to make it happen.

Usman Anwar (@usmananwar391)

We released this new agenda on LLM safety yesterday. It is VERY comprehensive, covering 18 different challenges. My co-authors have posted tweets for each of these challenges, and I am going to collect them all here! P.S. This is also now on arXiv: arxiv.org/abs/2404.09932

Department for Science, Innovation and Technology (@scitechgovuk)

The #AISeoulSummit is just a month away 🇬🇧 🇰🇷

Jointly hosted by the UK & the Republic of Korea, the summit will focus on:

🤝 international agreements on AI safety 
🛡️ responsible development of AI by companies 
💡 showcasing the benefits of safe AI

Krueger AI Safety Lab (@kasl_ai)

Congrats to our affiliate Fazl Barez 🔜 @NeurIPS, whose paper won best poster at the Tokyo Technical AI Safety Conference @tais_2024. We have had the pleasure of working with Fazl since February.

Krueger AI Safety Lab (@kasl_ai)

We will be at ICLR again this year! 🎉 Catch our poster next week at ICLR 2024 in Vienna. We’ll be in Hall B, booth #228, on Wed 8 May from 4:30–6:30 PM.

Micah Carroll (@micahcarroll)

Working to make RL agents safer and more aligned? Using RL methods to engineer safer AI? Developing audits or governance mechanisms for RL agents? Share your work with us at the RL Safety workshop at @RL_Conference 2024!

‼️ Updated deadline ‼️ ➡️ 24th of May AoE

Jan Brauner (@janmbrauner)

Out in Science today:
In our paper, we describe extreme AI risks and concrete actions to manage them, including tech R&D and governance.
“For AI to be a boon, we must reorient; pushing AI capabilities alone is not enough.”

David Krueger (@davidskrueger)

It's great that governments and researchers are finally waking up to the extreme risks posed by AI. But we're still not doing nearly enough! Our short-but-sweet Science paper, with an all-star author list, argues for concrete steps that urgently need to be taken.

Fazl Barez (@fazlbarez)

Super proud to have contributed to Anthropic's new paper. We explore whether AI could learn to hack its own reward system through generalization from training. Important implications as AI systems become more capable.

ai@cam (@ai_cam_mission)

Could you help us build Cambridge University's #AI research community? We are looking for a Programme Manager who can deliver key programmes, scope new opportunities & ensure that our mission embeds agile project management.

📅 Deadline: 8 July
Read more ⬇️ ai.cam.ac.uk/opportunities/…

David Krueger (@davidskrueger)

"hot take" (((shouldn't in fact be a hot take, but in the context of current AI policy discussions anything other than "do some evals" is a hot take, sadly....)))