Micah Carroll
@micahcarroll
AI PhD student @berkeley_ai
ID: 356711942
http://micahcarroll.github.io
Joined: 17-08-2011 07:40:21
512 Tweets
1.1K Followers
634 Following
In preference elicitation (or active learning), we almost never ask the same question twice, because we think we already know the answer. In this upcoming AI, Ethics, and Society Conference (AIES) paper, we study how stable people's responses about moral preferences actually are. arxiv.org/abs/2408.02862
Really enjoyed helping out xuan (ɕɥɛn / sh-yen) with this tour-de-force on preferences and their limitations for AI Alignment! We have gone down every rabbit hole so you don't have to!
Really loved working on this! xuan (ɕɥɛn / sh-yen), Micah Carroll, and Hal Ashton were a pleasure to think with. I will write a longer post tomorrow, but please read our paper, or at the very least skim the summary tables.
AI safety frameworks (RSPs) could be one of our best tools for managing AI risks. But how do we know if they're good? In our new paper, Jonas Schuett, Markus Anderljung, and I propose a rubric to find out.👇
I recommend this internship to folks interested in doing technical AI safety work. I did it in 2021 (mentored by Scott Emmons, who I also recommend!) and it played a big role in launching my career in the area, even though I do policy work nowadays.