Micah Carroll (@micahcarroll)'s Twitter Profile
Micah Carroll

@micahcarroll

AI PhD student @berkeley_ai

ID: 356711942

http://micahcarroll.github.io · Joined: 17-08-2011 07:40:21

512 Tweets

1.1K Followers

634 Following

Dan Hendrycks (@danhendrycks)'s Twitter Profile Photo

SB 1047 has passed through the Appropriations Committee! It has significant amendments responding to industry engagement. These amendments are summarized in the link and in the images below safesecureai.org/amendments

Eugene Vinitsky (@eugenevinitsky)'s Twitter Profile Photo

Some examples of evident drive so people don't pattern match to the wrong thing and keep sending 100 "hello, I want to do ML" emails: 1) keeping a giant doc of papers and thoughts on them 2) studying up before work or after work despite holding a full-time job (I did this) 1/4

Vincent Conitzer (@conitzer)'s Twitter Profile Photo

In preference elicitation (or active learning), we typically don't ask the same question twice, because we think we already know the answer. In this upcoming AI, Ethics, and Society Conference (AIES) paper, we study how stable people's responses about moral preferences actually are. arxiv.org/abs/2408.02862

Lauro (@laurolangosco)'s Twitter Profile Photo

The drafting of the GPAI code of practice is easily among the top 3 most important things happening in AI policy right now. Consider participating in a working group (deadline to apply is **August 25**!). Link in thread.

Sriyash Poddar (@sriyash__)'s Twitter Profile Photo

How can we align foundation models with populations of diverse users with different preferences? We are excited to share our work on Personalizing RLHF using Variational Preference Learning! 🧵 📜: arxiv.org/abs/2408.10075 🌎: weirdlabuw.github.io/vpl/
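
For readers curious what "variational preference learning" might look like mechanically, here is a minimal sketch: a per-user latent z is inferred from that user's own comparisons and conditions the reward model, trained with a Bradley-Terry likelihood plus a KL term to a standard normal prior. This is a toy assembled from the tweet's framing; the module names, dimensions, encoder design, and KL weight are assumptions, not the paper's architecture (see arxiv.org/abs/2408.10075 for the real method).

# Hypothetical sketch of a latent-variable ("variational") reward model
# for personalized RLHF; not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalRewardModel(nn.Module):
    def __init__(self, feat_dim=64, z_dim=8):
        super().__init__()
        # Encoder: maps a user's annotated comparisons to q(z | user).
        # Each comparison is summarized as (chosen_feats - rejected_feats).
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.mu_head = nn.Linear(64, z_dim)
        self.logvar_head = nn.Linear(64, z_dim)
        # Reward head: scores a response conditioned on the user latent z.
        self.reward_head = nn.Sequential(
            nn.Linear(feat_dim + z_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, chosen, rejected):
        # chosen/rejected: (n_comparisons, feat_dim) for ONE user.
        h = self.encoder(chosen - rejected).mean(dim=0)   # pool comparisons
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        z_rep = z.expand(chosen.size(0), -1)
        r_c = self.reward_head(torch.cat([chosen, z_rep], -1)).squeeze(-1)
        r_r = self.reward_head(torch.cat([rejected, z_rep], -1)).squeeze(-1)
        # ELBO: Bradley-Terry likelihood of this user's labels + KL to N(0, I).
        nll = -F.logsigmoid(r_c - r_r).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
        return nll + 0.01 * kl  # KL weight chosen arbitrarily for the sketch

model = VariationalRewardModel()
loss = model(torch.randn(5, 64), torch.randn(5, 64))  # 5 dummy comparisons
loss.backward()

The point of the latent z is that two users with opposite preferences no longer force the reward model toward a single averaged, lowest-common-denominator reward.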

Micah Carroll (@micahcarroll)'s Twitter Profile Photo

Really enjoyed helping out xuan (ɕɥɛn / sh-yen) with this tour-de-force on preferences and their limitations for AI Alignment! We have gone down every rabbit hole so you don't have to!

Matija Franklin (@franklinmatija)'s Twitter Profile Photo

Really loved working on this! xuan (ɕɥɛn / sh-yen), Micah Carroll, and Hal Ashton were a pleasure to think with. I will write a longer post tomorrow, but please read our paper, or at the very least skim the summary tables.

Ameesh Shah (@ameeshsh)'s Twitter Profile Photo

Anyone considering applying to EE/Computer Science PhD programs: we run a program at Berkeley where your application gets reviewed by a current EECS PhD student! Check the link below!! And if you know any PhD hopefuls, please share for visibility!!

Jide 🔍 (@jide_alaga)'s Twitter Profile Photo

AI safety frameworks (RSPs) could be one of our best tools for managing AI risks. But how do we know if they're good? In our new paper, Jonas Schuett, Markus Anderljung, and I propose a rubric to find out.👇

Thomas Woodside (@thomas_woodside)'s Twitter Profile Photo

I recommend this internship to folks interested in doing technical AI safety work. I did it in 2021 (mentored by Scott Emmons, who I also recommend!) and it played a big role in launching my career in the area, even though I do policy work nowadays.

Jessy Lin (@realjessylin)'s Twitter Profile Photo

Really cool of ICLR to experiment with making AI part of the reviewing process. Instead of rejecting AI assistance and pretending that people aren't already using LMs to read/write/understand things, we can learn a lot from trying to make it part of our process (even if

Gokul Swamy (@g_k_swamy)'s Twitter Profile Photo

Current multi-turn RLHF methods only getting you halfway there? Go the distance with REFUEL⛽️: a clean, regression-based approach that avoids covariate shift in multi-turn RLHF without a critic, learning policies that can converse with realistic users over multiple turns! (1/n)

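The tweet names the ingredients (regression-based, critic-free, avoiding covariate shift by training on the policy's own realistic rollouts) without the details. As a rough illustration only, here is a toy single-turn regression loss of that general flavor: regress the policy's log-probability-ratio gap between two sampled responses onto their reward gap. The function name, the coefficient eta, and the single-turn simplification are all assumptions, not REFUEL's actual objective.

# Toy regression-style preference loss; hypothetical, not REFUEL's code.
import torch

def regression_loss(logp_new_1, logp_old_1, logp_new_2, logp_old_2,
                    reward_1, reward_2, eta=1.0):
    # Rollouts 1 and 2 are sampled from the current policy on the same
    # prompt, so the loss is evaluated on-policy (no critic involved).
    ratio_gap = (logp_new_1 - logp_old_1) - (logp_new_2 - logp_old_2)
    return ((1.0 / eta) * ratio_gap - (reward_1 - reward_2)).pow(2).mean()

# Dummy per-rollout log-probs (require grad) and rewards:
lp = lambda: torch.randn(4, requires_grad=True)
loss = regression_loss(lp(), lp().detach(), lp(), lp().detach(),
                       torch.randn(4), torch.randn(4))
loss.backward()
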
Michael Cohen (@michael05156007)'s Twitter Profile Photo

New paper! Over-optimization in RL is well-known, but it even occurs when KL(policy || base model) is constrained fairly tightly. Why? And can we fix it? 🧵

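For reference, the objective the tweet alludes to is standardly written as

  \max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\big[ r(x, y) \big] \;-\; \beta\, \mathrm{KL}\big( \pi(\cdot \mid x) \,\|\, \pi_{\text{base}}(\cdot \mid x) \big)

(standard RLHF notation, not taken from the paper itself). The tweet's observation is that reward over-optimization can show up even when \beta is large enough to keep the KL term small.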