Micah Carroll
@micahcarroll
AI PhD student @berkeley_ai
ID: 356711942
http://micahcarroll.github.io
Joined: 17-08-2011 07:40:21
512 Tweets
1.1K Followers
634 Following
In preference elicitation (or active learning), we almost never ask the same question twice, because we think we already know the answer. In this upcoming AI, Ethics, and Society Conference (AIES) paper, we study how stable people's responses about moral preferences actually are. arxiv.org/abs/2408.02862
Really enjoyed helping out xuan (ɕɥɛn / sh-yen) with this tour-de-force on preferences and their limitations for AI Alignment! We have gone down every rabbit hole so you don't have to!
Really loved working on this! xuan (ɕɥɛn / sh-yen), Micah Carroll, and Hal Ashton were a pleasure to think with. I will write a longer post tomorrow, but please read our paper, or at the very least skim the summary tables.
AI safety frameworks (RSPs) could be one of our best tools for managing AI risks. But how do we know if they're good? In our new paper, Jonas Schuett, Markus Anderljung, and I propose a rubric to find out.👇
I recommend this internship to folks interested in doing technical AI safety work. I did it in 2021 (mentored by Scott Emmons, who I also recommend!) and it played a big role in launching my career in the area, even though I do policy work nowadays.