Victoria Li (@victoria_r_li)'s Twitter Profile
Victoria Li

@victoria_r_li

ID: 1809291349205872645

05-07-2024 18:21:00

1 Tweet

23 Followers

23 Following

Naomi Saphra hiring a lab 🧈🪰 (@nsaphra)

Chatbots have biases in what they say—but what about biases in what they WON'T say? Our new paper (w/Victoria Li & Yida Chen) shows that personal info like a user's race, age, or love for the Los Angeles Chargers decides if ChatGPT refuses a request. arxiv.org/abs/2407.06866

Naomi Saphra hiring a lab 🧈🪰 (@nsaphra)

🚨 New preprint! 🚨 Everyone loves causal interp. It’s coherently defined! It makes testable predictions about mechanistic interventions! But what if we had a different objective: predicting model behavior not under mechanistic interventions, but on unseen input data?
