Rakshit Trivedi
@rstriv
Postdoctoral Associate at MIT
ID: 49674797
22-06-2009 16:13:06
24 Tweets
47 Followers
166 Following
New piece in Tech Policy Press, with Sina Fazelpour and Luca (also on the other platforms): we get into the details of red-teaming in the context of AI systems. We make the case that the inherent subjectivity of assessing AI means that the details around the red team matter. techpolicy.press/red-teaming-ai…
But wait, there might be hope... 🌟 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data w/ Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Sanmi Koyejo, Dan Roberts, Andrey Gromov, Diyi Yang, David Donoho arxiv.org/abs/2404.01413 2/N
New paper out from the Algorithmic Alignment Group! The key takeaway: training against an adversary that perturbs intermediate latent activations *with a well-defined target* is quite effective at robustly removing the behavior.
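Since the tweet only sketches the idea, here is a minimal, illustrative PyTorch sketch of what "training against an adversary that perturbs intermediate latent activations toward a well-defined target" can look like. This is not the paper's implementation: the toy `encoder`/`head` modules, the labels `y_safe`/`y_target`, and the hyperparameters `epsilon`, `inner_steps`, and `inner_lr` are hypothetical stand-ins for an actual language model and dataset.

```python
# Illustrative sketch of targeted latent adversarial training (assumed setup, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
encoder = nn.Linear(16, 32)   # hypothetical: layers up to the perturbed activation
head = nn.Linear(32, 4)       # hypothetical: layers after the perturbed activation
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(8, 16)                # toy inputs
y_safe = torch.randint(0, 4, (8,))    # desired (safe) outputs
y_target = torch.randint(0, 4, (8,))  # the adversary's well-defined target: the behavior to remove

epsilon, inner_steps, inner_lr = 0.5, 5, 0.1  # hypothetical hyperparameters

for step in range(100):
    h = encoder(x)  # intermediate latent activations
    delta = torch.zeros_like(h, requires_grad=True)

    # Inner loop: the adversary optimizes a bounded perturbation of the latents
    # so that the model produces the (unwanted) target behavior.
    for _ in range(inner_steps):
        adv_loss = F.cross_entropy(head(h.detach() + delta), y_target)
        grad, = torch.autograd.grad(adv_loss, delta)
        with torch.no_grad():
            delta -= inner_lr * grad          # make the target behavior more likely
            delta.clamp_(-epsilon, epsilon)   # keep the perturbation bounded

    # Outer step: update the model so it still gives the safe output
    # even under the adversarial latent perturbation.
    loss = F.cross_entropy(head(h + delta.detach()), y_safe)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The inner loop plays the adversary, steering the intermediate activation toward the target behavior within a bounded perturbation, while the outer step trains the model to still produce the safe output under that worst-case perturbation.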
We're excited to be partnering again with Apart Research for a hackathon next weekend in the run-up to the Concordia Contest at NeurIPS! The challenge: advancing the cooperative intelligence of language model agents. Sign up here: apartresearch.com/event/the-conc….
In collaboration with colleagues from Google DeepMind, Massachusetts Institute of Technology (MIT), UC Berkeley, and UCL, we are excited to announce that the NeurIPS 2024 Concordia Contest is now open! Deadline: October 31st. Prizes: $10,000 + more. Further details: cooperativeai.com/contests/conco…. youtube.com/watch?v=Xtb1WZ…
Video from our NeurIPS 2024 tutorial is up! Dylan Hadfield-Menell, Joel Z. Leibo, Rakshit Trivedi, and I explore how frameworks from economics, institutional and political theory, and biological and cultural evolution can advance approaches to AI alignment. neurips.cc/virtual/2024/t…
🚨New paper led by Ariba Khan. Lots of prior research has assumed that LLMs have stable preferences, align with coherent principles, or can be steered to represent specific worldviews. No ❌, no ❌, and definitely no ❌. We need to be careful not to anthropomorphize LLMs too much.