Tom Bush ✈️ ICLR2025 (@_tom_bush)'s Twitter Profile
@_tom_bush

AI Alignment and Interpretability

ID: 1728022741763194880

Website: http://tuphs28.github.io

Joined: 24-11-2023 12:08:37

33 Tweets

116 Followers

173 Following

Fernando Rosas 🦋 (@_fernando_rosas)

Preprint time: “AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability” arxiv.org/abs/2504.04608

Exploring the fundamental limits that shape the design space of world modelling for agent sandboxing and interpretability