
Hadas Orgad
@orgadhadas
PhD student @ Technion | Focused on AI interpretability, robustness & safety | Because black boxes don’t belong in critical systems
ID: 1121835405454786561
https://orgadhadas.github.io/ 26-04-2019 17:56:18
171 Tweets
456 Followers
116 Following

We received more submissions to our Actionable Interpretability Workshop at ICML 2025 than expected, and we're now looking for additional reviewers to handle 2–3 papers each between May 24 and June 7. Sign up here: forms.gle/FLToWY3keb832n… Thank you! 🙏


🚨 New paper at #ACL2025 Findings! REVS: Unlearning Sensitive Information in LMs via Rank Editing in the Vocabulary Space. LMs memorize and leak sensitive data: emails, SSNs, and URLs from their training data. We propose a surgical method to unlearn it. 🧵👇 w/ Yonatan Belinkov & Martin Tutek 1/8



I'm excited that our Actionable Interpretability Workshop at ICML 2025 received over 150 submissions! We had to expand our reviewer pool to accommodate them all. I hope this reflects a growing interest in more actionable approaches to interpretability.



Going to #icml2025? Don't miss the Actionable Interpretability Workshop! We've got an amazing lineup of speakers, panelists, and papers, all focused on leveraging insights from interpretability research to tackle practical, real-world problems ✨



After a thunderstorm cancelled my flight, I finally made it to Vancouver for #ICML2025 and the Actionable Interpretability Workshop! DM me if you want to chat about using interpretability for safer and more controllable AI. We'll also present the Mech-Interp Benchmark (MIB) on Thu @ 11:00, so come by!


Hope everyone's getting the most out of #icml25. We're excited and ready for the Actionable Interpretability Workshop this Saturday! Check out the schedule and join us to discuss how we can move interpretability toward more practical impact.





🎉 Presenting my poster today at #ACL2025! REVS: Unlearning Sensitive Info in LMs via Rank Editing. Come by to chat about unlearning memorized info without gradients.
🕥 10:30–12:00
With Yonatan Belinkov & Martin Tutek
📄 Paper: arxiv.org/abs/2405.18100
🌐 Website: technion-cs-nlp.github.io/REVS/
