Seraphina Goldfarb-Tarrant
@seraphinagt
Head of AI Safety @cohere. PhD @EdinburghNLP @InfAtED.
If you don't recognise me, that's because I am invisible dl.acm.org/doi/10.1145/25…
ID: 85239409
https://seraphinatarrant.github.io 26-10-2009 04:41:41
243 Tweets
976 Followers
379 Following
A break from the usual #ACL2024 program: Preethi Seshadri 🔥 is working with me on open research into the fairness of LLMs in hiring. 👇 is a short form to share a resume so we can check our synthetic data is predictive of real data. Please help! 🙏💖 (no training/sharing, we delete it after)
A good reminder for those of us in LLM land (like me) that we don't only need to mitigate gender biases *caused* by LM generation; we should also enable researchers to *use* LMs to discover biases in human content. From Isabelle Augenstein's keynote #genderbiasnlp #ACL2024
The oral paper presentations are starting now #genderbiasnlp #ACL2024!
First oral in #genderbiasnlp on stereotype reduction -- it's nice to see human evals on stereotypes instead of just benchmark results! *especially* because the benchmarks are so flawed (srsly don't use just the benchmarks) #ACL2024
Final oral of #genderbiasnlp! It's actually being given by my MSc supervisor Fei 😂🔥, who is the last author. They build a super detailed taxonomy of gender bias types (way beyond the usual) and use it to analyse bias in educational materials.
Last event of the day at #genderbiasnlp, the lightning talks!! Particularly love this one so far: an analysis of over-refusal for certain identities in LLMs 🔥. We also don't talk enough about how safety tuning risks exacerbating erasure of minorities 🙊. We should 📢. #ACL2024
A cool survey from our #genderbiasnlp lightning talks: it's a nice visualisation of longitudinal fads in measurement and debiasing in language models #ACL2024