Hossein A. (Saeed) Rahmani (@srahmanidashti) 's Twitter Profile
Hossein A. (Saeed) Rahmani

@srahmanidashti

PhD Student at WI (@ucl_wi_group) | @FAICDT1 | @UCL

ID: 975115762712096768

http://rahmanidashti.github.io · Joined 17-03-2018 21:04:49

735 Tweets

925 Followers

2.2K Following

Tim Rocktäschel (@_rockt) 's Twitter Profile Photo

Proud to announce that Dr Laura Ruis defended her PhD thesis titled "Understanding and Evaluating Reasoning in Large Language Models" last week 🥳. Massive thanks to Noah Goodman and Emine Yilmaz for examining! As is customary, Laura received a personal mortarboard from

Hossein A. (Saeed) Rahmani (@srahmanidashti) 's Twitter Profile Photo

Congrats, Dr. Laura Ruis! Laura is amazing, and I'm very happy my PhD journey overlapped with hers. I've learned so much from her, from working together on a project to the many times we discussed my research and its challenges, to name just a few examples.

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

🚨 AI keeps scaling, but social impact evaluations aren’t–and the data proves it Our new paper, 📎“Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations,” analyzes hundreds of evaluation reports and reveals major blind spots ‼️🧵 (1/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

While general capability evaluations are common, social impact assessments covering bias, fairness, privacy, etc. are often fragmented or missing. 🧠 🎯Our goal: Explore the AI Eval landscape to answer who evaluates what and identify gaps in social impact evals!! (2/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

📊 What we did: 🔎 Analyzed 186 first-party release reports from model developers & 183 post-release evaluations (third-party) 📏 Scored 7 social impact dimensions: bias, harmful content, performance disparities, environmental costs, privacy, financial costs, & labor (3/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

Continued... 💡 We also interviewed developers from for-profit and non-profit orgs to understand why some disclosures happen and why others don’t. 💬TLDR; Incentives and constraints shape reporting (4/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

Key Takeaways: ⛔️ First-party reporting is often sparse & superficial, with many reporting NO social impact evals 📉 On average, first-party scores are far lower than third-party evals (0.72 vs 2.62/3) 🎯 Third parties provide some complementary coverage (GPT-4 and LLaMA) (5/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

Continued.. 📉 Reporting on social impact dimensions has steadily declined, both in frequency and detail, across major providers 🧑‍💻 Sensitive content gets the most attention, as it’s easier to define and measure 🛡️Solution? Standardized reporting & safety policies (6/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

📜Paper: arxiv.org/pdf/2511.05613… 📝Blog: tinyurl.com/blogAI1 🤝At EvalEval, we are a coalition of researchers working towards better AI evals. Interested in joining us? Check out: evalevalai.com 7/7 🧵

Mubashara Akhtar (@akhtarmubashara) 's Twitter Profile Photo

It was a pleasure to contribute to this EvalEval Coalition project on social impact evaluation: “Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations”, led by Avijit Ghosh ➡️ Neurips, Anka Reuel | @ankareuel.bsky.social, and Jenny Chim! More details and preprint👇.

Lisa Alazraki (@lisaalazraki) 's Twitter Profile Photo

I am going to be at #NeurIPS2025 in San Diego to present our spotlight poster ‘Reverse Engineering Human Preferences with Reinforcement Learning’. Come check it out and have a chat on Wed 3 Dec, 11am to 2pm, Hall C,D,E poster #1909

Laura Ruis (@lauraruis) 's Twitter Profile Photo

Apply to do research with me on emergence of agency/planning in LLMs, out-of-context reasoning, understanding generalization from data, or propose your own direction! Very excited to be mentoring this spring 💫