Hossein A. (Saeed) Rahmani (@srahmanidashti) 's Twitter Profile
Hossein A. (Saeed) Rahmani

@srahmanidashti

PhD Student at WI (@ucl_wi_group) | @FAICDT1 | @UCL

ID: 975115762712096768

http://rahmanidashti.github.io · Joined 17-03-2018 21:04:49

735 Tweets

925 Followers

2.2K Following

Tim Rocktäschel (@_rockt) 's Twitter Profile Photo

Proud to announce that Dr Laura Ruis defended her PhD thesis titled "Understanding and Evaluating Reasoning in Large Language Models" last week 🥳. Massive thanks to Noah Goodman and Emine Yilmaz for examining! As is customary, Laura received a personal mortarboard from

Hossein A. (Saeed) Rahmani (@srahmanidashti) 's Twitter Profile Photo

Congrats, Dr. Laura Ruis! Laura is amazing, and I'm very happy my PhD journey overlapped with hers. I've learned so much from her, from working together on a project to the many times we discussed my research and its challenges, to name just a few examples.

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

🚨 AI keeps scaling, but social impact evaluations aren’t–and the data proves it Our new paper, 📎“Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations,” analyzes hundreds of evaluation reports and reveals major blind spots ‼️🧵 (1/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

While general capability evaluations are common, social impact assessments covering bias, fairness, privacy, etc. are often fragmented or missing. 🧠 🎯Our goal: Explore the AI Eval landscape to answer who evaluates what and identify gaps in social impact evals!! (2/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

📊 What we did: 🔎 Analyzed 186 first-party release reports from model developers & 183 post-release evaluations (third-party) 📏 Scored 7 social impact dimensions: bias, harmful content, performance disparities, environmental costs, privacy, financial costs, & labor (3/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

Continued... 💡 We also interviewed developers from for-profit and non-profit orgs to understand why some disclosures happen and why others don’t. 💬TLDR; Incentives and constraints shape reporting (4/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

Key Takeaways: ⛔️ First-party reporting is often sparse & superficial, with many reporting NO social impact evals 📉 On average, first-party scores are far lower than third-party evals (0.72 vs 2.62/3) 🎯 Third parties provide some complementary coverage (GPT-4 and LLaMA) (5/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

Continued.. 📉 Reporting on social impact dimensions has steadily declined, both in frequency and detail, across major providers 🧑‍💻 Sensitive content gets the most attention, as it’s easier to define and measure 🛡️Solution? Standardized reporting & safety policies (6/7)

EvalEval Coalition (@evaluatingevals) 's Twitter Profile Photo

📜Paper: arxiv.org/pdf/2511.05613… 📝Blog: tinyurl.com/blogAI1 🤝At EvalEval, we are a coalition of researchers working towards better AI evals. Interested in joining us? Check out: evalevalai.com 7/7 🧵

Mubashara Akhtar (@akhtarmubashara) 's Twitter Profile Photo

It was a pleasure to contribute to this EvalEval Coalition project on social impact evaluation: “Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations”, led by Avijit Ghosh ➡️ Neurips, Anka Reuel | @ankareuel.bsky.social, and Jenny Chim! More details and preprint👇.

Lisa Alazraki (@lisaalazraki) 's Twitter Profile Photo

I am going to be at #NeurIPS2025 in San Diego to present our spotlight poster ‘Reverse Engineering Human Preferences with Reinforcement Learning’. Come check it out and have a chat on Wed 3 Dec, 11am to 2pm, Hall C,D,E poster #1909

Laura Ruis (@lauraruis) 's Twitter Profile Photo

Apply to do research with me on emergence of agency/planning in LLMs, out-of-context reasoning, understanding generalization from data, or propose your own direction! Very excited to be mentoring this spring 💫