LLM Evals Workshop @NeurIPS (@llm_eval)'s Twitter Profile
LLM Evals Workshop @NeurIPS

@llm_eval

NeurIPS 2025 Workshop. Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling

ID: 1945760836083216384

Link: https://sites.google.com/corp/view/llm-eval-workshop
Date: 17-07-2025 08:22:36

18 Tweets

88 Followers

16 Following

Berivan Isik (@berivanisik)

Ahmad Beirami ✈️ NeurIPS, indeed! In our NeurIPS workshop, LLM Evals Workshop @NeurIPS, we’ll tackle the most pressing evaluation challenges. Join us to discuss how we should design the next generation of evaluations with experts in the field! More details: sites.google.com/view/llm-eval-…

LLM Evals Workshop @NeurIPS (@llm_eval)

Last 2 days to the deadline! If you're interested in helping with the reviewing process, please consider volunteering using the following form. There will be best reviewer awards! docs.google.com/forms/d/e/1FAI…

Riccardo Cadei (@riccardocadeii)

The Narcissus Hypothesis: --Recursive training on semi-synthetic corpora enforcing human alignment induces a Social Desirability Bias: world-models (Narcissus) aim to please rather than represent, polluting data lakes and charming us (Echo) into hanging on their every word.

Riccardo Cadei (@riccardocadeii)

Sketched on a few Parisian summer nights with a friend, Christian Internò. If you care about (causal) identification in a semi-synthetic future, we’d value your read and critique. Preprint: arxiv.org/pdf/2509.17999 Accepted at the LLM Evals Workshop @NeurIPS (NeurIPS Conference).

Berivan Isik (@berivanisik)

I’ll be at NeurIPS Conference all week and would love to connect on LLM data, evaluation, benchmarking, and scaling laws. If you’re working on related problems, feel free to reach out. PS: Don’t miss our one-of-a-kind workshop on LLM evaluation: sites.google.com/view/llm-eval-…