Ariel Gera (@arielgera2)'s Twitter Profile
Ariel Gera

@arielgera2

ID: 1461322741387300864

18-11-2021 13:18:06

52 Tweets

42 Followers

54 Following

Sumit (@_reachsumit)

JuStRank: Benchmarking LLM Judges for System Ranking
IBM presents a large-scale benchmark to evaluate how well LLMs can rank other AI systems, showing that reward models often match larger LLMs at this task. 📝 arxiv.org/abs/2412.09569

fly51fly (@fly51fly)

[CL] JuStRank: Benchmarking LLM Judges for System Ranking A Gera, O Boni, Y Perlitz, R Bar-Haim... [IBM Research] (2024) arxiv.org/abs/2412.09569

Shir Ashury-Tahan (@shirashurytahan)

LLMs struggle with tables—but how robust are they really? 🔍 ToRR goes beyond accuracy, testing real-world robustness across formats & tasks. 📊 Different formats, same data—models show brittle behavior affecting rankings. Prompt configuration is a key dimension for evaluation!🚀

Ariel Gera (@arielgera2)

How do LLMs cope with multi-constraint instructions from real users? Not too well, it turns out... So lots of room for improvement! 🦾 Great internship work by Gili Lior 🌟

Ariel Gera (@arielgera2)

Can LLMs judge debate speeches? 🤖 And how do the LLMaaJ judgments differ from human annotations? 👨‍⚖️ Great new work by Noy Sternlicht 🌟

Noy Sternlicht (@noysternlicht)

🎉 Proud to share that "Debatable Intelligence" has now been accepted to #EMNLP2025 (Main Conference)! noy-sternlicht.github.io/Debatable-Inte… Huge thanks to my amazing collaborators Ariel Gera, Roy Bar Haim, Tom Hope, Noam Slonim 🟢

Ramon Astudillo (@ramonastudill12)

The Generative Model Alignment team at IBM Research is looking for interns for next summer! Two candidates for two topics: 🍰 Reinforcement Learning environments for LLMs 🐎 Speculative and non-autoregressive generation for LLMs. Interested/curious? DM / email [email protected]

Ariel Gera (@arielgera2)

Why I really enjoyed this project: It combines a lot: multimodality + hybrid retrieval + test-time optimization 🤯 At the same time, it is actually quite simple 💡 and helps to achieve more (retrieval quality) with less (compute resources) 🦾 Plus, Omri Uzan is pretty great