Nir Mazor (@nirmmazor)'s Twitter Profile
Nir Mazor

@nirmmazor

ID: 1897588396723253248

06-03-2025 10:01:53

12 Tweets

11 Followers

40 Following

Noam Dahan (@dahan_noam)

Look at the CRAZY domain gap we found in summarization datasets: while English resources are diverse, other languages are mostly restricted to news. Presenting our survey following 130+ datasets in 100+ languages! Explore: github.com/edahanoam/Awes… Gabriel Stanovsky, HUJI NLP 1/6

Guy Kaplan ✈️🇸🇬 ICLR2025 (@gkaplan38844)

✨ Ever tried generating an image from a prompt but ended up with unexpected outputs? Check out our new paper #FollowTheFlow - tackling T2I issues like bias, failed binding, and leakage from the textual encoding side! 💼🔍 arxiv.org/pdf/2504.01137 guykap12.github.io/guykap12.githu… 🧵[1/7]

HUJI NLP (@nlphuji)

That’s a wrap on our first Huji NLP Hackathon! Congrats to the winning team! Noy Sternlicht, Niv Eckhaus, Nir Mazor, Noam Bensason They explored gender bias in AI-generated movie scripts using the Bechdel Test — and yep, you can guess the results...

Eliahu Horwitz | @ ICLR2025 (@eliahuhorwitz)

Our work maps hereditary relationships between models. We find that weights🏋️ are sufficient for decoding the Origin of Models🌳 Presenting today at ICLR 2025, 15:00–17:30, Hall 3, poster #360. Come by to see our method, visualizations, and interactive demo atlas👀🚀 #ICLR2025

Eliahu Horwitz | @ ICLR2025 (@eliahuhorwitz)

What if models could be the data🤔 Find out at ICLR 2025 #ICLR2025 Join the 1st workshop on Model Weights as a New Data Modality. We're training networks on model weights for a wide variety of tasks. Featuring an amazing lineup of papers & speakers🚀 🗓️Sunday 9-17 📍Topaz 220-225

Anubhav Jain (@anubhavj480)

Think your latent-noise diffusion watermarking method is robust? Think again! We show that they are susceptible to adversarial attacks that only require one watermarked example and an off-the-shelf encoder. This attack can forge and remove the watermark with very high accuracy

Asaf Yehudai (@asafyehudai)

Interested in Agent Evaluation? 🤖 We’re excited to launch our new repo: “Evaluation of LLM-based Agents: A Reading List” 📚 Browse benchmarks, methods, and frameworks from our recent survey. 👉 Explore & Contribute: github.com/Asaf-Yehudai/L… #LLMAgents #AgentEvaluation

Eitan Wagner (@eitanwagner)

- “I flipped a biased coin with p(Heads) = 0.55.” - “What did it land on?” What is the probability of the answer being “Heads”? Does it depend on whether the outcome is seen? Should we expect it to be 0.55? Check out our new paper! arxiv.org/abs/2505.02072 w/ Omri Abend (1/10)

Eliya Habba (@eliyahabba)

🎉 Our paper DOVE 🕊️ has been accepted to #ACL2025 Findings! DOVE 🕊️ is a massive collection (250M!) of LLM outputs across different prompts, domains, and models, aimed at democratizing LLM evaluation research! Thanks to all collaborators! Paper: slab-nlp.github.io/DOVE/

Oren Sultan (@oren_sultan)

🚀 I'm excited to share that our latest research titled: “Toward Reliable Proof Generation with LLMs: Leveraging Analogical Guidance and Symbolic Verification” is now available on ArXiv 📄 arxiv.org/pdf/2505.14479 w/ Eitan Stern Hyadata Lab (Dafna Shahaf)

Michael Hassid (@michaelhassid)

The longer reasoning LLM thinks - the more likely to be correct, right? Apparently not. Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”. Link: arxiv.org/abs/2505.17813 1/n

Noy Sternlicht (@noysternlicht)

🚨 New paper! We present CHIMERA — a KB of 28K+ scientific idea recombinations 💡 It captures how researchers blend concepts or take inspiration across fields, enabling: 1. Meta-science 2. Training models to predict new combos noy-sternlicht.github.io/CHIMERA-Web 👇 Findings & data:

Noy Sternlicht (@noysternlicht)

🔔 New Paper! We propose a challenging new benchmark for LLM judges: Evaluating debate speeches. Are they comparable to humans? Well... it’s debatable. 🤔 noy-sternlicht.github.io/Debatable-Inte… 👇 Here are our findings:

Niv Eckhaus (@niveckhaus)

🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐niveck.github.io/Time-to-Talk 🧵1/7

Eliahu Horwitz | @ ICLR2025 (@eliahuhorwitz)

Andrej Karpathy Thanks for the inspiring talk (as always!). I'm the author of the Model Atlas. I'm delighted you liked our work, seeing the figure in your slides felt like an "achievement unlocked"🙌Would really appreciate a link to our work in your slides/tweet arxiv.org/abs/2503.10633

Esther Shizgal (@esthershizgal)

🇵🇹 Spoke at #DH2025 about Religious Journeys in Holocaust Testimonies (arXiv link in thread) 🐟 Connecting with researchers using novel computational tools on real-world challenges in the humanities was inspiring! 🏰 Excited to build on these interdisciplinary methods!

Eliya Habba (@eliyahabba)

Presenting my poster: 🕊️ DOVE - A large-scale multi-dimensional predictions dataset towards meaningful LLM evaluation, Monday 18:00 Vienna, #ACL2025. Come chat about LLM evaluation, prompt sensitivity, and our 250M COLLECTION OF MODEL OUTPUTS!

Asaf Yehudai (@asafyehudai)

🚨 Benchmarks tell us which model is better — but not why it fails. For developers, this means tedious, manual error analysis. We're bridging that gap. Meet CLEAR: an open-source tool for actionable error analysis of LLMs. 🧵👇

Noam Dahan (@dahan_noam)

Old news: Single-prompt eval is unreliable🤯 New news: PromptSuite🌈 - an easy way to augment your benchmark with thousands of paraphrases ➡️ robust eval, zero sweat! - Works on any dataset! - Python API + web UI Eliya Habba, Gili Lior, Gabriel Stanovsky eliyahabba.github.io/PromptSuite/

Noy Sternlicht (@noysternlicht)

🎉 Proud to share that "Debatable Intelligence" has now been accepted to #EMNLP2025 (Main Conference)! noy-sternlicht.github.io/Debatable-Inte… Huge thanks to my amazing collaborators Ariel Gera, Roy Bar Haim, Tom Hope, Noam Slonim 🟢