Nir Mazor (@nirmmazor) Twitter Tweets • TwiCopy

Noam Dahan

a year ago

Look at the CRAZY domain gap we found in summarization datasets: while English resources are diverse, other languages are mostly restricted to news. Presenting our survey following 130+ datasets in 100+ languages! Explore: github.com/edahanoam/Awes… Gabriel Stanovsky, HUJI NLP 1/6


Guy Kaplan ✈️🇸🇬 ICLR2025

@gkaplan38844

6 months ago

✨ Ever tried generating an image from a prompt but ended up with unexpected outputs? Check out our new paper #FollowTheFlow - tackling T2I issues like bias, failed binding, and leakage from the textual encoding side! 💼🔍 arxiv.org/pdf/2504.01137 guykap12.github.io/guykap12.githu… 🧵[1/7]


HUJI NLP

@nlphuji

6 months ago

That’s a wrap on our first Huji NLP Hackathon! Congrats to the winning team! Noy Sternlicht, Niv Eckhaus, Nir Mazor, Noam Bensason They explored gender bias in AI-generated movie scripts using the Bechdel Test — and yep, you can guess the results...


Eliahu Horwitz | @ ICLR2025

@eliahuhorwitz

6 months ago

Our work maps hereditary relationships between models. We find that weights🏋️ are sufficient for decoding the Origin of Models🌳 Presenting today at ICLR 2025, 15:00–17:30, Hall 3, poster #360. Come by to see our method, visualizations, and interactive demo atlas👀🚀 #ICLR2025


Eliahu Horwitz | @ ICLR2025

@eliahuhorwitz

6 months ago

What if models could be the data🤔 Find out at ICLR 2025 #ICLR2025 Join the 1st workshop on Model Weights as a New Data Modality. We're training networks on model weights for a wide variety of tasks. Featuring an amazing lineup of papers & speakers🚀 🗓️Sunday 9-17 📍Topaz 220-225


Anubhav Jain

@anubhavj480

5 months ago

Think your latent-noise diffusion watermarking method is robust? Think again! We show that they are susceptible to adversarial attacks that only require one watermarked example and an off-the-shelf encoder. This attack can forge and remove the watermark with very high accuracy


Asaf Yehudai

@asafyehudai

5 months ago

Interested in Agent Evaluation? 🤖 We’re excited to launch our new repo: “Evaluation of LLM-based Agents: A Reading List” 📚 Browse benchmarks, methods, and frameworks from our recent survey. 👉 Explore & Contribute: github.com/Asaf-Yehudai/L… #LLMAgents #AgentEvaluation


Eitan Wagner

@eitanwagner

5 months ago

- “I flipped a biased coin with p(Heads) = 0.55.” - “What did it land on?” What is the probability of the answer being “Heads”? Does it depend on whether the outcome is seen? Should we expect it to be 0.55? Check out our new paper! arxiv.org/abs/2505.02072 w/ Omri Abend (1/10)


Eliya Habba

@eliyahabba

5 months ago

🎉 Our paper DOVE 🕊️ has been accepted to #ACL2025 Findings! DOVE 🕊️ is a massive collection (250M!) of LLM outputs across different prompts, domains, and models, aimed at democratizing LLM evaluation research! Thanks to all collaborators! Paper: slab-nlp.github.io/DOVE/


Oren Sultan

@oren_sultan

5 months ago

🚀 I'm excited to share that our latest research titled: “Toward Reliable Proof Generation with LLMs: Leveraging Analogical Guidance and Symbolic Verification” is now available on ArXiv 📄 arxiv.org/pdf/2505.14479 w/ Eitan Stern Hyadata Lab (Dafna Shahaf)


Michael Hassid

@michaelhassid

5 months ago

The longer a reasoning LLM thinks, the more likely it is to be correct, right? Apparently not. Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”. Link: arxiv.org/abs/2505.17813 1/n


Noy Sternlicht

@noysternlicht

4 months ago

🚨 New paper! We present CHIMERA — a KB of 28K+ scientific idea recombinations 💡 It captures how researchers blend concepts or take inspiration across fields, enabling: 1. Meta-science 2. Training models to predict new combos noy-sternlicht.github.io/CHIMERA-Web 👇 Findings & data:


Noy Sternlicht

@noysternlicht

4 months ago

🔔 New Paper! We propose a challenging new benchmark for LLM judges: Evaluating debate speeches. Are they comparable to humans? Well... it’s debatable. 🤔 noy-sternlicht.github.io/Debatable-Inte… 👇 Here are our findings:


Niv Eckhaus

@niveckhaus

4 months ago

🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐niveck.github.io/Time-to-Talk 🧵1/7


Eliahu Horwitz | @ ICLR2025

@eliahuhorwitz

4 months ago

Andrej Karpathy Thanks for the inspiring talk (as always!). I'm the author of the Model Atlas. I'm delighted you liked our work, seeing the figure in your slides felt like an "achievement unlocked"🙌Would really appreciate a link to our work in your slides/tweet arxiv.org/abs/2503.10633

Esther Shizgal

@esthershizgal

3 months ago

🇵🇹 Spoke at #DH2025 about Religious Journeys in Holocaust Testimonies (arXiv link in thread) 🐟 Connecting with researchers using novel computational tools on real-world challenges in the humanities was inspiring! 🏰 Excited to build on these interdisciplinary methods!


Eliya Habba

@eliyahabba

3 months ago

Presenting my poster: 🕊️ DOVE - A large-scale multi-dimensional predictions dataset towards meaningful LLM evaluation, Monday 18:00 Vienna, #ACL2025 Come chat about LLM evaluation, prompt sensitivity, and our 250M COLLECTION OF MODEL OUTPUTS!


Asaf Yehudai

@asafyehudai

3 months ago

🚨 Benchmarks tell us which model is better — but not why it fails. For developers, this means tedious, manual error analysis. We're bridging that gap. Meet CLEAR: an open-source tool for actionable error analysis of LLMs. 🧵👇


Noam Dahan

@dahan_noam

a month ago

Old news: Single-prompt eval is unreliable🤯 New news: PromptSuite🌈 - an easy way to augment your benchmark with thousands of paraphrases ➡️ robust eval, zero sweat! - Works on any dataset! - Python API + web UI Eliya Habba, Gili Lior, Gabriel Stanovsky eliyahabba.github.io/PromptSuite/


Noy Sternlicht

@noysternlicht

a month ago

🎉 Proud to share that "Debatable Intelligence" has now been accepted to #EMNLP2025 (Main Conference)! noy-sternlicht.github.io/Debatable-Inte… Huge thanks to my amazing collaborators Ariel Gera, Roy Bar Haim, Tom Hope, Noam Slonim 🟢
