Moran Mizrahi (@moranmiz) Twitter Tweets • TwiCopy

Gili Lior

9 months ago

"Summarize this text" out ❌ "Provide a 50-word summary, explaining it to a 5-year-old" in ✅ The way we use LLMs has changed—user instructions are now longer, more nuanced, and packed with constraints. Interested in how LLMs keep up? 🤔 Check out WildIFEval, our new benchmark!

Hyadata Lab (Dafna Shahaf)

@hyadatalab

9 months ago

Happy to announce that we are starting a new workshop on Computational Analogy, called Principia Analogiae (Yeah, the name is a bit megalomaniac. I like it this way. :) ).

Guy Kaplan ✈️🇸🇬 ICLR2025

@gkaplan38844

8 months ago

✨ Ever tried generating an image from a prompt but ended up with unexpected outputs? Check out our new paper #FollowTheFlow - tackling T2I issues like bias, failed binding, and leakage from the textual encoding side! 💼🔍 arxiv.org/pdf/2504.01137 guykap12.github.io/guykap12.githu… 🧵[1/7]

Asaf Yehudai

@asafyehudai

8 months ago

Interested in Agent Evaluation? 🤖 We’re excited to launch our new repo: “Evaluation of LLM-based Agents: A Reading List” 📚 Browse benchmarks, methods, and frameworks from our recent survey. 👉 Explore & Contribute: github.com/Asaf-Yehudai/L… #LLMAgents #AgentEvaluation

Eitan Wagner

@eitanwagner

8 months ago

- “I flipped a biased coin with p(Heads) = 0.55.” - “What did it land on?” What is the probability of the answer being “Heads”? Does it depend on whether the outcome is seen? Should we expect it to be 0.55? Check out our new paper! arxiv.org/abs/2505.02072 w/ Omri Abend (1/10)

Eliya Habba

@eliyahabba

7 months ago

🎉 Our paper DOVE 🕊️ has been accepted to #ACL2025 Findings! DOVE 🕊️ is a massive collection (250M!) of LLM outputs across different prompts, domains, and models, aimed at democratizing LLM evaluation research! Thanks to all collaborators! Paper: slab-nlp.github.io/DOVE/

Dan Ofer (Was @ICML,@Worldcon )

@danofer

7 months ago

Our newest paper is out! InterFeat: An Automated Pipeline for Finding Interesting Hypotheses in Structured Biomedical Data tl;dr: It's an automated method using AI/LLMs to find interesting features in data. Doctors reviewed them on different diseases and the UK BioBank.

Oren Sultan

@oren_sultan

7 months ago

🚀 I'm excited to share that our latest research titled: “Toward Reliable Proof Generation with LLMs: Leveraging Analogical Guidance and Symbolic Verification” is now available on ArXiv 📄 arxiv.org/pdf/2505.14479 w/ Eitan Stern Hyadata Lab (Dafna Shahaf)

Michael Hassid

@michaelhassid

7 months ago

The longer a reasoning LLM thinks, the more likely it is to be correct, right? Apparently not. Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”. Link: arxiv.org/abs/2505.17813 1/n

Dana Arad 🎗️

@dana_arad4

7 months ago

Tried steering with SAEs and found that not all features behave as expected? Check out our new preprint - "SAEs Are Good for Steering - If You Select the Right Features" 🧵

Noy Sternlicht

@noysternlicht

7 months ago

🚨 New paper! We present CHIMERA — a KB of 28K+ scientific idea recombinations 💡 It captures how researchers blend concepts or take inspiration across fields, enabling: 1. Meta-science 2. Training models to predict new combos noy-sternlicht.github.io/CHIMERA-Web 👇 Findings & data:

Iddo Yosha

@iddoyosha

7 months ago

1/5 🚨 New paper alert! StressTest: Can YOUR Speech LM Handle the Stress? Sentence stress = emphasis on words to signal intent, contrast, or new info. We built StressTest — a benchmark for testing stress reasoning.🗣️💬 Then, meet StresSLM who finally gets it! Insights & Links 👇

Niv Eckhaus

@niveckhaus

6 months ago

🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐niveck.github.io/Time-to-Talk 🧵1/7

Asaf Yehudai

@asafyehudai

5 months ago

🚨 Benchmarks tell us which model is better — but not why it fails. For developers, this means tedious, manual error analysis. We're bridging that gap. Meet CLEAR: an open-source tool for actionable error analysis of LLMs. 🧵👇

David Dinkevich

@daviddinkevich

4 months ago

[1/6] 🎬 New paper: Story2Board We guide diffusion models to generate consistent, expressive storyboards--no training needed. By mixing attention-aligned tokens across panels, we reinforce character identity without hurting layout diversity. 🌐 daviddinkevich.github.io/Story2Board

Moran Mizrahi

@moranmiz

4 months ago

For years I've been walking around Paris and seeing cheerful little mosaics on building walls. Yesterday, on a tour with Nadav, I discovered that they're illegal, that a mysterious artist is responsible for more than 1,500 of them across the city (!), and that there's even an app that lets you photograph them and earn points! Who's joining the alien hunt? 👾👽🛸 NadavBas 🇪🇺🇫🇷 🎗️ Efrat Frid

Yosef Dayani

@yosefday

4 months ago

[1/10] 🤔 What if you wanted to generate a 3D model of a “Bolognese dog” 🐕 or a “Labubu doll” 🧸? Try it with existing text-to-3D models → they collapse. Why? These concepts are rare or new, and the model has never seen them. 🚀 Our solution: MV-RAG See details below ⬇️

Noam Dahan

@dahan_noam

4 months ago

Old news: Single-prompt eval is unreliable🤯 New news: PromptSuite🌈 - an easy way to augment your benchmark with thousands of paraphrases ➡️ robust eval, zero sweat! - Works on any dataset! - Python API + web UI Eliya Habba, Gili Lior, Gabriel Stanovsky eliyahabba.github.io/PromptSuite/

Noy Sternlicht

@noysternlicht

3 months ago

🎉 Proud to share that "Debatable Intelligence" has now been accepted to #EMNLP2025 (Main Conference)! noy-sternlicht.github.io/Debatable-Inte… Huge thanks to my amazing collaborators Ariel Gera, Roy Bar Haim, Tom Hope, Noam Slonim 🟢

Matan Levy

@matanlvy

3 months ago

🤖 AI for finding a needle in a haystack: 🚀 We're excited to share our #NeurIPS2025 paper: "Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization".
