Moran Mizrahi (@moranmiz)'s Twitter Profile
Moran Mizrahi

@moranmiz

PhD student at @Csehuji (@HyadataLab). Interested in Natural Language Processing, Data Science, Human-Computer Interaction and Computational Creativity.

ID: 56381337

13-07-2009 14:10:55

154 Tweets

309 Followers

264 Following

Gili Lior (@gililior)

"Summarize this text" out ❌ "Provide a 50-word summary, explaining it to a 5-year-old" in ✅ The way we use LLMs has changed - user instructions are now longer, more nuanced, and packed with constraints. Interested in how LLMs keep up? 🤔 Check out WildIFEval, our new benchmark!

Hyadata Lab (Dafna Shahaf) (@hyadatalab)

Happy to announce that we are starting a new workshop on Computational Analogy, called Principia Analogiae (Yeah, the name is a bit megalomaniac. I like it this way. :) ).

Guy Kaplan ✈️🇸🇬 ICLR2025 (@gkaplan38844)

✨ Ever tried generating an image from a prompt but ended up with unexpected outputs? Check out our new paper #FollowTheFlow - tackling T2I issues like bias, failed binding, and leakage from the textual encoding side! 💼🔍 arxiv.org/pdf/2504.01137 guykap12.github.io/guykap12.githu… 🧵[1/7]

Asaf Yehudai (@asafyehudai)

Interested in Agent Evaluation? 🤖 We’re excited to launch our new repo: “Evaluation of LLM-based Agents: A Reading List” 📚 Browse benchmarks, methods, and frameworks from our recent survey. 👉 Explore & Contribute: github.com/Asaf-Yehudai/L… #LLMAgents #AgentEvaluation

Eitan Wagner (@eitanwagner)

- “I flipped a biased coin with p(Heads) = 0.55.” - “What did it land on?” What is the probability of the answer being “Heads”? Does it depend on whether the outcome is seen? Should we expect it to be 0.55? Check out our new paper! arxiv.org/abs/2505.02072 w/ Omri Abend (1/10)

Eliya Habba (@eliyahabba)

🎉 Our paper DOVE 🕊️ has been accepted to #ACL2025 Findings! DOVE 🕊️ is a massive collection (250M!) of LLM outputs across different prompts, domains, and models, aimed at democratizing LLM evaluation research! Thanks to all collaborators! Paper: slab-nlp.github.io/DOVE/

Dan Ofer (Was @ICML,@Worldcon ) (@danofer)

Our newest paper is out! InterFeat: An Automated Pipeline for Finding Interesting Hypotheses in Structured Biomedical Data tl;dr: It's an automated method using AI/LLMs to find interesting features in data. Doctors reviewed them on different diseases and the UK BioBank.

Oren Sultan (@oren_sultan)

🚀 I'm excited to share that our latest research titled: “Toward Reliable Proof Generation with LLMs: Leveraging Analogical Guidance and Symbolic Verification” is now available on ArXiv 📄 arxiv.org/pdf/2505.14479 w/ Eitan Stern Hyadata Lab (Dafna Shahaf)

Michael Hassid (@michaelhassid)

The longer reasoning LLM thinks - the more likely to be correct, right? Apparently not. Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”. Link: arxiv.org/abs/2505.17813 1/n

Dana Arad 🎗️ (@dana_arad4)

Tried steering with SAEs and found that not all features behave as expected? Check out our new preprint - "SAEs Are Good for Steering - If You Select the Right Features" 🧵

Noy Sternlicht (@noysternlicht)

🚨 New paper! We present CHIMERA - a KB of 28K+ scientific idea recombinations 💡 It captures how researchers blend concepts or take inspiration across fields, enabling: 1. Meta-science 2. Training models to predict new combos noy-sternlicht.github.io/CHIMERA-Web 👇 Findings & data:

Iddo Yosha (@iddoyosha)

1/5 🚨 New paper alert! StressTest: Can YOUR Speech LM Handle the Stress? Sentence stress = emphasis on words to signal intent, contrast, or new info. We built StressTest - a benchmark for testing stress reasoning. 🗣️💬 Then, meet StresSLM who finally gets it! Insights & Links 👇

Niv Eckhaus (@niveckhaus)

🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐 niveck.github.io/Time-to-Talk 🧵1/7

Asaf Yehudai (@asafyehudai)

🚨 Benchmarks tell us which model is better - but not why it fails. For developers, this means tedious, manual error analysis. We're bridging that gap. Meet CLEAR: an open-source tool for actionable error analysis of LLMs. 🧵👇

David Dinkevich (@daviddinkevich)

[1/6] 🎬 New paper: Story2Board We guide diffusion models to generate consistent, expressive storyboards--no training needed. By mixing attention-aligned tokens across panels, we reinforce character identity without hurting layout diversity. 🌐 daviddinkevich.github.io/Story2Board

Moran Mizrahi (@moranmiz)

For years I've been wandering around Paris and seeing cheerful little mosaics on the walls of buildings. Yesterday, on a tour with Nadav, I discovered that they are illegal, that a mysterious artist is responsible for more than 1,500 of them across the city (!), and that there is even an app that lets you photograph them and collect points! Who's joining the alien hunt? 👾👽🛸 NadavBas 🇪🇺🇫🇷 🎗️ Efrat Frid

Yosef Dayani (@yosefday)

[1/10] 🤔 What if you wanted to generate a 3D model of a “Bolognese dog” 🐕 or a “Labubu doll” 🧸? Try it with existing text-to-3D models → they collapse. Why? These concepts are rare or new, and the model has never seen them. 🚀 Our solution: MV-RAG See details below ⬇️

Noam Dahan (@dahan_noam)

Old news: Single-prompt eval is unreliable 🤯 New news: PromptSuite 🌈 - an easy way to augment your benchmark with thousands of paraphrases ➡️ robust eval, zero sweat! - Works on any dataset! - Python API + web UI Eliya Habba, Gili Lior, Gabriel Stanovsky eliyahabba.github.io/PromptSuite/

Noy Sternlicht (@noysternlicht)

🎉 Proud to share that "Debatable Intelligence" has now been accepted to #EMNLP2025 (Main Conference)! noy-sternlicht.github.io/Debatable-Inte… Huge thanks to my amazing collaborators Ariel Gera, Roy Bar Haim, Tom Hope, Noam Slonim 🟢

Matan Levy (@matanlvy)

🤖 AI for finding a needle in a haystack: 🚀 We're excited to share our #NeurIPS2025 paper: "Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization".
