Itay Nakash (@itay__nakash) 's Twitter Profile
Itay Nakash

@itay__nakash

IBM Research | AI Safety | Agents, LLMs & NLP

ID: 1544329316573581315

https://itay-nakash.github.io/ · Joined: 05-07-2022 14:36:36

73 Tweets

127 Followers

363 Following

Eran Hirsch (@hirscheran) 's Twitter Profile Photo

🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)!

LAQuer provides more granular attribution for LLM generations: users can just highlight any output fact (top), and get attribution for that input snippet (bottom). This reduces the amount of text the user has to read by 2
Judd Rosenblatt — d/acc (@juddrosenblatt) 's Twitter Profile Photo

Current AI “alignment” is just a mask

Our findings in The Wall Street Journal explore the limitations of today’s alignment techniques and what’s needed to get AI right 🧵
Zorik Gekhman (@zorikgekhman) 's Twitter Profile Photo

Now accepted to #COLM2025! We formally define hidden knowledge in LLMs and show its existence in a controlled study. We even show that a model can know the answer yet fail to generate it in 1,000 attempts 😵 Looking forward to presenting and discussing our work in person.

Tal Haklay (@tal_haklay) 's Twitter Profile Photo

🚨Meet our panelists at the Actionable Interpretability Workshop at ICML 2025!

Join us July 19 at 4pm for a panel on making interpretability research actionable, its challenges, and how the community can drive greater impact.
Naomi Saphra, Samuel Marks, Kyle Lo, Fazl Barez
Itay Itzhak (@itay_itzhak_) 's Twitter Profile Photo

🚨New paper alert🚨

🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing?

Excited to share our new paper, accepted to CoLM 2025🎉!
See thread below 👇
#BiasInAI #LLMs #MachineLearning #NLProc
Nitay Calderon (@nitcal) 's Twitter Profile Photo

Everyone uses LLMs to annotate data or evaluate models in their research.

But how can we convince others (readers, collaborators, reviewers!!!) that LLMs are reliable? 🤖

Here’s a simple (and low-effort) solution: show the LLM is a *comparable alternative annotator* ✅
Asaf Yehudai (@asafyehudai) 's Twitter Profile Photo

🚨 Benchmarks tell us which model is better — but not why it fails.

For developers, this means tedious, manual error analysis. We're bridging that gap.

Meet CLEAR: an open-source tool for actionable error analysis of LLMs.

🧵👇
Tomer Ashuach (@tomerashuach) 's Twitter Profile Photo

🚨 New preprint out!

CRISP: Persistent Concept Unlearning via SAEs
LLMs often encode knowledge we want to remove.

CRISP enables persistent, interpretable, precise unlearning while keeping models useful & coherent—tested on bio & cyber safety tasks🧵👇
📄arxiv.org/abs/2508.13650
Itay Nakash (@itay__nakash) 's Twitter Profile Photo

Excited to share that our work has been accepted to the #EMNLP2025 Main Conference 🥳 Heading to EMNLP? If you’re interested in agent security, safety, or jailbreaking LLMs - let’s talk!

Nitay Calderon (@nitcal) 's Twitter Profile Photo

🥳🥳
Happy to share that we have three papers accepted to EMNLP 2025 🇨🇳 (2 main, 1 findings)! 

What makes this special is that all three belong to a new research line I began last year:
LLM-as-a-judge/LLM-as-an-annotator
🤖🧑‍⚖️
Sara Rosenthal (@seirasto) 's Twitter Profile Photo

📣📣Presenting our platform used to build MTRAG!!

RAGAPHENE: A RAG Annotation Platform with Human ENhancements and Edits
Arxiv: arxiv.org/abs/2508.19272
MTRAG GitHub: github.com/IBM/mt-rag-ben…
Join our MTRAGEval Task: ibm.github.io/mt-rag-benchma…
Kshitij Fadnis, Maeda Hanafi, Marina Danilevsky

Uri Berger (@uriberger88) 's Twitter Profile Photo

Heading to #EMNLP2025! 🎉 Two of our papers will be there — come say hi 👋

🖼️ Image Captioning Evaluation — Nov 5, 17:45 📄 arxiv.org/abs/2408.04909
🕵️ Deceptive LLM Agents (Mafia Game) — Nov 5, 13:00 📄 arxiv.org/abs/2506.05309

Itay Nakash (@itay__nakash) 's Twitter Profile Photo

Heading to #EMNLP2025: Suzhou, here we come! 🌟

Presenting our work on multi-agent attacks against policy-adherent agents -> showing why naive evaluation isn’t enough!
📅 Thu, Nov 6 | 10:30–12:00 [Session 7]
Interested in AI Security, Safety, or Agent Evaluation? Let’s chat!

Nitay Calderon (@nitcal) 's Twitter Profile Photo

Excited to be at #EMNLP2025 in Suzhou 🇨🇳! I’ll present three papers, and I'm happy to chat about any of these works!

🏅 "Multi-Domain Explainability of Preferences" - Oral, Interpretability 2, Nov 5, 17:30 (A104-105) w/
<a href="/roireichart/">Roi Reichart</a> Liat Ein-Dor

🧠 "Dementia Through Different