Zachary Bamberger @NAACL2025 (@zacharybamberg1) 's Twitter Profile
Zachary Bamberger @NAACL2025

@zacharybamberg1

(🇺🇸/🇮🇱) PhD @TechnionLive (advised by @ofraam and @amir_feder). BSc @CornellCIS, MSc @TechnionLive (advised by @boknilev). Xoogler. Persuasive Arguments

ID: 1183101225337647105

Website: https://zachary.cswp.cs.technion.ac.il
Joined: 12-10-2019 19:24:58

383 Tweets

314 Followers

817 Following

Nitay Calderon (@nitcal) 's Twitter Profile Photo

Do you use LLM-as-a-judge or LLM annotations in your research?

There’s a growing trend of replacing human annotators with LLMs in research—they're fast, cheap, and require less effort.

But can we trust them?🤔
Well, we need a rigorous procedure to answer this.

🚨New preprint👇
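The "rigorous procedure" called for here usually starts from chance-corrected agreement between LLM labels and a human-annotated gold set. As a hedged illustration (not the preprint's actual method), Cohen's kappa can be computed from two label sequences with no dependencies:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labeled at random
    # according to their own marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy example: 6 items labeled by a human and by an LLM judge.
human = ["pos", "pos", "neg", "neg", "pos", "neg"]
llm   = ["pos", "pos", "neg", "pos", "pos", "neg"]
print(round(cohens_kappa(human, llm), 3))  # → 0.667
```

Raw percent agreement alone overstates reliability on skewed label distributions, which is why a chance-corrected statistic (or a proper hypothesis-testing procedure, as the preprint argues) is needed before swapping humans for LLMs.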
Itay Nakash (@itay__nakash) 's Twitter Profile Photo

1/🧵 New preprint: AdaptiVocab 📢

A lightweight method to make LLMs ~25% faster in domain-specific settings—without compromising quality.

We adapt the vocabulary on top of your tokenizer to fit the domain.

No architecture changes. No retraining from scratch.
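The speedup intuition is that fusing frequent domain n-grams into single tokens shortens sequences, so fewer decode steps are needed. A minimal sketch of that idea (illustrative only; the names and the greedy pair-merge are assumptions, not AdaptiVocab's actual algorithm):

```python
def merge_domain_ngrams(tokens, domain_vocab):
    """Greedily fuse adjacent token pairs that form a known
    domain term into one new token, shortening the sequence."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in domain_vocab:
            out.append(tokens[i] + tokens[i + 1])  # fused domain token
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

base = ["electro", "cardio", "gram", "shows", "electro", "cardio"]
domain = {("electro", "cardio")}
merged = merge_domain_ngrams(base, domain)
print(len(base), "->", len(merged))  # 6 -> 4: fewer tokens, fewer decode steps
```

Since generation cost scales with sequence length, a ~25% reduction in tokens for in-domain text translates roughly into a ~25% speedup without touching the model architecture.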
Zorik Gekhman (@zorikgekhman) 's Twitter Profile Photo

🚨 It's often claimed that LLMs know more facts than they show in their outputs, but what does this actually mean, and how can we measure this “hidden knowledge”?

In our new paper, we clearly define this concept and design controlled experiments to test it.
1/🧵
Tal Haklay (@tal_haklay) 's Twitter Profile Photo

🚨 Call for Papers is Out!

The First Workshop on 𝐀𝐜𝐭𝐢𝐨𝐧𝐚𝐛𝐥𝐞 𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐛𝐢𝐥𝐢𝐭𝐲 will be held at ICML 2025 in Vancouver!

📅 Submission Deadline: May 9
Follow us >> Actionable Interpretability Workshop ICML2025 (@ActInterp)

🧠Topics of interest include: 👇
Isabel O. Gallegos (@isabelogallegos) 's Twitter Profile Photo

🚨🚨New Working Paper🚨🚨

AI-generated content is getting more politically persuasive. But does labeling it as AI-generated change its impact?🤔

Our research says the disclosure of AI authorship has little to no effect on the persuasiveness of AI-generated content.

🧵1/6
León (@leonguertler) 's Twitter Profile Photo

TextArena is live on arXiv! We present a benchmark of 57+ competitive text-based games to evaluate and train LLMs on agentic behavior — including negotiation, deception, theory of mind and many more. Real-time TrueSkill. Multiplayer support. Human-vs-models. Model-vs-model.
Ġabe Ġrand (@gabe_grand) 's Twitter Profile Photo

Tackling complex problems with LMs requires search/planning, but how should test-time compute be structured? Introducing Self-Steering, a new meta-reasoning framework where LMs coordinate their own inference procedures by writing code!

Mehul Damani @ ICLR (@mehuldamani2) 's Twitter Profile Photo

🚨New Paper!🚨
We trained reasoning LLMs to reason about what they don't know.

o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more.

Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --
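One way a reward can jointly target correctness and calibrated confidence is a Brier-style penalty on the model's stated confidence. The shape below is an illustrative assumption, not necessarily the paper's exact reward:

```python
def calibration_reward(is_correct, confidence):
    """Correctness bonus minus a Brier-style calibration penalty.
    (Illustrative shape only; see the RLCR paper for the actual reward.)
    confidence: the model's self-reported probability of being correct."""
    c = 1.0 if is_correct else 0.0
    return c - (confidence - c) ** 2

# An overconfident wrong answer is punished hardest,
# so the model is pushed to hedge when it is unsure:
print(calibration_reward(False, 0.95))  # ~ -0.9025
print(calibration_reward(True, 0.95))   # ~  0.9975
print(calibration_reward(False, 0.10))  # ~ -0.01
```

Under a reward like this, maximizing expected return requires reporting confidence equal to the true probability of being correct, which is exactly the property that plain accuracy-only rewards fail to encourage.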
Omar Khattab (@lateinteraction) 's Twitter Profile Photo

New paper: Reflective Prompt Evolution Can Outperform GRPO.

It's becoming clear that learning via natural-language reflection (aka prompt optimization) will long be a central learning paradigm for building AI systems.

Great work by Lakshya A Agrawal and team on GEPA and SIMBA.
Zachary Bamberger @NAACL2025 (@zacharybamberg1) 's Twitter Profile Photo

Many persuasion settings assume a knowledge imbalance: the speaker knows more than the listener, and their interests don't necessarily align. I think a better formulation is one in which both players have access to the same tools, but perhaps different capacities for using them.