Jonathan Berant (@jonathanberant) 's Twitter Profile
Jonathan Berant

@jonathanberant

NLP at Tel-Aviv University and Google DeepMind

ID: 322636963

Link: https://www.cs.tau.ac.il/~joberant/ | Joined: 23-06-2011 14:12:47

999 Tweets

2.2K Followers

274 Following

Adam Fisch (@adamjfisch) 's Twitter Profile Photo

Specifically, building on the active PPI estimator of Zrnic and Candès, we derive a family of cost-optimal policies, pi(x), that determine the best probabilities for choosing to get H_t, versus choosing to just use G_t, for each X_t.
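A minimal sketch of what such an estimator looks like, assuming the standard active prediction-powered form (query H_t with probability pi(X_t), then apply an inverse-propensity correction); the function names and NumPy setup are illustrative, not the paper's code:

```python
import numpy as np

def active_ppi_mean(g, p, get_label, seed=0):
    # g[t]: cheap proxy score G_t for example X_t (e.g., an LLM judge).
    # p[t]: sampling probability pi(X_t), clipped into (0, 1].
    # get_label(t): returns the expensive label H_t (e.g., a human rating).
    rng = np.random.default_rng(seed)
    g = np.asarray(g, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1.0)
    xi = rng.random(len(g)) < p        # xi_t ~ Bernoulli(pi(X_t)): query H_t?
    est = g.copy()                     # default term: just use G_t
    for t in np.flatnonzero(xi):
        # The 1/pi(X_t) inverse-propensity weight keeps this unbiased:
        # E[xi_t/pi(X_t) * (H_t - G_t) | X_t] = E[H_t - G_t | X_t].
        est[t] = g[t] + (get_label(t) - g[t]) / p[t]
    return est.mean()                  # unbiased for E[H] for any pi(x) > 0
```

Setting p[t] = 1 everywhere recovers the plain human-label mean, and p[t] near 0 leans almost entirely on G, so the policy pi controls the cost/variance trade-off the thread describes.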

Adam Fisch (@adamjfisch) 's Twitter Profile Photo

We solve for two types of policies: (1) the best fixed sampling rate, pi_random(x) = p*, that doesn't change with X, and (2) the best fully active policy pi_active(x) \in (0, 1]. Intuitively, fully active is better when G has variable accuracy (e.g., we see hard + easy Xs).
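As a hedged illustration of the two shapes (the rescaling and clipping below are a common heuristic in this line of work, not the paper's derived optimum, and `sigma` is an assumed per-example estimate of how unreliable G is):

```python
import numpy as np

def pi_random(sigma, budget=0.3):
    # pi_random(x) = p*: one sampling rate for every X_t, set by the budget.
    return np.full(len(sigma), np.clip(budget, 1e-3, 1.0))

def pi_active(sigma, budget=0.3):
    # Variance-aware policy: sample H_t more often where G is less reliable,
    # rescaled to spend the same expected budget and clipped into (0, 1].
    raw = budget * np.asarray(sigma, dtype=float) / np.mean(sigma)
    return np.clip(raw, 1e-3, 1.0)
```

When sigma is constant the two policies coincide; when difficulty varies, pi_active concentrates the labeling budget on the hard X_t, matching the intuition above.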

Adam Fisch (@adamjfisch) 's Twitter Profile Photo

We explore how much these policies improve over the naïve empirical estimates of E[H] using synthetic + real data. The optimal pi depends on unknown distributional properties of (X, H, G), so we examine performance in theory (using oracle rules) + in practice (when approximated).

Anastasios Nikolas Angelopoulos (@ml_angelopoulos) 's Twitter Profile Photo

This paper extends active statistical inference in a number of exciting ways, with applications in LLM evaluation! 1. It improves upon active inference to give the optimal sampling policy with clipping. 2. It gives an optimal-cost inference procedure. Take a look! One of my faves.

Ziteng Sun (@sziteng) 's Twitter Profile Photo

[Today 11 am poster E-2804 #ICML2025] Inference-time compute has been instrumental to the recent development of LLMs. Can we align our model to better suit a given inference-time procedure? Come check our poster and discuss with Ananth Balashankar, Ahmad Beirami, Jacob Eisenstein, and

Michal Feldman (@michalfeldman9) 's Twitter Profile Photo

🚨 Don't miss this amazing opportunity! The Schmidt Postdoc Award supports Israeli women pursuing postdocs abroad in math, CS, IE, or EE. 💰 $60K/year | 🌍 Top global institutions 📅 Deadline: Aug 15, 2025 🔗 schmidtsciences.org/israeli-womens… 📝 Apply: easychair.org/conferences/?c…

Michal Feldman (@michalfeldman9) 's Twitter Profile Photo

🚨 Don't miss this opportunity: the Schmidt postdoctoral fellowship abroad, for women in mathematics, computer science, industrial engineering and management, or electrical engineering. 💰 $60,000 per year 📅 Deadline: August 15, 2025 🔗 schmidtsciences.org/israeli-womens… 📝 Apply: easychair.org/conferences/?c…

Conference on Language Modeling (@colm_conf) 's Twitter Profile Photo

Outstanding paper 3🏆: Don't lie to your friends: Learning what you know from collaborative self-play openreview.net/forum?id=2vDJi…

Transactions on Machine Learning Research (@tmlrorg) 's Twitter Profile Photo

As Transactions on Machine Learning Research (TMLR) grows in number of submissions, we are looking for more reviewers and action editors. Please sign up! Only one paper to review at a time, and at most 6 per year; reviewers report greater satisfaction than reviewing for conferences!

Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo

No head injury is too trivial to be ignored. What do you think this sentence means? Can you ignore head injuries? This type of sentence is called a depth-charge sentence, and its structure is especially challenging for humans.

Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo

But this is not the only structure that challenges humans. Psycholinguistic research has discovered many different structures that are hard for humans: we read them slowly and understand them poorly. But what happens with LLMs? Do they understand them correctly?

Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo

This is what we checked. We tested the comprehension of 31 different models from 5 different families on 7 different challenging structures (including 4 types of garden paths, GP). We also collected human data on these structures to be able to compare human comprehension to LLMs'.
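A sketch of what one such comprehension probe could look like (the prompt wording, the `ask_model` hook, and the yes/no scoring are assumptions for illustration, not the authors' exact protocol):

```python
# Hypothetical probe for the depth-charge sentence from the first tweet.
SENTENCE = "No head injury is too trivial to be ignored."
QUESTION = "According to this sentence, should head injuries be ignored? Answer yes or no."

def comprehension_accuracy(ask_model, n_trials=20):
    # ask_model: placeholder for any chat-completion call returning a string.
    # The literal, compositional reading of the sentence answers "yes";
    # humans typically report the plausible (non-literal) "no".
    hits = 0
    for _ in range(n_trials):
        reply = ask_model(f"{SENTENCE}\n{QUESTION}").strip().lower()
        hits += reply.startswith("yes")
    return hits / n_trials
```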

Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo


First, these structures are challenging for LLMs (the highest mean accuracy is 0.653).
We noticed 2 interesting facts:
1. Structures that strain working memory in humans were easier than structures that are challenging due to ambiguity.
2. Thinking helps, but only once an LLM is strong enough.
Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo


We report 3 additional findings:
1. LLMs' similarity to humans is higher on GP structures
2. The similarity of the structures' difficulty ordering to humans increases with model size
3. An LLM performs better on the easy baseline than on the structures if it is neither too strong nor too weak
Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo

We have more interesting insights in our paper. We believe this is a really exciting direction for comparing humans and LLMs. Extending our framework to more structures and more LLMs will certainly lead to additional insights!

Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo

I had a lot of fun working on this with Jonathan Berant and Aya Meltzer-Asscher. You can find our paper here: arxiv.org/abs/2510.07141 And by the way, the answer (at least based on the sentence) is yes, you can ignore head injuries. But it's terrible advice!