Jonathan Berant (@jonathanberant) 's Twitter Profile
Jonathan Berant

@jonathanberant

NLP at Tel-Aviv University and Google DeepMind

ID: 322636963

Link: https://www.cs.tau.ac.il/~joberant/ | Joined: 23-06-2011 14:12:47

999 Tweets

2.2K Followers

274 Following

Adam Fisch (@adamjfisch) 's Twitter Profile Photo

Specifically, building on the active PPI estimator of Zrnic and Candès, we derive a family of cost-optimal policies, pi(x), that determine the best probabilities for choosing to get H_t, versus choosing to just use G_t, for each X_t.
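A minimal sketch of what such an estimator looks like, assuming the standard active prediction-powered form (query H_t with probability pi(X_t), then apply an inverse-propensity correction); the function names and NumPy setup are illustrative, not the paper's code:

```python
import numpy as np

def active_ppi_mean(g, p, get_label, seed=0):
    # g[t]: cheap proxy score G_t for example X_t (e.g., an LLM judge).
    # p[t]: sampling probability pi(X_t), clipped into (0, 1].
    # get_label(t): returns the expensive label H_t (e.g., a human rating).
    rng = np.random.default_rng(seed)
    g = np.asarray(g, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1.0)
    xi = rng.random(len(g)) < p        # xi_t ~ Bernoulli(pi(X_t)): query H_t?
    est = g.copy()                     # default term: just use G_t
    for t in np.flatnonzero(xi):
        # The 1/pi(X_t) inverse-propensity weight keeps this unbiased:
        # E[xi_t/pi(X_t) * (H_t - G_t) | X_t] = E[H_t - G_t | X_t].
        est[t] = g[t] + (get_label(t) - g[t]) / p[t]
    return est.mean()                  # unbiased for E[H] for any pi(x) > 0
```

Setting p[t] = 1 everywhere recovers the plain human-label mean, and p[t] near 0 leans almost entirely on G, so the policy pi controls the cost/variance trade-off the thread describes.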

Adam Fisch (@adamjfisch) 's Twitter Profile Photo

We solve for two types of policies: (1) the best fixed sampling rate, pi_random(x) = p*, that doesn't change with X, and (2) the best fully active policy pi_active(x) \in (0, 1]. Intuitively, fully active is better when G has variable accuracy (e.g., we see hard + easy Xs).
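As a hedged illustration of the two shapes (the rescaling and clipping below are a common heuristic in this line of work, not the paper's derived optimum, and `sigma` is an assumed per-example estimate of how unreliable G is):

```python
import numpy as np

def pi_random(sigma, budget=0.3):
    # pi_random(x) = p*: one sampling rate for every X_t, set by the budget.
    return np.full(len(sigma), np.clip(budget, 1e-3, 1.0))

def pi_active(sigma, budget=0.3):
    # Variance-aware policy: sample H_t more often where G is less reliable,
    # rescaled to spend the same expected budget and clipped into (0, 1].
    raw = budget * np.asarray(sigma, dtype=float) / np.mean(sigma)
    return np.clip(raw, 1e-3, 1.0)
```

When sigma is constant the two policies coincide; when difficulty varies, pi_active concentrates the labeling budget on the hard X_t, matching the intuition above.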

Adam Fisch (@adamjfisch) 's Twitter Profile Photo

We explore how much these policies improve over the naïve empirical estimates of E[H] using synthetic + real data. The optimal pi depends on unknown distributional properties of (X, H, G), so we examine performance in theory (using oracle rules) + in practice (when approximated).

Anastasios Nikolas Angelopoulos (@ml_angelopoulos) 's Twitter Profile Photo

This paper extends active statistical inference in a number of exciting ways, with applications in LLM evaluation! 1. It improves upon active inference to give the optimal sampling policy with clipping. 2. It gives an optimal-cost inference procedure. Take a look! One of my faves.

Ziteng Sun (@sziteng) 's Twitter Profile Photo

[Today 11 am poster E-2804 #ICML2025] Inference-time compute has been instrumental to the recent development of LLMs. Can we align our model to better suit a given inference-time procedure? Come check our poster and discuss with Ananth Balashankar, Ahmad Beirami, Jacob Eisenstein, and

Michal Feldman (@michalfeldman9) 's Twitter Profile Photo

🚨 Don't miss this amazing opportunity! The Schmidt Postdoc Award supports Israeli women pursuing postdocs abroad in math, CS, IE, or EE. 💰 $60K/year | 🌍 Top global institutions 📅 Deadline: Aug 15, 2025 🔗 schmidtsciences.org/israeli-womens… 📝 Apply: easychair.org/conferences/?c…

Michal Feldman (@michalfeldman9) 's Twitter Profile Photo

🚨 Don't miss this opportunity: the Schmidt postdoctoral fellowship abroad, for women in mathematics, computer science, industrial engineering and management, or electrical engineering. 💰 $60,000 per year 📅 Deadline: August 15, 2025 🔗 schmidtsciences.org/israeli-womens… 📝 Apply: easychair.org/conferences/?c…

Conference on Language Modeling (@colm_conf) 's Twitter Profile Photo

Outstanding paper 3🏆: Don't lie to your friends: Learning what you know from collaborative self-play openreview.net/forum?id=2vDJi…

Transactions on Machine Learning Research (@tmlrorg) 's Twitter Profile Photo

As Transactions on Machine Learning Research (TMLR) grows in number of submissions, we are looking for more reviewers and action editors. Please sign up! Only one paper to review at a time, and at most 6 per year; reviewers report greater satisfaction than reviewing for conferences!

Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo

No head injury is too trivial to be ignored. What do you think this sentence means? Can you ignore head injuries? This type of sentence is called a depth-charge sentence, and its structure is especially challenging for humans.

Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo

But this is not the only structure that challenges humans. Psycholinguistic research has discovered many different structures that are hard for humans: we read them slowly and understand them poorly. But what happens with LLMs? Do they understand them correctly?

Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo

This is what we checked. We tested the comprehension of 31 different models from 5 different families on 7 different challenging structures (including 4 types of garden paths, GP). We also collected human data on these structures to be able to compare human comprehension to LLMs'.
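A sketch of what one such comprehension probe could look like (the prompt wording, the `ask_model` hook, and the yes/no scoring are assumptions for illustration, not the authors' exact protocol):

```python
# Hypothetical probe for the depth-charge sentence from the first tweet.
SENTENCE = "No head injury is too trivial to be ignored."
QUESTION = "According to this sentence, should head injuries be ignored? Answer yes or no."

def comprehension_accuracy(ask_model, n_trials=20):
    # ask_model: placeholder for any chat-completion call returning a string.
    # The literal, compositional reading of the sentence answers "yes";
    # humans typically report the plausible (non-literal) "no".
    hits = 0
    for _ in range(n_trials):
        reply = ask_model(f"{SENTENCE}\n{QUESTION}").strip().lower()
        hits += reply.startswith("yes")
    return hits / n_trials
```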

Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo


First, these structures are challenging for LLMs (the highest mean accuracy is 0.653).
We noticed 2 interesting facts:
1. Structures that strain working memory in humans were easier than structures that are challenging due to ambiguity.
2. Thinking helps, but only once an LLM is strong enough.
Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo


We report 3 additional findings:
1. LLMs' similarity to humans is higher on GP structures
2. The similarity of the structures' difficulty ordering to humans increases with model size
3. An LLM performs better on the easy baseline than on the structures if it is neither too strong nor too weak
Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo

We have more interesting insights in our paper. We believe this is a really exciting direction for comparing humans and LLMs. Extending our framework to more structures and more LLMs will certainly lead to additional insights!

Samuel AMOUYAL (@amouyalsamuel) 's Twitter Profile Photo

I had a lot of fun working on this with Jonathan Berant and Aya Meltzer-Asscher. You can find our paper here: arxiv.org/abs/2510.07141 And by the way, the answer (at least based on the sentence) is yes, you can ignore head injuries. But it's terrible advice!