Ori Yoran (@oriyoran)'s Twitter Profile
Ori Yoran

@oriyoran

NLP researcher / PhD candidate (Tel Aviv University)

ID: 1375041744459468800

Joined: 25-03-2021 11:07:59

165 Tweets

606 Followers

543 Following

Jonathan Berant (@jonathanberant):

Hi ho!

New work: arxiv.org/pdf/2503.14481
With amazing collabs Jacob Eisenstein (@jacobeisenstein), Reza Aghajani (@jdjdhekchbdjd), Adam Fisch (@adamjfisch), Dheeru Dua (@ddua17), Fantine Huot (@fantinehuot), Mirella Lapata (@mlapata), and Vicky Zayats (@vicky_zayats)

Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3
Pierre Chambon (@pierrechambon6):

Does your LLM truly comprehend the complexity of the code it generates? 🥰
 
Introducing our new non-saturated (for at least the coming week? 😉) benchmark:
 
✨BigO(Bench)✨ - Can LLMs Generate Code with Controlled Time and Space Complexity?
 
Check out the details below! 👇
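As a rough illustration of what "controlled time complexity" means in practice (this is not BigO(Bench)'s actual harness, which is far more careful), one can probe a candidate function empirically: count the comparisons it performs at growing input sizes and fit the slope on log-log axes. Everything below is toy code with a built-in `sorted` standing in for a "generated" function.

```python
import math
import random

class Cmp:
    """Wrapper that counts comparisons made while sorting."""
    count = 0

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Cmp.count += 1
        return self.v < other.v

def comparisons_to_sort(n, seed=0):
    """Comparisons used to sort n shuffled elements (deterministic seed)."""
    xs = list(range(n))
    random.Random(seed).shuffle(xs)
    Cmp.count = 0
    sorted(Cmp(x) for x in xs)
    return Cmp.count

def loglog_slope(n_small=1_000, n_large=16_000):
    """Fit the growth exponent between two input sizes on log-log axes."""
    c1 = comparisons_to_sort(n_small)
    c2 = comparisons_to_sort(n_large)
    return math.log(c2 / c1) / math.log(n_large / n_small)

slope = loglog_slope()
# For an O(n log n) sort the slope lands slightly above 1;
# a quadratic algorithm would give a slope close to 2.
```

Counting operations instead of wall-clock time keeps the probe deterministic, which matters when the same check has to run reproducibly across machines.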
Noam Razin (@noamrazin):

The success of RLHF depends heavily on the quality of the reward model (RM), but how should we measure this quality?

📰 We study what makes a good RM from an optimization perspective. Among other results, we formalize why more accurate RMs are not necessarily better teachers!
🧵
Gallil Maimon (@gallilmaimon):

Many modern SpeechLMs are trained with Speech-Text interleaving. How does this impact scaling trends?

In our new paper, we train several dozen SLMs, and show - quite a lot! So there is room for optimism 😊

Key insights, code, models, full paper 👇🏻
Michael Hassid (@michaelhassid):

The longer a reasoning LLM thinks, the more likely it is to be correct, right?

Apparently not.

Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”.

Link: arxiv.org/abs/2505.17813

1/n
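A minimal sketch of the "prefer shorter thinking chains" idea the title suggests (the paper's actual selection procedure may differ): sample several chains for the same question and keep the answer from the shortest one, rather than aggregating over all of them. The chains below are made-up stand-ins.

```python
# Illustrative "shortest-of-k" selection over sampled reasoning chains.
# `chains` is a list of (reasoning_text, answer) pairs from one question.

def shortest_of_k(chains):
    """Return the answer attached to the shortest reasoning chain."""
    reasoning, answer = min(chains, key=lambda c: len(c[0]))
    return answer

# Toy usage: three hypothetical chains for the same arithmetic question.
chains = [
    ("step1 ... step9, so the result is 12", "12"),
    ("quick check: 3*4 = 12", "12"),
    ("a very long and winding derivation that drifts ... result 13", "13"),
]
print(shortest_of_k(chains))  # the shortest chain's answer: "12"
```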
Alex Zhang (@a1zhang):

Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇

Yoav Gur Arieh (@guryoav):

Can we precisely erase conceptual knowledge from LLM parameters?
Most methods are shallow, coarse, or overreach, adversely affecting related or general knowledge.

We introduce 🪝𝐏𝐈𝐒𝐂𝐄𝐒 — a general framework for Precise In-parameter Concept EraSure. 🧵 1/
Chaitanya Malaviya (@cmalaviya11):

Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses?

Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

Corrector Sampling in Language Models

"Autoregressive language models accumulate errors due to their fixed, irrevocable left-to-right token generation. To address this, we propose a new sampling method called Resample-Previous-Tokens (RPT). RPT mitigates error accumulation by
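The quoted abstract cuts off, but the core mechanism can be sketched in toy form. Everything below is a stand-in (a seeded random "model" over a three-token vocabulary), not the paper's method: alongside ordinary left-to-right steps, the sampler occasionally revisits an earlier position and resamples the token there, so past choices are no longer irrevocable.

```python
import random

def generate_with_rpt(length, resample_prob=0.3, seed=0, vocab=("a", "b", "c")):
    """Toy left-to-right generation with resampling of previous tokens.

    A seeded RNG stands in for an LM's conditional distribution; the real
    RPT method conditions each (re)sample on a trained language model.
    """
    rng = random.Random(seed)
    tokens = []
    while len(tokens) < length:
        tokens.append(rng.choice(vocab))       # ordinary append step
        if rng.random() < resample_prob:
            i = rng.randrange(len(tokens))     # pick a previous position
            tokens[i] = rng.choice(vocab)      # ...and resample its token
    return tokens

out = generate_with_rpt(6)
```

Resampling changes tokens in place, so the final sequence length is exactly the requested one; only the path taken to it differs from plain left-to-right sampling.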
Ricky T. Q. Chen (@rickytqchen):

Padding in our non-AR sequence models? Yuck. 🙅 👉 Instead of unmasking, our new work *Edit Flows* performs iterative refinements via position-relative inserts and deletes, operations naturally suited for variable-length sequence generation. Easily better than using mask tokens.
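A toy illustration of the insert/delete edit operations the tweet describes (the actual Edit Flows model learns when and where to apply them; this only shows why such edits naturally produce variable-length sequences, unlike fixed-length unmasking):

```python
# Apply a list of ('insert', pos, token) / ('delete', pos) edits to a
# token sequence. Positions refer to the sequence as it stands when each
# edit is applied, so the length can grow or shrink freely.

def apply_edits(seq, edits):
    seq = list(seq)
    for edit in edits:
        if edit[0] == "insert":
            _, pos, tok = edit
            seq.insert(pos, tok)
        elif edit[0] == "delete":
            _, pos = edit
            del seq[pos]
        else:
            raise ValueError(f"unknown edit: {edit!r}")
    return seq

print(apply_edits(["the", "cat", "sat"],
                  [("insert", 1, "big"), ("delete", 3)]))
# ['the', 'big', 'cat']
```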

Yijia Shao (@echoshao8899):

🚨 70 million US workers are about to face their biggest workplace transformation due to AI agents. But nobody asks them what they want.

While AI races to automate everything, we took a different approach: auditing what workers want vs. what AI can do across the US workforce.🧵
Mor Geva (@megamor2):

✨MLP layers have just become more interpretable than ever ✨
In a new paper:
* We show a simple method for decomposing MLP activations into interpretable features
* Our method uncovers hidden concept hierarchies, where sparse neuron combinations form increasingly abstract ideas
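The decomposition method itself isn't specified in the tweet; as a hedged illustration of the "sparse combination of features" idea, here is a generic greedy matching pursuit that writes an activation vector as a small weighted sum of dictionary features. The feature dictionary and vectors below are toy data, not the paper's method.

```python
# Greedy matching pursuit: repeatedly pick the (unit-norm) dictionary
# feature most aligned with the residual, record its coefficient, and
# subtract its contribution.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sparse_decompose(activation, features, k=2):
    """Return [(feature_index, coefficient), ...] for k greedy picks."""
    residual = list(activation)
    picked = []
    for _ in range(k):
        idx = max(range(len(features)),
                  key=lambda i: abs(dot(residual, features[i])))
        coef = dot(residual, features[idx])
        picked.append((idx, coef))
        residual = [r - coef * f for r, f in zip(residual, features[idx])]
    return picked

# Toy dictionary of unit feature directions in 3-d.
features = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(sparse_decompose((2.0, 0.0, 0.5), features))  # [(0, 2.0), (2, 0.5)]
```

With an orthonormal toy dictionary the picks are exact; with a realistic overcomplete dictionary the same loop gives an approximate sparse code.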
Neta Shaul (@shaulneta):

[1/n] New paper alert! 🚀 Excited to introduce 𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐌𝐚𝐭𝐜𝐡𝐢𝐧𝐠 (𝐓𝐌)! We're replacing short-timestep kernels from Flow Matching/Diffusion with... a generative model🤯, achieving SOTA text-2-image generation! With Uriel Singer, Itai Gat, and Yaron Lipman.

Ido Cohen (@idoc0hen):

A Vision-Language Model can answer questions about Robin Williams. It can also recognize him in a photo. So why does it FAIL when asked the same questions using his photo instead of his name?

A thread on our new #acl2025 paper that explores this puzzle 🧵
Itay Itzhak (@itay_itzhak_):

🚨New paper alert🚨

🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing?

Excited to share our new paper, accepted to CoLM 2025🎉!
See thread below 👇
#BiasInAI #LLMs #MachineLearning #NLProc