Ori Yoran (@oriyoran)'s Twitter Profile
Ori Yoran

@oriyoran

NLP researcher / PhD candidate (Tel Aviv University)

ID: 1375041744459468800

Joined: 25-03-2021 11:07:59

165 Tweets

606 Followers

543 Following

Jonathan Berant (@jonathanberant):

Hi ho!

New work: arxiv.org/pdf/2503.14481
With amazing collabs Jacob Eisenstein (@jacobeisenstein), Reza Aghajani (@jdjdhekchbdjd), Adam Fisch (@adamjfisch), Dheeru Dua (@ddua17), Fantine Huot (@fantinehuot), Mirella Lapata (@mlapata), and Vicky Zayats (@vicky_zayats)

Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3
Pierre Chambon (@pierrechambon6):

Does your LLM truly comprehend the complexity of the code it generates? 🥰
 
Introducing our new non-saturated (for at least the coming week? 😉) benchmark:
 
✨BigO(Bench)✨ - Can LLMs Generate Code with Controlled Time and Space Complexity?
 
Check out the details below! 👇
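As a rough illustration of what "controlled time complexity" means in practice (this is not BigO(Bench)'s actual harness, which is far more careful), one can probe a candidate function empirically: count the comparisons it performs at growing input sizes and fit the slope on log-log axes. Everything below is toy code with a built-in `sorted` standing in for a "generated" function.

```python
import math
import random

class Cmp:
    """Wrapper that counts comparisons made while sorting."""
    count = 0

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Cmp.count += 1
        return self.v < other.v

def comparisons_to_sort(n, seed=0):
    """Comparisons used to sort n shuffled elements (deterministic seed)."""
    xs = list(range(n))
    random.Random(seed).shuffle(xs)
    Cmp.count = 0
    sorted(Cmp(x) for x in xs)
    return Cmp.count

def loglog_slope(n_small=1_000, n_large=16_000):
    """Fit the growth exponent between two input sizes on log-log axes."""
    c1 = comparisons_to_sort(n_small)
    c2 = comparisons_to_sort(n_large)
    return math.log(c2 / c1) / math.log(n_large / n_small)

slope = loglog_slope()
# For an O(n log n) sort the slope lands slightly above 1;
# a quadratic algorithm would give a slope close to 2.
```

Counting operations instead of wall-clock time keeps the probe deterministic, which matters when the same check has to run reproducibly across machines.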
Noam Razin (@noamrazin):

The success of RLHF depends heavily on the quality of the reward model (RM), but how should we measure this quality?

📰 We study what makes a good RM from an optimization perspective. Among other results, we formalize why more accurate RMs are not necessarily better teachers!
🧵
Gallil Maimon (@gallilmaimon):

Many modern SpeechLMs are trained with Speech-Text interleaving. How does this impact scaling trends?

In our new paper, we train several dozen SLMs, and show - quite a lot! So there is room for optimism 😊

Key insights, code, models, full paper 👇🏻
Michael Hassid (@michaelhassid):

The longer a reasoning LLM thinks, the more likely it is to be correct, right?

Apparently not.

Presenting our paper: “Don’t Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning”.

Link: arxiv.org/abs/2505.17813

1/n
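A minimal sketch of the "prefer shorter thinking chains" idea the title suggests (the paper's actual selection procedure may differ): sample several chains for the same question and keep the answer from the shortest one, rather than aggregating over all of them. The chains below are made-up stand-ins.

```python
# Illustrative "shortest-of-k" selection over sampled reasoning chains.
# `chains` is a list of (reasoning_text, answer) pairs from one question.

def shortest_of_k(chains):
    """Return the answer attached to the shortest reasoning chain."""
    reasoning, answer = min(chains, key=lambda c: len(c[0]))
    return answer

# Toy usage: three hypothetical chains for the same arithmetic question.
chains = [
    ("step1 ... step9, so the result is 12", "12"),
    ("quick check: 3*4 = 12", "12"),
    ("a very long and winding derivation that drifts ... result 13", "13"),
]
print(shortest_of_k(chains))  # the shortest chain's answer: "12"
```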
Alex Zhang (@a1zhang):

Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇

Yoav Gur Arieh (@guryoav):

Can we precisely erase conceptual knowledge from LLM parameters?
Most methods are shallow, coarse, or overreach, adversely affecting related or general knowledge.

We introduce 🪝𝐏𝐈𝐒𝐂𝐄𝐒 — a general framework for Precise In-parameter Concept EraSure. 🧵 1/
Chaitanya Malaviya (@cmalaviya11):

Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses?

Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

Corrector Sampling in Language Models

"Autoregressive language models accumulate errors due to their fixed, irrevocable left-to-right token generation. To address this, we propose a new sampling method called Resample-Previous-Tokens (RPT). RPT mitigates error accumulation by
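The quoted abstract cuts off, but the core mechanism can be sketched in toy form. Everything below is a stand-in (a seeded random "model" over a three-token vocabulary), not the paper's method: alongside ordinary left-to-right steps, the sampler occasionally revisits an earlier position and resamples the token there, so past choices are no longer irrevocable.

```python
import random

def generate_with_rpt(length, resample_prob=0.3, seed=0, vocab=("a", "b", "c")):
    """Toy left-to-right generation with resampling of previous tokens.

    A seeded RNG stands in for an LM's conditional distribution; the real
    RPT method conditions each (re)sample on a trained language model.
    """
    rng = random.Random(seed)
    tokens = []
    while len(tokens) < length:
        tokens.append(rng.choice(vocab))       # ordinary append step
        if rng.random() < resample_prob:
            i = rng.randrange(len(tokens))     # pick a previous position
            tokens[i] = rng.choice(vocab)      # ...and resample its token
    return tokens

out = generate_with_rpt(6)
```

Resampling changes tokens in place, so the final sequence length is exactly the requested one; only the path taken to it differs from plain left-to-right sampling.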
Ricky T. Q. Chen (@rickytqchen):

Padding in our non-AR sequence models? Yuck. 🙅 👉 Instead of unmasking, our new work *Edit Flows* performs iterative refinements via position-relative inserts and deletes, operations naturally suited for variable-length sequence generation. Easily better than using mask tokens.
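A toy illustration of the insert/delete edit operations the tweet describes (the actual Edit Flows model learns when and where to apply them; this only shows why such edits naturally produce variable-length sequences, unlike fixed-length unmasking):

```python
# Apply a list of ('insert', pos, token) / ('delete', pos) edits to a
# token sequence. Positions refer to the sequence as it stands when each
# edit is applied, so the length can grow or shrink freely.

def apply_edits(seq, edits):
    seq = list(seq)
    for edit in edits:
        if edit[0] == "insert":
            _, pos, tok = edit
            seq.insert(pos, tok)
        elif edit[0] == "delete":
            _, pos = edit
            del seq[pos]
        else:
            raise ValueError(f"unknown edit: {edit!r}")
    return seq

print(apply_edits(["the", "cat", "sat"],
                  [("insert", 1, "big"), ("delete", 3)]))
# ['the', 'big', 'cat']
```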

Yijia Shao (@echoshao8899):

🚨 70 million US workers are about to face their biggest workplace transformation due to AI agents. But nobody asks them what they want.

While AI races to automate everything, we took a different approach: auditing what workers want vs. what AI can do across the US workforce.🧵
Mor Geva (@megamor2):

✨MLP layers have just become more interpretable than ever ✨
In a new paper:
* We show a simple method for decomposing MLP activations into interpretable features
* Our method uncovers hidden concept hierarchies, where sparse neuron combinations form increasingly abstract ideas
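The decomposition method itself isn't specified in the tweet; as a hedged illustration of the "sparse combination of features" idea, here is a generic greedy matching pursuit that writes an activation vector as a small weighted sum of dictionary features. The feature dictionary and vectors below are toy data, not the paper's method.

```python
# Greedy matching pursuit: repeatedly pick the (unit-norm) dictionary
# feature most aligned with the residual, record its coefficient, and
# subtract its contribution.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sparse_decompose(activation, features, k=2):
    """Return [(feature_index, coefficient), ...] for k greedy picks."""
    residual = list(activation)
    picked = []
    for _ in range(k):
        idx = max(range(len(features)),
                  key=lambda i: abs(dot(residual, features[i])))
        coef = dot(residual, features[idx])
        picked.append((idx, coef))
        residual = [r - coef * f for r, f in zip(residual, features[idx])]
    return picked

# Toy dictionary of unit feature directions in 3-d.
features = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(sparse_decompose((2.0, 0.0, 0.5), features))  # [(0, 2.0), (2, 0.5)]
```

With an orthonormal toy dictionary the picks are exact; with a realistic overcomplete dictionary the same loop gives an approximate sparse code.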
Neta Shaul (@shaulneta):

[1/n] New paper alert! 🚀 Excited to introduce 𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐌𝐚𝐭𝐜𝐡𝐢𝐧𝐠 (𝐓𝐌)! We're replacing short-timestep kernels from Flow Matching/Diffusion with... a generative model🤯, achieving SOTA text-2-image generation! With Uriel Singer, Itai Gat, and Yaron Lipman.

Ido Cohen (@idoc0hen):

A Vision-Language Model can answer questions about Robin Williams. It can also recognize him in a photo. So why does it FAIL when asked the same questions using his photo instead of his name?

A thread on our new #acl2025 paper that explores this puzzle 🧵
Itay Itzhak (@itay_itzhak_):

🚨New paper alert🚨

🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing?

Excited to share our new paper, accepted to CoLM 2025🎉!
See thread below 👇
#BiasInAI #LLMs #MachineLearning #NLProc