roeeaharoni (@roeeaharoni)'s Twitter Profile
roeeaharoni

@roeeaharoni

Research Scientist @GoogleAI Tel-Aviv | PhD @BIUNLP Lab

ID: 301992639

Link: http://roeeaharoni.com | Date: 20-05-2011 12:20:37

4.4K Tweets

1.1K Followers

769 Following

Eliahu Horwitz | @ ICLR2025 (@eliahuhorwitz)

🚨 New paper alert! 🚨 Millions of neural networks now populate public repositories like Hugging Face 🤗, but most lack documentation. So, we decided to build an Atlas 🗺️ Project: horwitz.ai/model-atlas Demo: huggingface.co/spaces/Eliahu/… 🧵👇🏻 Here's what we found:

Jonathan Berant (@jonathanberant)

Hi ho! New work: arxiv.org/pdf/2503.14481 With amazing collabs Jacob Eisenstein Reza Aghajani Adam Fisch dheeru dua Fantine Huot ✈️ ICLR 25 Mirella Lapata Vicky Zayats Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3

siddharth ahuja (@sidahuj)

🎵💿Built an MCP that lets Claude talk directly to Ableton. Now you can create music with just prompts! Here’s a demo of me creating a lush, 80s synthwave track in just two prompts. It picks the right instruments, creates melodies, and adds effects like reverb and distortion 🔊

Leshem Choshen C U @ ICLR 🤖🤗 (@lchoshen)

🚀 "Multilingual" LLMs are really just clusters of monolingual ones! They might know a Brazilian 🇧🇷 brewery—but only in Portuguese With ECLEKTIC, you can now test this. The challenge? Making them truly multilingual. 🧵⬇️ omer goldman Uri Shaham ... Reut Tsarfaty Matan Eyal Google AI

Jonas Pfeiffer (@pfeiffjo)

I am hiring a Student Researcher for our Modularity team at the Google DeepMind office in Zurich🇨🇭 Please fill out the interest form if you would like to work with us! The role would start mid/end 2025 and would be in-person in Zurich with 80-100% at GDM forms.gle/N94ViTmKHCCAcv…

Jews Fight Back 🇺🇸🇮🇱 (@jewsfightback)

On the night of October 6th, 2023, after over six months of total silence, Columbia SJP suddenly posted a message: “We are back!” It featured a map of Israel with the Arabic phrase “revolution until victory!” They posted it just minutes after Hamas began its brutal October 7th

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer

Ihab Hassan (@ihabhassane)

🚨BREAKING: The Assembly of Southern Gaza Clans has issued a powerful call for a popular uprising (Intifada) against Hamas, accusing the group of gambling with Palestinians’ lives for its own narrow, self-serving interests.

(((ل()(ل() 'yoav))))👾 (@yoavgo)

i find information seeking to be one of the most exciting opportunities and at the same time one of the most challenging problems for "agentic" AI. Paper finder is an initial exploration of this. Try it out, use it daily, make it fail, we need to see some challenging queries!

Zorik Gekhman (@zorikgekhman)

🚨 It's often claimed that LLMs know more facts than they show in their outputs, but what does this actually mean, and how can we measure this “hidden knowledge”? In our new paper, we clearly define this concept and design controlled experiments to test it. 1/🧵

omer goldman (@omernlp)

Wanna check how well a model can share knowledge between languages? Of course you do! 🤩 But can you do it without access to the model’s weights? Now you can with ECLeKTic 🤯

Google AI (@googleai)

Introducing ECLeKTic, a new benchmark for Evaluating Cross-Lingual Knowledge Transfer in LLMs. It uses a closed-book QA task, where models must rely on internal knowledge to answer questions based on information captured only in a single language. More →goo.gle/3Y5TqvZ

Google AI (@googleai)

We just announced Geospatial Reasoning, a new research effort that will use generative AI (#Gemini) to help unlock novel insights about our world. 🌍 Imagine asking complex Qs about our planet & getting detailed answers w/ plans & visuals! goo.gle/3XOO5sF

Gal Yona (@_galyo)

my completely personal take: Llama-4 blatantly gaming the Chatbot Arena evals (beyond being a neat example of Goodhart’s law in action!) is an important moment for the NLP community ⏩

Jason Wei (@_jasonwei)

New benchmark for deep research agents! An agent that is creative and persistent should be able to find any piece of information on the open web, even if it requires browsing hundreds of webpages. Models that exercise this ability are like a frictionless interface to the

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Exciting News! Search Arena Leaderboard🌐 🥇 Gemini-2.5-Pro-Grounding and Perplexity-Sonar-Reasoning-Pro top the leaderboard! Congrats Google DeepMind and Perplexity! 📊 We've open-sourced 7k battles with user votes! 📝 Check out our blog post for detailed analysis. Blog

Dipanjan Das (@dipanjand)

Super proud to see this result. The model behavior of integrating search into Gemini is something my team members, collaborators on Gemini and I have been working on for a while now and we're very happy to see this result. A lot more is coming soon.