Daniel Fried (@dan_fried)'s Twitter Profile

Assistant prof. @LTIatCMU @SCSatCMU; Research scientist at @AIatMeta. Working on NLP: language interfaces, applied pragmatics, language-to-code, grounding.

ID: 1693446193

Link: https://dpfried.github.io/
Joined: 23-08-2013 09:56:46

864 Tweets

3.3K Followers

851 Following

Yu Su @#ICLR2025 (@ysu_nlp)

🔥2025 is the year of agents, but are we there yet?🤔 🤯 "An Illusion of Progress? Assessing the Current State of Web Agents" –– our new study shows that frontier web agents may be far less competent (up to 59%) than previously reported! Why were benchmark numbers inflated? -

Jacob Springer (@jacspringer)

Training with more data = better LLMs, right? 🚨 False! Scaling language models by adding more pre-training data can decrease your performance after post-training! Introducing "catastrophic overtraining." 🥁🧵+arXiv 👇 1/9

Graham Neubig (@gneubig)

Today's a big day! Months of work went into both of these releases, so we hope people enjoy them. OpenHands is now a great coding agent that you can run entirely locally (w/ OpenHands LM), and a great coding agent that you can run anywhere (w/ OpenHands Cloud).

Bowen Wang (@bowenwangnlp)

🎮 Computer Use Agent Arena is LIVE! 🚀 🔥 Easiest way to test computer-use agents in the wild without any setup 🌟 Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 vl and more 🕹️ Test agents on 100+ real apps & webs with one-click config 🔒 Safe & free

ML@CMU (@mlcmublog)

blog.ml.cmu.edu/2025/04/09/cop… How do real-world developer preferences compare to existing evaluations? A CMU and UC Berkeley team led by Wayne Chi and Valerie Chen created Copilot Arena to collect user preferences on in-the-wild workflows. This blogpost overviews the design and

Sean Welleck (@wellecks)

Had a fun time giving the tutorial at Simons Institute for the Theory of Computing! Here are the materials: Transformers for Mathematics Tutorial - Slides: wellecks.com/transformers4m… - Code/exercises: github.com/wellecks/trans…

Graham Neubig (@gneubig)

A big two days of agents starting tomorrow at CMU (and then two days of agent hackathon after that!) Registration is still open so if you're in or around Pittsburgh come one come all: cmu-agent-workshop.github.io We also plan to livestream for participants who can't make it in person

Xing Han Lu (@xhluca)

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories We are releasing the first benchmark to evaluate how well automatic evaluators, such as LLM judges, can evaluate web agent trajectories. We find that rule-based evals underreport success rates, and

Christina Baek (@_christinabaek)

Are current reasoning models optimal for test-time scaling? 🌠 No! Models make the same incorrect guess over and over again. We show that you can fix this problem w/o any crazy tricks 💫 – just do weight ensembling (WiSE-FT) for big gains on math! 1/N

Daniel Fried (@dan_fried)

Zora's latest work shows that program induction / tool learning benefits web agents: large improvements in success & efficiency, when agents create their own tools to make tasks easier. I'm excited about programs for more controllable & verifiable agents in settings like these!

Deep Learning For Code @ ICLR'25 (@dl4code)

🚀 ICLR week is upon us! Join us at the #DL4C Workshop to hear Xingyao Wang discuss LLMs evolving into SE agents, covering the CodeAct framework (code exec as action), the OpenHands platform (dev-like generalist agents), & SWE-Gym (real-world task training).

Zora Wang (@zhiruow)

Couldn't agree more on agent "continually adapt" from "streamed experiences"! This is exactly what we've envisioned in building online adaptive agents with self-induced evolving memory & skills in AWM (arxiv.org/abs/2409.07429) and ASI (arxiv.org/abs/2504.06821)! Yet still some

Deep Learning For Code @ ICLR'25 (@dl4code)

Just 6 days until #DL4C! 🗓️ Daniel Fried (CMU / Meta AI) will be sharing insights on how inducing functions from code makes LLM agents smarter and more efficient. Don't miss it! See you Sunday! #ICLR2025 #iclr

Prithviraj (Raj) Ammanabrolu (@rajammanabrolu)

The future of embodied AI revolves around *collaborative* multi agent scenarios that need natural language communication, task delegation, resource sharing, and more ⛏️ Here are MINDcraft and MineCollab, a simulator and benchmark purpose built to enable research in this area!

Elias Stengel-Eskin (on the faculty job market) (@eliaseskin)

Extremely excited to announce that I will be joining UT Austin Computer Science at UT Austin in August 2025 as an Assistant Professor! 🎉 I’m looking forward to continuing to develop AI agents that interact/communicate with people, each other, and the multimodal world. I’ll be recruiting PhD

Philippe Laban (@philippelaban)

🆕paper: LLMs Get Lost in Multi-Turn Conversation In real life, people don’t speak in perfect prompts. So we simulate multi-turn conversations — less lab-like, more like real use. We find that LLMs get lost in conversation. 👀What does that mean? 🧵1/N 📄arxiv.org/abs/2505.06120

Wenting Zhao (@wzhao_nlp)

Some personal news: I'll join UMass Amherst CS as an assistant professor in fall 2026. Until then, I'll postdoc at Meta nyc. Reasoning will continue to be my main interest, with a focus on data-centric approaches🤩 If you're also interested, apply to me (phds & a postdoc)!

Jaemin Cho (on faculty job market) (@jmin__cho)

Sharing some personal updates 🥳: - I've completed my PhD at UNC Computer Science! 🎓 - Starting Fall 2026, I'll be joining the Computer Science dept. at Johns Hopkins University (JHU Computer Science) as an Assistant Professor 💙 - Currently exploring options + finalizing the plan for my gap year (Aug

Kayo Yin (@kayo_yin)

Happy to announce the first workshop on Pragmatic Reasoning in Language Models — PragLM @ COLM 2025! 🧠🎉 How do LLMs engage in pragmatic reasoning, and what core pragmatic capacities remain beyond their reach? 🌐 sites.google.com/berkeley.edu/p… 📅 Submit by June 23rd