Pete Shaw
@ptshaw2
Research Scientist @GoogleDeepmind
ID: 1075460972
http://ptshaw.com 10-01-2013 02:28:10
85 Tweets
564 Followers
432 Following
Hi ho! New work: arxiv.org/pdf/2503.14481 with amazing collaborators Jacob Eisenstein, Reza Aghajani, Adam Fisch, Dheeru Dua, Fantine Huot, Mirella Lapata, and Vicky Zayats. Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3
This was my first time submitting to TMLR, and thanks to the reviewers and AE Alessandro Sordoni for making it a positive experience! TMLR seems to offer some nice pros vs. ICML/ICLR/NeurIPS, e.g.:
- Potentially lower-variance review process
- Not dependent on the conference calendar
Neural networks can express more than they can learn, creating expressivity-trainability gaps. Our paper, “Mind The Gap,” shows that neural networks learn parallel algorithms best, and analyzes gaps in faithfulness and effectiveness.
AgentRewardBench will be presented at the Conference on Language Modeling (COLM) 2025 in Montreal! See you soon, and ping me if you want to meet up!