Pete Shaw (@ptshaw2)'s Twitter Profile
Pete Shaw

@ptshaw2

Research Scientist @GoogleDeepmind

ID: 1075460972

http://ptshaw.com · Joined 10-01-2013 02:28:10

85 Tweets

564 Followers

432 Following

Clare Lyle (@clarelyle)'s Twitter Profile Photo

📣📣 My team at Google DeepMind is hiring a student researcher for summer/fall 2025 in Seattle! If you're a PhD student interested in getting deep RL to (finally) work reliably in interesting domains, apply at the link below and reach out to me via email so I know you applied👇

Jonathan Berant (@jonathanberant)'s Twitter Profile Photo

Hi ho!

New work: arxiv.org/pdf/2503.14481
With amazing collabs Jacob Eisenstein, Reza Aghajani, Adam Fisch, dheeru dua, Fantine Huot ✈️ ICLR 25, Mirella Lapata, Vicky Zayats

Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3
Xing Han Lu (@xhluca)'s Twitter Profile Photo

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories  

We are releasing the first benchmark to evaluate how well automatic evaluators, such as LLM judges, can evaluate web agent trajectories.

We find that rule-based evals underreport success rates, and
Pete Shaw (@ptshaw2)'s Twitter Profile Photo

This was my first time submitting to TMLR, and thanks to the reviewers and AE Alessandro Sordoni for making it a positive experience! TMLR seems to offer some nice pros vs. ICML/ICLR/NeurIPS, e.g.:
- Potentially lower-variance review process
- Not dependent on the conference calendar

Jacob Eisenstein (@jacobeisenstein)'s Twitter Profile Photo

We're hiring a research scientist on the Foundational Research in Language team at GDM. The role is right here in sunny Seattle! job-boards.greenhouse.io/deepmind/jobs/…

Lucas Saldyt (@saldytlucas)'s Twitter Profile Photo

Neural networks can express more than they learn, creating expressivity-trainability gaps. Our paper, “Mind The Gap,” shows neural networks best learn parallel algorithms, and analyzes gaps in faithfulness and effectiveness. Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)

Xing Han Lu (@xhluca)'s Twitter Profile Photo

i will be presenting AgentRewardBench at 
#COLM2025 next week!

session: #3
date: wednesday 11am to 1pm
poster: #545

come learn more about the paper, my recent works or just chat about anything (montreal, mila, etc.)

here's a teaser of my poster :)
Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

The paper links Kolmogorov complexity to Transformers and proposes loss functions that become provably best as model resources grow.

It treats learning as compression: minimize the bits to describe the model plus the bits to describe the labels.

Provides a single training target that
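The "bits for the model plus bits for the labels" objective described above is the classic two-part description-length idea. A minimal sketch of that accounting (the function name, the uniform per-parameter code, and the example numbers are all illustrative assumptions, not from the paper):

```python
import math

def two_part_code_length(num_params, bits_per_param, label_probs):
    """Illustrative two-part description length in bits.

    Model term: a crude uniform code, num_params * bits_per_param.
    Data term: Shannon code length of the true labels under the model,
    i.e. -log2 p(y) summed over examples.
    """
    model_bits = num_params * bits_per_param
    data_bits = sum(-math.log2(p) for p in label_probs)
    return model_bits + data_bits

# A model that is more confident on the true labels pays fewer bits
# for the data term, at a fixed model cost.
confident = two_part_code_length(10, 8.0, [0.9, 0.95, 0.99])
uncertain = two_part_code_length(10, 8.0, [0.5, 0.5, 0.5])
assert confident < uncertain
```

Under this framing, trading model size against label-coding cost gives a single scalar training target, which is the compression view the tweet gestures at.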
Conference on Language Modeling (@colm_conf)'s Twitter Profile Photo

Outstanding paper 3🏆: Don't lie to your friends: Learning what you know from collaborative self-play
openreview.net/forum?id=2vDJi…
Google DeepMind (@googledeepmind)'s Twitter Profile Photo

Our new Gemini 2.5 Computer Use model can navigate browsers just like you do. 🌐

It builds on Gemini’s visual understanding and reasoning capabilities to power agents that can click, scroll and type for you online - setting a new standard on multiple benchmarks, with faster
Sundar Pichai (@sundarpichai)'s Twitter Profile Photo

Our new Gemini 2.5 Computer Use model is now available in the Gemini API, setting a new standard on multiple benchmarks with lower latency. These are early days, but the model’s ability to interact with the web – like scrolling, filling forms + navigating dropdowns – is an