Sumeet Motwani (@sumeetrm)'s Twitter Profile
Sumeet Motwani

@sumeetrm

ML PhD at Oxford, Previously CS at UC Berkeley

ID: 1757471242670813187

Link: http://sumeetmotwani.com
Joined: 13-02-2024 18:26:17

223 Tweets

1.1K Followers

1.1K Following

Geoffrey Irving (@geoffreyirving)

New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.

Amrith Setlur (@setlur_amrith)

Since R1 there has been a lot of chatter 💬 on post-training LLMs with RL. Is RL only sharpening the distribution over correct responses sampled by the pretrained LLM OR is it exploring and discovering new strategies 🤔? Find answers in our latest post ⬇️ tinyurl.com/rlshadis

Runjia Li (@runjiali)

🎉 VMem is officially accepted to ICCV 2025! Excited to chat with everyone in Hawaii about making video generation consistent and interactive with our Surfel-Indexed View Memory 🏝️🎥 Also, huge thanks to my insanely helpful coauthors!

Jelani Nelson (@minilek)

Today is my first as department chair at UC Berkeley EECS. When I left Harvard 6 yrs ago, I just saw it as moving from one great place to another. Then it hit me: at a state school I'm also now a public servant, and that fact weighs on me every day. I aim to serve the best I can.

Shashwat Goel (@shashwatgoel7)

There's been a hole at the heart of #LLM evals, and we can now fix it. 📜New paper: Answer Matching Outperforms Multiple Choice for Language Model Evaluations. ❗️We found MCQs can be solved without even knowing the question. Looking at just the choices helps guess the answer

Nikhil Chandak (@nikhilchandak29)

🚨 Ever wondered how much you can ace popular MCQ benchmarks without even looking at the questions? 🤯 Turns out, you can often get significant accuracy just from the choices alone. This is true even on recent benchmarks with 10 choices (like MMLU-Pro) and their vision

Jakob Foerster (@j_foerst)

In May I missed a single email from openreview saying I'd be auto-enlisted as a reviewer. Then a few ACs missed my immediate and repeated messages on openreview saying that I won't be able to review since I'll be taking the second half of my paternity leave. Now all of my

Sumeet Motwani (@sumeetrm)

MALT has been accepted at Conference on Language Modeling 2025! Incredibly grateful for my amazing coauthors. The future holds an 'improve answer' button on every AI assistant🫡

Sumeet Motwani (@sumeetrm)

I've joined the Phi Team at Microsoft AI Frontiers as a Research Intern this summer! I'll be working with Shital Shah on new RL post-training directions for unverifiable domains and am very excited to be back in the land of compute🫡

Pratyush Maini (@pratyushmaini)

Last year we released our work on dataset inference. This work addresses an important limitation of dataset inference, the need for a held-out validation set, with a recipe that robustly synthesizes data that is IID to the train distribution. This makes DI possible post-hoc. Come over!

James Alcorn (@jamesalcorn94)

Plenty of brittle + narrow tooling in this wild west era of codegen can be characterized—not unfairly, and with just a hint of sarcasm—as off-the-shelf LLMs whose untreated amnesia only worsens an already-acute addiction (kink?) to github READMEs. We're a far cry from Fred

Dulhan Jayalath (@dulhanjay)

Come and find me today at #ICML2025 and let's talk about speech 💬 decoding from the brain and scaling brain-computer interfaces 🤖. 11 am - 1:30 pm, West Exhibition Hall, Poster W-415

Sanjeev Arora (@prfsanjeevarora)

Completely misses the point. Nobody is suggesting that solving IMO problems is useful for math research. The point is that AI has become really good at complex reasoning, and is not just memorizing its training data. It can handle completely new IMO questions designed by a

Guohao Li (Hiring!) 🐫 (@guohao_li)

Introducing Eigent, the first multi-agent workforce on your desktop. Eigent is a team of AI agents collaborating to complete complex tasks in parallel. It is your long-term working partner with fully customizable workers and MCPs. Public beta available to download for macOS,

Oleksii Kuchaiev (@kuchaev)

Everything about Llama-Nemotron-Super-V1.5 post-training is now open: Synthetic data: huggingface.co/datasets/nvidi… Human data: huggingface.co/datasets/nvidi… Reward models (trained on HS3 data): huggingface.co/collections/nv… RL toolkit: github.com/NVIDIA-NeMo/RL