Sumeet Motwani (@sumeetrm) Twitter Tweets • TwiCopy

Geoffrey Irving

5 months ago

New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.

Likes: 325 | Replies: 6 | Retweets: 51

Amrith Setlur

@setlur_amrith

5 months ago

Since R1 there has been a lot of chatter 💬 on post-training LLMs with RL. Is RL only sharpening the distribution over correct responses sampled by the pretrained LLM OR is it exploring and discovering new strategies 🤔? Find answers in our latest post ⬇️ tinyurl.com/rlshadis

Likes: 147 | Replies: 2 | Retweets: 25

Runjia Li

@runjiali

5 months ago

🎉 VMem is officially accepted to ICCV 2025! Excited to chat with everyone in Hawaii about making video generation consistent and interactive with our Surfel-Indexed View Memory 🏝️🎥 Also, huge thanks to my insanely helpful coauthors!

Likes: 61 | Replies: 5 | Retweets: 7

Sumeet Motwani

@sumeetrm

5 months ago

Time to build vending machine rl envs

Likes: 4 | Replies: 0 | Retweets: 0

Jelani Nelson

@minilek

5 months ago

Today is my first day as department chair at UC Berkeley EECS. When I left Harvard 6 yrs ago, I just saw it as moving from one great place to another. Then it hit me: at a state school I'm also now a public servant, and that fact weighs on me every day. I aim to serve the best I can.

Likes: 1.1K | Replies: 49 | Retweets: 32

Shashwat Goel

@shashwatgoel7

5 months ago

There's been a hole at the heart of #LLM evals, and we can now fix it. 📜New paper: Answer Matching Outperforms Multiple Choice for Language Model Evaluations. ❗️We found MCQs can be solved without even knowing the question. Looking at just the choices helps guess the answer

Likes: 229 | Replies: 11 | Retweets: 37

Nikhil Chandak

@nikhilchandak29

5 months ago

🚨 Ever wondered how much you can ace popular MCQ benchmarks without even looking at the questions? 🤯 Turns out, you can often get significant accuracy just from the choices alone. This is true even on recent benchmarks with 10 choices (like MMLU-Pro) and their vision

Likes: 62 | Replies: 3 | Retweets: 18

Sumeet Motwani

@sumeetrm

5 months ago

👀

Likes: 2 | Replies: 0 | Retweets: 0

Jakob Foerster

@j_foerst

5 months ago

In May I missed a single email from openreview saying I'd be auto-enlisted as a reviewer. Then a few ACs missed my immediate and repeated messages on openreview saying that I won't be able to review since I'll be taking the second half of my paternity leave. Now all of my

Likes: 105 | Replies: 4 | Retweets: 8

Sumeet Motwani

@sumeetrm

5 months ago

MALT has been accepted at Conference on Language Modeling 2025! Incredibly grateful for my amazing coauthors. The future holds an 'improve answer' button on every AI assistant🫡

Likes: 48 | Replies: 3 | Retweets: 4

Sumeet Motwani

@sumeetrm

4 months ago

I've joined the Phi Team at Microsoft AI Frontiers as a Research Intern this summer! I'll be working with Shital Shah on new RL post-training directions for unverifiable domains and am very excited to be back in the land of compute🫡

Likes: 322 | Replies: 12 | Retweets: 5

Pratyush Maini

@pratyushmaini

4 months ago

Last year we released our work on dataset inference. This work removes an important limitation of dataset inference: the need for a held-out validation set, via a recipe that robustly synthesizes data that is IID to the train distribution. This makes DI possible post-hoc. Come over!

Likes: 32 | Replies: 0 | Retweets: 4

Sumeet Motwani

@sumeetrm

4 months ago

🤡

Likes: 8 | Replies: 1 | Retweets: 0

James Alcorn

@jamesalcorn94

4 months ago

Plenty of brittle + narrow tooling in this wild west era of codegen can be characterized—not unfairly, and with just a hint of sarcasm—as off-the-shelf LLMs whose untreated amnesia only worsens an already-acute addiction (kink?) to github READMEs. We're a far cry from Fred

Likes: 32 | Replies: 0 | Retweets: 5

Dulhan Jayalath

@dulhanjay

4 months ago

Come and find me today at #ICML2025 and let's talk about speech 💬 decoding from the brain and scaling brain-computer interfaces 🤖. 11 am to 1:30 pm, West Exhibition Hall, Poster W-415

Likes: 22 | Replies: 0 | Retweets: 3

Sumeet Motwani

@sumeetrm

4 months ago

Given the recent IMO results, OAI seems to have figured out reasoning *reliably* with at least 4 Million tokens

Likes: 12 | Replies: 1 | Retweets: 0

Sanjeev Arora

@prfsanjeevarora

4 months ago

Completely misses the point. Nobody is suggesting that solving IMO problems is useful for math research. The point is that AI has become really good at complex reasoning, and is not just memorizing its training data. It can handle completely new IMO questions designed by a

Likes: 600 | Replies: 20 | Retweets: 38

Sumeet Motwani

@sumeetrm

4 months ago

Windows Defender just flagged Cursor as malware and deleted it💀

Likes: 10 | Replies: 0 | Retweets: 0

Guohao Li (Hiring!) 🐫

@guohao_li

4 months ago

Introducing Eigent — the first multi-agent workforce on your desktop. Eigent is a team of AI agents collaborating to complete complex tasks in parallel. It is your long-term working partner with fully customizable workers and MCPs. Public beta available to download for MacOS,

Likes: 672 | Replies: 135 | Retweets: 136

Oleksii Kuchaiev

@kuchaev

4 months ago

Everything about Llama-Nemotron-Super-V1.5 post-training is now open: Synthetic data: huggingface.co/datasets/nvidi… Human data: huggingface.co/datasets/nvidi… Reward models (trained on HS3 data): huggingface.co/collections/nv… RL toolkit: github.com/NVIDIA-NeMo/RL

Likes: 253 | Replies: 4 | Retweets: 49