Nitish Joshi (@nitishjoshi23)'s Twitter Profile
Nitish Joshi

@nitishjoshi23

PhD student at NYU | CS undergrad @IITBombay '20 | Research in Natural Language Processing (#NLProc)

ID: 1005935875556036608

Link: https://joshinh.github.io/ · Joined: 10-06-2018 22:12:56

181 Tweets

940 Followers

792 Following

Naomi Saphra hiring a lab 🧈🪰 (@nsaphra)'s Twitter Profile Photo

2018: Saliency maps give plausible interpretations of random weights, triggering skepticism and catalyzing the mechinterp cultural movement, which now advocates for SAEs.

2025: SAEs give plausible interpretations of random weights, triggering skepticism and ...
Yanda Chen (@yanda_chen_)'s Twitter Profile Photo

My first paper at Anthropic is out! We show that Chains-of-Thought often don’t reflect models’ true reasoning—posing challenges for safety monitoring. It’s been an incredible 6 months pushing the frontier toward safe AGI with brilliant colleagues. Huge thanks to the team! 🙏

Naman Jain @ ICLR (@stringchaos)'s Twitter Profile Photo

Excited to release R2E-Gym
  - 🔥 8.1K executable environments using synthetic data
  - 🧠 Hybrid verifiers for enhanced inference-time scaling
  - 📈 51% success rate on SWE-Bench Verified
  - 🤗 Open Source Data + Models + Trajectories

1/
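The "hybrid verifiers" bullet above points at a common inference-time scaling pattern. As a rough sketch, not the R2E-Gym implementation, here is best-of-n selection where `generate_patch` and `verifier_score` are hypothetical stand-ins for a coding agent and a hybrid verifier:

```python
# Hypothetical sketch (not the R2E-Gym code) of the generic pattern behind
# "verifiers for inference-time scaling": sample several candidate patches
# and keep the one a verifier ranks highest. generate_patch() and
# verifier_score() are assumed stand-ins for an agent and a hybrid verifier
# (e.g., an execution signal combined with a learned score).
from typing import Callable

def best_of_n(task: str,
              generate_patch: Callable[[str], str],
              verifier_score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate patches and return the verifier's top pick."""
    candidates = [generate_patch(task) for _ in range(n)]
    return max(candidates, key=lambda patch: verifier_score(task, patch))
```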
Yulin Chen (@yulinchen99)'s Twitter Profile Photo

Reasoning models overthink, generating multiple answers during reasoning. Is it because they can’t tell which ones are right?

No! We find that while reasoning models encode strong correctness signals during chain-of-thought, they may not use them optimally.

🧵 below
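The thread's own probing setup isn't shown here; as a minimal sketch of what "encoding correctness signals" can mean operationally, one can fit a linear probe on saved hidden states and check its held-out AUC. The arrays below are synthetic placeholders for real activations and correctness labels:

```python
# Hypothetical sketch (not the paper's code): probe hidden states for a
# linearly decodable "correctness" signal. Assumes you have already saved
# one hidden-state vector per intermediate answer in the CoT, plus a 0/1
# label for whether that answer was correct. Placeholder data below.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 2000, 512                                   # examples x hidden size
X = rng.normal(size=(n, d))                        # stand-in hidden states
w = rng.normal(size=d)
y = (X @ w + rng.normal(size=n) > 0).astype(int)   # stand-in correctness labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High held-out AUC = a strong, linearly readable correctness signal.
print("probe AUC:", roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1]))
```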
Vishakh Padmakumar (@vishakh_pk)'s Twitter Profile Photo

What does it mean for #LLM output to be novel?
In work w/ John (Yueh-Han) Chen, Jane Pan, Valerie Chen, and He He, we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
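A toy sketch of that two-part definition, with illustrative metrics that are not the paper's: originality as low n-gram overlap against a reference corpus, combined with a separate quality scorer. `quality_score` and the thresholds are hypothetical:

```python
# Toy sketch (metrics and thresholds are illustrative, not the paper's):
# "novel" = original AND high quality. Originality here is 1 minus the
# fraction of the text's 4-grams that already appear in a reference corpus;
# quality_score is a hypothetical stand-in for any quality judge.
from typing import Callable

def ngrams(text: str, n: int = 4) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def originality(text: str, corpus: list[str], n: int = 4) -> float:
    seen = set().union(*(ngrams(doc, n) for doc in corpus))
    grams = ngrams(text, n)
    return 1.0 if not grams else 1 - len(grams & seen) / len(grams)

def is_novel(text: str, corpus: list[str],
             quality_score: Callable[[str], float],
             thresh: float = 0.5) -> bool:
    return originality(text, corpus) > thresh and quality_score(text) > thresh
```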
Chaitanya Malaviya (@cmalaviya11)'s Twitter Profile Photo

Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses?

Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓
Nathan Lambert (@natolambert)'s Twitter Profile Photo

Nice to see folks studying biases in RLHF / preference tuning all the way down to the datasets. I think many of the biases are mostly irreducible human biases that can't be solved within current training regimes, just mitigated.

John (Yueh-Han) Chen (@jcyhc_ai)'s Twitter Profile Photo

LLMs won’t tell you how to make fake IDs—but will reveal the layouts/materials of IDs and make realistic photos if asked separately.

💥Such decomposition attacks reach 87% success across QA, text-to-image, and agent settings!
🛡️Our monitoring method defends with 93% success! 🧵
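The paper's monitor isn't reproduced here; a minimal sketch of the session-level idea is to judge the combined intent of recent requests rather than each one alone. `judge_harmful` below is a hypothetical stand-in for any safety classifier:

```python
# Hypothetical sketch of a session-level monitor for decomposition attacks.
# Instead of judging each request in isolation, it asks whether the requests
# *taken together* serve a harmful goal. judge_harmful() stands in for any
# safety classifier (e.g., an LLM judge); it is not a real API.
from collections import deque
from typing import Callable

class SessionMonitor:
    def __init__(self, judge_harmful: Callable[[str], bool], window: int = 10):
        self.judge_harmful = judge_harmful
        self.history: deque[str] = deque(maxlen=window)

    def allow(self, request: str) -> bool:
        self.history.append(request)
        combined = "\n".join(self.history)
        # Flag if either the single request or the combined window is harmful.
        return not (self.judge_harmful(request) or self.judge_harmful(combined))
```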
Rico Angell (@rico_angell)'s Twitter Profile Photo

What causes jailbreaks to transfer between LLMs?

We find that jailbreak strength and model representation similarity predict transferability, and we can engineer model similarity to improve transfer.

Details in 🧵
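The thread doesn't say which similarity measure the paper uses; linear CKA is one standard choice for comparing representations, sketched here under the assumption that you have activation matrices from both models on the same inputs:

```python
# Hypothetical sketch: linear CKA as one way to quantify representation
# similarity between two models (the measure used in the paper is not
# specified in the tweet). X and Y are activations from each model on the
# *same* inputs: shape (num_examples, hidden_dim); dims may differ.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    X = X - X.mean(axis=0)                      # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2  # cross-covariance strength
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)             # 1 = same up to rotation/scale
```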
CLS (@chengleisi)'s Twitter Profile Photo

Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.
Michael Hu (@michahu8)'s Twitter Profile Photo

📢 today's scaling laws often don't work for predicting downstream task performance. For some pretraining setups, smooth and predictable scaling is the exception, not the rule.

a quick read about scaling law fails: 
📜arxiv.org/abs/2507.00885

🧵1/5👇
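For context, the recipe being critiqued usually looks like fitting a saturating power law to a few small runs and extrapolating. A toy sketch with made-up numbers (not data from the paper):

```python
# Hypothetical sketch of the standard recipe the thread says often fails:
# fit a saturating power law L(C) = a * C**(-b) + c to a few small runs,
# then extrapolate to a bigger compute budget. All numbers are made up;
# the thread's point is that a clean fit here need not predict downstream
# task performance at scale.
import numpy as np
from scipy.optimize import curve_fit

def power_law(C, a, b, c):
    return a * C ** (-b) + c

compute = np.array([1e17, 3e17, 1e18, 3e18, 1e19])   # training FLOPs (fake)
loss = np.array([3.10, 2.85, 2.62, 2.44, 2.30])      # eval losses (fake)

C = compute / compute.min()                          # normalize for stability
params, _ = curve_fit(power_law, C, loss, p0=(1.0, 0.2, 2.0), maxfev=10000)
print("extrapolated loss at 1e21 FLOPs:", power_law(1e4, *params))
```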
Tuhin Chakrabarty (@tuhinchakr)'s Twitter Profile Photo

Honored to get the outstanding position paper award at ICML Conference :) Come attend my talk and poster tomorrow on human-centered considerations for a safer and better future of work

I will be recruiting PhD students at Stony Brook University (Dept. of Computer Science) coming fall. Please get in touch.
Google DeepMind (@googledeepmind)'s Twitter Profile Photo

An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇

It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
Vishakh Padmakumar (@vishakh_pk)'s Twitter Profile Photo

Maybe don't use an LLM for _everything_?

Last summer, I got to fiddle again with content diversity at Adobe Research (Adobe), and we showed that agentic pipelines that mix LLM-prompt steps with principled techniques can yield better, more personalized summaries
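One example of a "principled technique" that could slot into such a pipeline (the tweet doesn't name the method used) is maximal marginal relevance, which greedily trades relevance against redundancy. A sketch assuming unit-normalized sentence embeddings are already computed:

```python
# Hypothetical sketch: maximal marginal relevance (MMR), a classic non-LLM
# technique for diverse content selection that could sit inside such a
# pipeline. cand_vecs holds one unit-normalized embedding per candidate
# sentence; query_vec is the summary query's embedding.
import numpy as np

def mmr_select(query_vec: np.ndarray, cand_vecs: np.ndarray,
               k: int = 5, lam: float = 0.7) -> list[int]:
    """Greedily pick k candidates trading relevance against redundancy."""
    selected: list[int] = []
    remaining = list(range(len(cand_vecs)))
    relevance = cand_vecs @ query_vec                # cosine sims (unit vectors)
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max((cand_vecs[i] @ cand_vecs[j] for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```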
Nitish Joshi (@nitishjoshi23)'s Twitter Profile Photo

Monitoring CoT may be insufficient to detect reward hacking. We develop a very simple method to detect such implicit reward hacking: truncate the CoT, force the model to predict an answer, and use the AUC of the %CoT vs. expected-reward curve as the measure. Last project of my PhD!
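A minimal sketch of that measure as described in the tweet (not the released code), assuming you already have a way to estimate expected reward from a truncated CoT prefix:

```python
# Hypothetical sketch of the measure described above: truncate the CoT at
# increasing fractions, force an answer from each prefix, and take the area
# under the %CoT-vs-expected-reward curve. A model that only earns reward
# once its reasoning is nearly complete scores low; one that "knows" the
# answer early scores suspiciously high.
import numpy as np
from typing import Callable

def trace_auc(expected_reward_at: Callable[[float], float],
              num_points: int = 11) -> float:
    fracs = np.linspace(0.0, 1.0, num_points)     # fraction of CoT kept
    rewards = np.array([expected_reward_at(f) for f in fracs])
    # Trapezoidal area under the reward curve over the [0, 1] fraction axis.
    return float(((rewards[1:] + rewards[:-1]) / 2 * np.diff(fracs)).sum())

# Toy example: expected reward saturates after only 20% of the CoT.
print(trace_auc(lambda f: min(1.0, f / 0.2)))     # ≈ 0.9, answer found early
```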

He He (@hhexiy)'s Twitter Profile Photo

Reward hacking means the model is making less effort than expected: it finds the answer long before its fake CoT is finished. TRACE uses this idea to detect hacking when CoT monitoring fails. Work led by Xinpeng Wang, Nitish Joshi, and Rico Angell 👇

Ziqian Zhong (@fjzzq2002)'s Twitter Profile Photo

New research with Aditi Raghunathan, Nicholas Carlini and Anthropic!

We built ImpossibleBench to measure reward hacking in LLM coding agents 🤖, by making benchmark tasks impossible and seeing whether models game tests or follow specs. (1/9)
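A minimal sketch of that setup, with all harness interfaces assumed rather than taken from the released benchmark: make the tests unsatisfiable, then treat a passing patch as evidence of test gaming:

```python
# Hypothetical sketch of the ImpossibleBench idea (interfaces assumed, not
# the released benchmark): mutate one expected value so the test suite
# contradicts the task spec. No spec-following patch can then pass, so a
# "passing" agent must have gamed the tests (edited them, special-cased
# inputs, etc.); passing is itself the reward-hacking signal.
def make_impossible(test_source: str) -> str:
    # Flip one expected output; the assertion shown is a stand-in example.
    return test_source.replace("assert add(2, 2) == 4",
                               "assert add(2, 2) == 5", 1)

def is_reward_hacking(run_agent, run_tests, spec: str, tests: str) -> bool:
    """run_agent/run_tests are hypothetical callables for the harness."""
    hacked_tests = make_impossible(tests)
    patch = run_agent(spec, hacked_tests)
    return run_tests(patch, hacked_tests)  # True == model gamed the tests
```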
Sundar Pichai (@sundarpichai)'s Twitter Profile Photo

Introducing Gemini 3 ✨

It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting. Find Gemini

lmarena.ai (formerly lmsys.org) (@lmarena_ai)'s Twitter Profile Photo

🚨BREAKING: Google DeepMind’s Gemini-3-Pro is now #1 across all major Arena leaderboards

🥇#1 in Text, Vision, and WebDev - surpassing Grok-4.1, Claude-4.5, and GPT-5
🥇#1 in Coding, Math, Creative Writing, Long Queries, and nearly all occupational leaderboards.

Massive gains