Nitish Joshi (@nitishjoshi23)'s Twitter Profile
Nitish Joshi

@nitishjoshi23

PhD student at NYU | CS undergrad @IITBombay '20 | Research in Natural Language Processing (#NLProc)

ID: 1005935875556036608

Link: https://joshinh.github.io/ · Joined: 10-06-2018 22:12:56

181 Tweets

940 Followers

792 Following

Naomi Saphra hiring a lab 🧈🪰 (@nsaphra)'s Twitter Profile Photo

2018: Saliency maps give plausible interpretations of random weights, triggering skepticism and catalyzing the mechinterp cultural movement, which now advocates for SAEs.

2025: SAEs give plausible interpretations of random weights, triggering skepticism and ...
Yanda Chen (@yanda_chen_)'s Twitter Profile Photo

My first paper at Anthropic is out! We show that Chains-of-Thought often don’t reflect models’ true reasoning—posing challenges for safety monitoring. It’s been an incredible 6 months pushing the frontier toward safe AGI with brilliant colleagues. Huge thanks to the team! 🙏

Naman Jain @ ICLR (@stringchaos)'s Twitter Profile Photo

Excited to release R2E-Gym
  - 🔥 8.1K executable environments using synthetic data
  - 🧠 Hybrid verifiers for enhanced inference-time scaling
  - 📈 51% success rate on SWE-Bench Verified
  - 🤗 Open Source Data + Models + Trajectories

1/
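The "hybrid verifiers" bullet above points at a common inference-time scaling pattern. As a rough sketch, not the R2E-Gym implementation, here is best-of-n selection where `generate_patch` and `verifier_score` are hypothetical stand-ins for a coding agent and a hybrid verifier:

```python
# Hypothetical sketch (not the R2E-Gym code) of the generic pattern behind
# "verifiers for inference-time scaling": sample several candidate patches
# and keep the one a verifier ranks highest. generate_patch() and
# verifier_score() are assumed stand-ins for an agent and a hybrid verifier
# (e.g., an execution signal combined with a learned score).
from typing import Callable

def best_of_n(task: str,
              generate_patch: Callable[[str], str],
              verifier_score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate patches and return the verifier's top pick."""
    candidates = [generate_patch(task) for _ in range(n)]
    return max(candidates, key=lambda patch: verifier_score(task, patch))
```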
Yulin Chen (@yulinchen99)'s Twitter Profile Photo

Reasoning models overthink, generating multiple answers during reasoning. Is it because they can’t tell which ones are right?

No! We find that while reasoning models encode strong correctness signals during chain-of-thought, they may not use them optimally.

🧵 below
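The thread's own probing setup isn't shown here; as a minimal sketch of what "encoding correctness signals" can mean operationally, one can fit a linear probe on saved hidden states and check its held-out AUC. The arrays below are synthetic placeholders for real activations and correctness labels:

```python
# Hypothetical sketch (not the paper's code): probe hidden states for a
# linearly decodable "correctness" signal. Assumes you have already saved
# one hidden-state vector per intermediate answer in the CoT, plus a 0/1
# label for whether that answer was correct. Placeholder data below.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 2000, 512                                   # examples x hidden size
X = rng.normal(size=(n, d))                        # stand-in hidden states
w = rng.normal(size=d)
y = (X @ w + rng.normal(size=n) > 0).astype(int)   # stand-in correctness labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High held-out AUC = a strong, linearly readable correctness signal.
print("probe AUC:", roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1]))
```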
Vishakh Padmakumar (@vishakh_pk)'s Twitter Profile Photo

What does it mean for #LLM output to be novel?
In work w/ John (Yueh-Han) Chen, Jane Pan, Valerie Chen, and He He, we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
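A toy sketch of that two-part definition, with illustrative metrics that are not the paper's: originality as low n-gram overlap against a reference corpus, combined with a separate quality scorer. `quality_score` and the thresholds are hypothetical:

```python
# Toy sketch (metrics and thresholds are illustrative, not the paper's):
# "novel" = original AND high quality. Originality here is 1 minus the
# fraction of the text's 4-grams that already appear in a reference corpus;
# quality_score is a hypothetical stand-in for any quality judge.
from typing import Callable

def ngrams(text: str, n: int = 4) -> set[tuple[str, ...]]:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def originality(text: str, corpus: list[str], n: int = 4) -> float:
    seen = set().union(*(ngrams(doc, n) for doc in corpus))
    grams = ngrams(text, n)
    return 1.0 if not grams else 1 - len(grams & seen) / len(grams)

def is_novel(text: str, corpus: list[str],
             quality_score: Callable[[str], float],
             thresh: float = 0.5) -> bool:
    return originality(text, corpus) > thresh and quality_score(text) > thresh
```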
Chaitanya Malaviya (@cmalaviya11)'s Twitter Profile Photo

Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses?

Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓
Nathan Lambert (@natolambert)'s Twitter Profile Photo

Nice to see folks studying biases in RLHF / preference tuning all the way down to the datasets. I think many of the biases are mostly irreducible human biases that can't be solved within current training regimes, just mitigated.

John (Yueh-Han) Chen (@jcyhc_ai)'s Twitter Profile Photo

LLMs won’t tell you how to make fake IDs—but will reveal the layouts/materials of IDs and make realistic photos if asked separately.

💥Such decomposition attacks reach 87% success across QA, text-to-image, and agent settings!
🛡️Our monitoring method defends with 93% success! 🧵
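The paper's monitor isn't reproduced here; a minimal sketch of the session-level idea is to judge the combined intent of recent requests rather than each one alone. `judge_harmful` below is a hypothetical stand-in for any safety classifier:

```python
# Hypothetical sketch of a session-level monitor for decomposition attacks.
# Instead of judging each request in isolation, it asks whether the requests
# *taken together* serve a harmful goal. judge_harmful() stands in for any
# safety classifier (e.g., an LLM judge); it is not a real API.
from collections import deque
from typing import Callable

class SessionMonitor:
    def __init__(self, judge_harmful: Callable[[str], bool], window: int = 10):
        self.judge_harmful = judge_harmful
        self.history: deque[str] = deque(maxlen=window)

    def allow(self, request: str) -> bool:
        self.history.append(request)
        combined = "\n".join(self.history)
        # Flag if either the single request or the combined window is harmful.
        return not (self.judge_harmful(request) or self.judge_harmful(combined))
```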
Rico Angell (@rico_angell)'s Twitter Profile Photo

What causes jailbreaks to transfer between LLMs?

We find that jailbreak strength and model representation similarity predict transferability, and we can engineer model similarity to improve transfer.

Details in 🧵
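The thread doesn't say which similarity measure the paper uses; linear CKA is one standard choice for comparing representations, sketched here under the assumption that you have activation matrices from both models on the same inputs:

```python
# Hypothetical sketch: linear CKA as one way to quantify representation
# similarity between two models (the measure used in the paper is not
# specified in the tweet). X and Y are activations from each model on the
# *same* inputs: shape (num_examples, hidden_dim); dims may differ.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    X = X - X.mean(axis=0)                      # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2  # cross-covariance strength
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)             # 1 = same up to rotation/scale
```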
CLS (@chengleisi)'s Twitter Profile Photo

Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.
Michael Hu (@michahu8)'s Twitter Profile Photo

📢 today's scaling laws often don't work for predicting downstream task performance. For some pretraining setups, smooth and predictable scaling is the exception, not the rule.

a quick read about scaling law fails: 
📜arxiv.org/abs/2507.00885

🧵1/5👇
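For context, the recipe being critiqued usually looks like fitting a saturating power law to a few small runs and extrapolating. A toy sketch with made-up numbers (not data from the paper):

```python
# Hypothetical sketch of the standard recipe the thread says often fails:
# fit a saturating power law L(C) = a * C**(-b) + c to a few small runs,
# then extrapolate to a bigger compute budget. All numbers are made up;
# the thread's point is that a clean fit here need not predict downstream
# task performance at scale.
import numpy as np
from scipy.optimize import curve_fit

def power_law(C, a, b, c):
    return a * C ** (-b) + c

compute = np.array([1e17, 3e17, 1e18, 3e18, 1e19])   # training FLOPs (fake)
loss = np.array([3.10, 2.85, 2.62, 2.44, 2.30])      # eval losses (fake)

C = compute / compute.min()                          # normalize for stability
params, _ = curve_fit(power_law, C, loss, p0=(1.0, 0.2, 2.0), maxfev=10000)
print("extrapolated loss at 1e21 FLOPs:", power_law(1e4, *params))
```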
Tuhin Chakrabarty (@tuhinchakr)'s Twitter Profile Photo

Honored to get the outstanding position paper award at ICML Conference :) Come attend my talk and poster tomorrow on human-centered considerations for a safer and better future of work

I will be recruiting PhD students at Stony Brook University (Dept. of Computer Science) coming fall. Please get in touch.
Google DeepMind (@googledeepmind)'s Twitter Profile Photo

An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇

It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
Vishakh Padmakumar (@vishakh_pk)'s Twitter Profile Photo

Maybe don't use an LLM for _everything_?

Last summer, I got to fiddle again with content diversity at Adobe Research (Adobe), and we showed that agentic pipelines that mix LLM-prompt steps with principled techniques can yield better, more personalized summaries
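One example of a "principled technique" that could slot into such a pipeline (the tweet doesn't name the method used) is maximal marginal relevance, which greedily trades relevance against redundancy. A sketch assuming unit-normalized sentence embeddings are already computed:

```python
# Hypothetical sketch: maximal marginal relevance (MMR), a classic non-LLM
# technique for diverse content selection that could sit inside such a
# pipeline. cand_vecs holds one unit-normalized embedding per candidate
# sentence; query_vec is the summary query's embedding.
import numpy as np

def mmr_select(query_vec: np.ndarray, cand_vecs: np.ndarray,
               k: int = 5, lam: float = 0.7) -> list[int]:
    """Greedily pick k candidates trading relevance against redundancy."""
    selected: list[int] = []
    remaining = list(range(len(cand_vecs)))
    relevance = cand_vecs @ query_vec                # cosine sims (unit vectors)
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max((cand_vecs[i] @ cand_vecs[j] for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```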
Nitish Joshi (@nitishjoshi23)'s Twitter Profile Photo

Monitoring CoT may be insufficient to detect reward hacking. We develop a very simple method to detect such implicit reward hacking: truncate the CoT, force the model to predict an answer, and use the AUC of the %CoT vs. expected-reward curve as the measure. Last project of my PhD!
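A minimal sketch of that measure as described in the tweet (not the released code), assuming you already have a way to estimate expected reward from a truncated CoT prefix:

```python
# Hypothetical sketch of the measure described above: truncate the CoT at
# increasing fractions, force an answer from each prefix, and take the area
# under the %CoT-vs-expected-reward curve. A model that only earns reward
# once its reasoning is nearly complete scores low; one that "knows" the
# answer early scores suspiciously high.
import numpy as np
from typing import Callable

def trace_auc(expected_reward_at: Callable[[float], float],
              num_points: int = 11) -> float:
    fracs = np.linspace(0.0, 1.0, num_points)     # fraction of CoT kept
    rewards = np.array([expected_reward_at(f) for f in fracs])
    # Trapezoidal area under the reward curve over the [0, 1] fraction axis.
    return float(((rewards[1:] + rewards[:-1]) / 2 * np.diff(fracs)).sum())

# Toy example: expected reward saturates after only 20% of the CoT.
print(trace_auc(lambda f: min(1.0, f / 0.2)))     # ≈ 0.9, answer found early
```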

He He (@hhexiy)'s Twitter Profile Photo

Reward hacking means the model is making less effort than expected: it finds the answer long before its fake CoT is finished. TRACE uses this idea to detect hacking when CoT monitoring fails. Work led by Xinpeng Wang, Nitish Joshi, and Rico Angell 👇

Ziqian Zhong (@fjzzq2002)'s Twitter Profile Photo

New research with Aditi Raghunathan, Nicholas Carlini and Anthropic!

We built ImpossibleBench to measure reward hacking in LLM coding agents 🤖, by making benchmark tasks impossible and seeing whether models game tests or follow specs. (1/9)
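A minimal sketch of that setup, with all harness interfaces assumed rather than taken from the released benchmark: make the tests unsatisfiable, then treat a passing patch as evidence of test gaming:

```python
# Hypothetical sketch of the ImpossibleBench idea (interfaces assumed, not
# the released benchmark): mutate one expected value so the test suite
# contradicts the task spec. No spec-following patch can then pass, so a
# "passing" agent must have gamed the tests (edited them, special-cased
# inputs, etc.); passing is itself the reward-hacking signal.
def make_impossible(test_source: str) -> str:
    # Flip one expected output; the assertion shown is a stand-in example.
    return test_source.replace("assert add(2, 2) == 4",
                               "assert add(2, 2) == 5", 1)

def is_reward_hacking(run_agent, run_tests, spec: str, tests: str) -> bool:
    """run_agent/run_tests are hypothetical callables for the harness."""
    hacked_tests = make_impossible(tests)
    patch = run_agent(spec, hacked_tests)
    return run_tests(patch, hacked_tests)  # True == model gamed the tests
```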
Sundar Pichai (@sundarpichai)'s Twitter Profile Photo

Introducing Gemini 3 ✨

It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting. Find Gemini

lmarena.ai (formerly lmsys.org) (@lmarena_ai)'s Twitter Profile Photo

🚨BREAKING: Google DeepMind’s Gemini-3-Pro is now #1 across all major Arena leaderboards

🥇#1 in Text, Vision, and WebDev - surpassing Grok-4.1, Claude-4.5, and GPT-5
🥇#1 in Coding, Math, Creative Writing, Long Queries, and nearly all occupational leaderboards.

Massive gains