Deepak Vijaykeerthy (@dvijaykeerthy)'s Twitter Profile
Deepak Vijaykeerthy

@dvijaykeerthy

ML Research & Engineering @IBMResearch. Ex @MSFTResearch. Opinions are my own! Tweets about books & food.

ID: 1308375612667486208

Link: https://researcher.watson.ibm.com/researcher/view.php?person=in-deepakvij
Joined: 22-09-2020 12:00:39

5.5K Tweets

496 Followers

906 Following

Aran Komatsuzaki (@arankomatsuzaki)

Actually, gradient descent can be seen as attention that applies beyond the model's context length! Let me explain why 🧵 👇 (1/N)

Ref:
arxiv.org/abs/2202.05798
arxiv.org/abs/2212.10559
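
The claimed equivalence is concrete for the unnormalized linear-attention case discussed in the linked papers: a linear layer updated by SGD computes exactly what linear attention computes over the stored (key, value) pairs. A minimal numpy sketch, with variable names of my choosing and the per-example gradient signals playing the role of values:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 5
W0 = rng.normal(size=(d, d))  # initial weights of a linear layer
K = rng.normal(size=(n, d))   # n training inputs = attention keys
V = rng.normal(size=(n, d))   # per-example gradient signals = attention values
q = rng.normal(size=d)        # a new test input = attention query

# Gradient-descent view: SGD on a linear layer adds one rank-1
# outer-product update v_i k_i^T per training example.
W = W0 + sum(np.outer(V[i], K[i]) for i in range(n))
out_gd = W @ q

# Attention view: unnormalized linear attention over the same pairs,
# i.e. the initial output plus sum_i v_i * <k_i, q>.
out_attn = W0 @ q + V.T @ (K @ q)

assert np.allclose(out_gd, out_attn)  # the two views coincide exactly
```

Since the (k_i, v_i) pairs can accumulate over arbitrarily many examples, this "attention" is not bounded by any context window, which is the thread's point.
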
Junlin (Hans) Han (@han_junlin)

Excited to share our new work: “Learning to See Before Seeing”! 🧠➡️👀 We investigate an interesting phenomenon: how do LLMs, trained only on text, learn about the visual world?
Project page: junlinhan.github.io/projects/lsbs/
Ernest Ryu (@ernestryu)

There’s chatter about frontier labs having a secret super-advanced GRPO. But let me tell you something new about GRPO: the clipping mechanisms induce entropy biases:
- clip-low increases entropy
- clip-high decreases entropy
(1/5)
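
For readers unfamiliar with the mechanism under discussion, here is a minimal sketch of a PPO/GRPO-style surrogate with separate lower and upper clipping bounds; the clip_low/clip_high names and defaults are my assumptions, not code from the thread:

```python
import torch

def clipped_surrogate(logp_new, logp_old, adv,
                      clip_low: float = 0.2, clip_high: float = 0.2):
    """Clipped policy loss with asymmetric bounds: the importance ratio
    is clamped to [1 - clip_low, 1 + clip_high]. The thread's claim is
    that the lower bound biases entropy upward and the upper bound
    biases it downward."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high)
    # Pessimistic min over unclipped and clipped terms, as in PPO.
    return -torch.min(ratio * adv, clipped * adv).mean()
```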

Dr. Datta M.D. (AIIMS Delhi) (@drdatta_aiims)

🚨 Just published! All frontier AI models have failed “Radiology’s Last Exam” - the toughest benchmark in radiology launched today!

✅ Board-certified radiologists scored 83%, trainees 45%, but the best-performing AI from frontier labs, GPT-5, managed only 30%.

❌ These results
Mohit Bansal (@mohitban47)

🚨 Generalized Correctness Predictors:
➡️ LLMs have no better self-knowledge about their own correctness than other LLMs.
➡️ Instead, we find that LLMs benefit from learning to predict the correctness (based on history) of many other models.
➡️ Training 1 GCM is strictly

tomaarsen (@tomaarsen)

We're announcing a new update to MTEB: RTEB

It's a new multilingual text embedding retrieval benchmark with private (!) datasets, to ensure that we measure true generalization and avoid (accidental) overfitting.

Details in our blogpost below 🧵
Justin Chih-Yao Chen (@cyjustinchen)

🚨 NuRL: Nudging the Boundaries of LLM Reasoning

GRPO improves LLM reasoning, but often within the model's "comfort zone": hard samples (w/ 0% pass rate) remain unsolvable and contribute zero learning signals. In NuRL, we show that "nudging" the LLM with self-generated hints
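
The zero-signal observation follows directly from GRPO's group-normalized advantage: when every rollout for a prompt receives the same reward, all advantages in the group vanish. A quick illustration (the grpo_advantages helper is mine):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages as in GRPO: normalize rewards within
    the group of rollouts sampled for a single prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Hard prompt, 0% pass rate: identical rewards -> all-zero advantages.
print(grpo_advantages([0, 0, 0, 0]))  # [0. 0. 0. 0.] -> no gradient signal

# If a hint lets even one rollout succeed, the group carries signal again.
print(grpo_advantages([0, 0, 0, 1]))  # nonzero advantages
```

Presumably this is the lever the self-generated hints pull: they move hard samples off the 0% pass rate so the group statistics become informative.
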
Aman Bhargava 📊 (@thedivtagguy)

We filed an RTI request with Bangalore's metro authority asking how many people traveled on the metro and got a file with 1.2M rows of data on ridership. In this project, we show our findings on how Bangalore uses the metro. 

You can find your commute here! Link below.
Houjun Liu (@houjun_liu)

Introducing 𝘁𝗵𝗼𝘂𝗴𝗵𝘁𝗯𝘂𝗯𝗯𝗹𝗲𝘀: a *fully unsupervised* LM for input-adaptive parallel latent reasoning

✅ Learn yourself a reasoning model with normal pretraining
✅ Better perplexity compared to fixed thinking tokens

No fancy loss, no chain of thought labels 🚀
Signal (@signalapp)

We are alarmed by reports that Germany is on the verge of a catastrophic about-face, reversing its longstanding and principled opposition to the EU’s Chat Control proposal which, if passed, could spell the end of the right to privacy in Europe. signal.org/blog/pdfs/germ…

John Schulman (@johnschulman2)

Really happy to see people reproducing the result that LoRA rank=1 closely matches full fine-tuning on many RL fine-tuning problems. Here are a couple nice ones: x.com/ben_burtenshaw… x.com/zzlccc/status/…
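
For concreteness: rank-1 LoRA means the trainable update to each adapted weight matrix is a single outer product. A minimal sketch, with illustrative initialization and scaling conventions:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a rank-r LoRA update:
    y = W x + (alpha / r) * B A x. With r=1, A is (1, in_features) and
    B is (out_features, 1), so the update is one outer product."""
    def __init__(self, base: nn.Linear, r: int = 1, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the adapter trains
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # start at no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

At r=1 this adds only in_features + out_features trainable parameters per adapted matrix, which makes the reported match with full fine-tuning all the more striking.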

Hadi Pouransari (@hpouransari)

Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵
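
The tweet only sketches the split, so the following is a loose illustration of the idea rather than the paper's architecture: an always-on "anchor" path plus a bank of memory slots from which a few are selected per query. All names and the top-k similarity selection are my assumptions:

```python
import torch
import torch.nn as nn

class HierarchicalMemoryFFN(nn.Module):
    """Illustrative only: 'anchor' parameters always run (commonsense),
    while a memory bank is consulted sparsely per query (world knowledge)."""
    def __init__(self, d: int = 256, n_mem: int = 64, k: int = 4):
        super().__init__()
        self.anchor = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))
        self.mem_keys = nn.Parameter(torch.randn(n_mem, d))
        self.mem_values = nn.Parameter(torch.randn(n_mem, d))
        self.k = k

    def forward(self, x):                        # x: (batch, d)
        h = self.anchor(x)                       # always-on path
        scores = x @ self.mem_keys.T             # (batch, n_mem)
        top = scores.topk(self.k, dim=-1)        # select k memories per query
        w = top.values.softmax(dim=-1)           # (batch, k)
        mem = (w.unsqueeze(-1) * self.mem_values[top.indices]).sum(dim=1)
        return h + mem                           # knowledge injected on demand
```

On-device, only the anchor plus the selected memory rows would need to be resident per query, which is the deployment benefit the tweet gestures at.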

Pablo Montalvo (@m_olbap)

Super excited to finally post this interactive resource! We maintain 1M+ Python LOC across 400+ model architectures in 🤗 Transformers. How do we keep it controlled and keep shipping models?

With Lysandre (@LysandreJik), Pedro Cuenca (@pcuenq) and Yoni (@yonigoz) we wrote down what makes it possible. Dive here!
Lianhui Qin (@lianhuiq)

🧠How can LLMs self-evolve over time? They need memory.

LLMs burn huge compute on each query and forget everything afterward. 

ArcMemo introduces abstraction memory, which stores reusable reasoning patterns and recombines them to strengthen compositional reasoning.

📈On
Ameya P. (@amyprb)

Presenting: A Sober Look at Progress in LM Reasoning at the Conference on Language Modeling (@COLM_conf) today. #COLM2025

We find that many “reasoning” gains fall within run-to-run variance, and we make evaluation reproducible again.

Today 11:00 AM - 1:00 PM
📍Room 710 - Poster #31

Lots of new results in updated draft 👇
Pan Lu (@lupantech)

🔥Introducing #AgentFlow, a new trainable agentic system where a team of agents learns to plan and use tools in the flow of a task.

🌐agentflow.stanford.edu
📄huggingface.co/papers/2510.05…

AgentFlow unlocks the full potential of LLMs w/ tool use.
(And yes, our 3/7B model beats GPT-4o)👇
Andrej Karpathy (@karpathy)

I don't know what labs are doing to these poor LLMs during RL but they are mortally terrified of exceptions, in any infinitesimally likely case. Exceptions are a normal part of life and healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.