Manish Pandey 🧬 (@manish_genai)'s Twitter Profile
Manish Pandey 🧬

@manish_genai

Co-Founder @FreeDoctr 🧬⚕️,
Building a collaborative platform for patients and doctors. 🌐🩻
#GraphML, #GeometricDL, #GenAI, #ML, #RL, #LLM, #AIForHealthcare

ID: 1423712366441635841

Link: https://www.linkedin.com/in/manish-genai/ · Joined: 06-08-2021 18:27:53

811 Tweets

425 Followers

6.6K Following

Jack D. Carson (@mtlushan)'s Twitter Profile Photo

This is the greatest paper that I have ever read, and should be an inspiration to computer scientists everywhere. The actual innovation offered in the paper is moderate, but the authors don't just show a couple graphs and tables and extrapolate. They rigorously prove from

Prof. Anima Anandkumar (@animaanandkumar)'s Twitter Profile Photo

I have been advocating tensor methods for almost a decade and a half. Take a look at our tensor methods in deep learning from a few years ago: arxiv.org/abs/2107.03436. The TensorLy package allows defining tensor operations in PyTorch seamlessly: tensorly.org. Jean Kossaifi
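For intuition only, the core idea behind these tensor methods — a low-rank CP (CANDECOMP/PARAFAC) factorization — can be sketched in plain NumPy; this shows the underlying math, not the TensorLy API itself:

```python
import numpy as np

# A rank-R CP factorization expresses a 3-way tensor as a sum of R
# outer products of factor vectors:
#   T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
def cp_reconstruct(A, B, C):
    return np.einsum('ir,jr,kr->ijk', A, B, C)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))  # factors for a 4 x 5 x 3 tensor of rank 2
B = rng.normal(size=(5, 2))
C = rng.normal(size=(3, 2))
T = cp_reconstruct(A, B, C)
```

Storing the three factor matrices instead of the dense tensor is what makes such layers parameter-efficient; TensorLy wraps this kind of operation so it runs on PyTorch tensors.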

Kevin Lu (@_kevinlu)'s Twitter Profile Photo

Why you should stop working on RL research and instead work on product //
The technology that unlocked the big scaling shift in AI is the internet, not transformers

I think it's well known that data is the most important thing in AI, and also that researchers choose not to work
Scott Geng (@scottgeng00)'s Twitter Profile Photo

🤔 How do we train AI models that surpass their teachers?

🚨 In #COLM2025: ✨Delta learning✨ makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯

The secret? Learn from the *differences* in weak data pairs!

📜 arxiv.org/abs/2507.06187

🧵 below
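The core idea above — learning from the *differences* between weak data pairs — resembles a pairwise preference loss, where the update depends only on the margin between the two responses, not their absolute quality. A minimal sketch under that reading (hypothetical function name, not the paper's implementation):

```python
import math

def pairwise_delta_loss(logp_chosen, logp_rejected, beta=0.1):
    # Logistic (Bradley-Terry style) loss on the *difference* between
    # a pair of weak responses: even if both are low quality, the
    # relative margin still carries a learning signal.
    margin = beta * (logp_chosen - logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the two responses are indistinguishable the loss sits at ln 2, and it shrinks as the policy assigns more probability to the (relatively) better response — the absolute scores never enter.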
Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

Looking forward to attending ICML!

Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!
Ang Cao (@angcao3)'s Twitter Profile Photo

Can we train a 3D-language multimodal Transformer using 2D VLMs and a rendering loss? Sasha (Alexander) Sax will present our new #icml25 paper on Wednesday at 2pm, Hall B2-B3 W200. Please come check it out! Project Page: liftgs.github.io

Jiahao Qiu (@jiahaoqiu99)'s Twitter Profile Photo

🚀 Just released: "A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence"!
We provide the first comprehensive review of agents capable of self-evolution—highlighting what, when, and how agents evolve, key benchmarks and applications, and future directions
Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

The paper shows that a small team can automate the full medical data pipeline with cooperating agents, slashing manual effort.

An agent in this framework is a small service that knows one task and chats with the others.

The pipeline begins when an Ingestion Identifier spots
Zhaopeng Tu (@tuzhaopeng)'s Twitter Profile Photo

Are RL agents truly learning to reason, or just finding lucky shortcuts? 🤔

Introducing RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards — a novel framework that rewards not just outcomes, but the quality of reasoning itself, creating more robust and
Jack Lindsey (@jack_w_lindsey)'s Twitter Profile Photo

Attention is all you need - but how does it work? In our new paper, we take a big step towards understanding it. We developed a way to integrate attention into our previous circuit-tracing framework (attribution graphs), and it's already turning up fascinating stuff! 🧵

Emmanuel Ameisen (@mlpowered)'s Twitter Profile Photo

Earlier this year, we showed a method to interpret the intermediate steps a model takes to produce an answer.

But we were missing a key bit of information: explaining why the model attends to specific concepts.

Today, we do just that 🧵
Raj Movva (@rajivmovva)'s Twitter Profile Photo

📢NEW POSITION PAPER: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts

Despite recent results, SAEs aren't dead! They can still be useful to mech interp, and also much more broadly: across FAccT, computational social science, and ML4H. 🧵
Graham Neubig (@gneubig)'s Twitter Profile Photo

Summary of GPT-OSS architectural innovations:
1. sliding window attention (ref: arxiv.org/abs/1901.02860)
2. mixture of experts (ref: arxiv.org/abs/2101.03961)
3. RoPE w/ YaRN (ref: arxiv.org/abs/2309.00071)
4. attention sinks (ref: StreamingLLM, arxiv.org/abs/2309.17453)
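Sliding window attention (item 1) is the easiest to picture: each token attends causally, but only within a fixed-size local window. A minimal NumPy sketch of the attention mask (illustration only, not GPT-OSS code):

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    # Causal mask restricted to a local window: token i may attend
    # only to tokens j with  i - window < j <= i.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(6, 3)  # 6 tokens, window of 3
```

Compared with full causal attention, the per-token cost drops from O(seq_len) to O(window); attention sinks (item 4) then re-add a few always-visible positions so long streams stay stable.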

Zihan Wang - on RAGEN (@wzihanw)'s Twitter Profile Photo

To those diving into fine-tuning open-source MoEs today: check out ESFT, our customized PEFT method for MoE models. Train with 90% fewer parameters while gaining 95%+ of task performance and keeping 98% of general performance :)
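A rough sketch of the expert-selection idea behind this kind of MoE PEFT (assumed mechanics for illustration, not the official ESFT code): rank experts by how often the router activates them on task data, then train only those and freeze the rest:

```python
def select_task_experts(activation_counts, total_tokens, threshold=0.1):
    # Keep only experts that the router activates on at least
    # `threshold` of the task tokens; everything else stays frozen,
    # so only a small fraction of parameters is trained.
    return [expert for expert, count in activation_counts.items()
            if count / total_tokens >= threshold]

# Hypothetical routing statistics gathered on a task dataset.
experts = select_task_experts({"e0": 90, "e1": 5, "e2": 40},
                              total_tokens=100)
```

Because MoE routers are sparse and task-specialized, a handful of experts typically dominate a given task, which is what makes this kind of selective fine-tuning cheap.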

YichuanWang (@yichuanm)'s Twitter Profile Photo

1/N 🚀 Launching LEANN — the tiniest vector index on Earth!

Fast, accurate, and 100% private RAG on your MacBook.
0% internet. 97% smaller. Semantic search on everything.
Your personal Jarvis, ready to dive into your emails, chats, and more.

🔗 Code: github.com/yichuan-w/LEANN
📄
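For intuition, here is the brute-force search that a vector index like this accelerates and compresses — a generic cosine-similarity sketch, not LEANN's actual method:

```python
import numpy as np

def cosine_search(query, corpus, k=2):
    # Brute-force nearest-neighbor retrieval by cosine similarity.
    # A vector index returns (approximately) the same neighbors
    # while storing far less and touching far fewer vectors.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(-sims)[:k]

# Toy 2-D "embeddings" of three documents.
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
top = cosine_search(np.array([1.0, 0.1]), corpus)
```

In a real RAG setup the rows would be text-embedding vectors of your emails or chats, and the returned indices feed the retrieved passages to the LLM.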
Peter Tong (@tongpetersb)'s Twitter Profile Photo

Want to add that even with language-assisted visual evaluations, we're seeing encouraging progress in vision-centric benchmarks like CV-Bench (arxiv.org/abs/2406.16860) and Blink (arxiv.org/abs/2404.12390), which repurpose core vision tasks into VQA format. These benchmarks do help

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

"we demonstrate that employing only two techniques, i.e., advantage normalization (group-level mean, batch-level std) and token-level loss aggregation, can unlock the learning capability of critic-free policies using
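The two techniques quoted above can be sketched as follows — my reading of the description, not the paper's code: advantages are centered per prompt group but scaled by a batch-wide std, and the loss averages over all tokens in the batch rather than per sequence:

```python
import numpy as np

def normalize_advantages(rewards, group_ids):
    # Group-level mean: subtract each prompt-group's mean reward.
    # Batch-level std: divide by one std computed over the whole batch.
    rewards = np.asarray(rewards, dtype=float)
    adv = rewards.copy()
    for g in np.unique(group_ids):
        m = group_ids == g
        adv[m] -= rewards[m].mean()
    return adv / (adv.std() + 1e-8)

def token_level_loss(per_token_losses):
    # Token-level aggregation: average over every token in the batch,
    # so long sequences are not down-weighted by a per-sequence mean.
    return np.concatenate(per_token_losses).mean()
```

The batch-level std avoids the degenerate case where a group with identical rewards would otherwise divide by (near) zero.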
Jean de Nyandwi (@jeande_d)'s Twitter Profile Photo

Current multimodal LLMs excel in English and Western contexts but struggle with cultural knowledge from underrepresented regions and languages. How can we build truly globally inclusive vision-language models?

We are introducing CulturalGround, a large-scale dataset with 22M
Felix Heide (@_felixheide_)'s Twitter Profile Photo

3D Object Tracking without Training Data? In our Nature Machine Intelligence paper (nature.com/articles/s4225…), we recast 3D tracking as an inverse neural rendering task, fitting a scene graph that best explains the observed image. The method generalizes to completely