Manish Pandey 🧬 (@manish_genai)'s Twitter Profile
Manish Pandey 🧬

@manish_genai

Co-Founder @FreeDoctr 🧬⚕️,
Building a collaborative platform for patients and doctors. 🌐🩻
#GraphML, #GeometricDL, #GenAI, #ML, #RL, #LLM, #AIForHealthcare

ID: 1423712366441635841

Link: https://www.linkedin.com/in/manish-genai/ · Joined: 06-08-2021 18:27:53

811 Tweets

425 Followers

6.6K Following

Jack D. Carson (@mtlushan)'s Twitter Profile Photo

This is the greatest paper that I have ever read, and should be an inspiration to computer scientists everywhere. The actual innovation offered in the paper is moderate, but the authors don't just show a couple graphs and tables and extrapolate. They rigorously prove from

Prof. Anima Anandkumar (@animaanandkumar)'s Twitter Profile Photo

I have been advocating tensor methods for almost a decade and a half. Take a look at our tensor methods in deep learning from a few years ago: arxiv.org/abs/2107.03436. The TensorLy package allows defining tensor operations in PyTorch seamlessly: tensorly.org. Jean Kossaifi
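For intuition only, the core idea behind these tensor methods — a low-rank CP (CANDECOMP/PARAFAC) factorization — can be sketched in plain NumPy; this shows the underlying math, not the TensorLy API itself:

```python
import numpy as np

# A rank-R CP factorization expresses a 3-way tensor as a sum of R
# outer products of factor vectors:
#   T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
def cp_reconstruct(A, B, C):
    return np.einsum('ir,jr,kr->ijk', A, B, C)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2))  # factors for a 4 x 5 x 3 tensor of rank 2
B = rng.normal(size=(5, 2))
C = rng.normal(size=(3, 2))
T = cp_reconstruct(A, B, C)
```

Storing the three factor matrices instead of the dense tensor is what makes such layers parameter-efficient; TensorLy wraps this kind of operation so it runs on PyTorch tensors.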

Kevin Lu (@_kevinlu)'s Twitter Profile Photo

Why you should stop working on RL research and instead work on product //
The technology that unlocked the big scaling shift in AI is the internet, not transformers

I think it's well known that data is the most important thing in AI, and also that researchers choose not to work
Scott Geng (@scottgeng00)'s Twitter Profile Photo

🤔 How do we train AI models that surpass their teachers?

🚨 In #COLM2025: ✨Delta learning✨ makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯

The secret? Learn from the *differences* in weak data pairs!

📜 arxiv.org/abs/2507.06187

🧵 below
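The core idea above — learning from the *differences* between weak data pairs — resembles a pairwise preference loss, where the update depends only on the margin between the two responses, not their absolute quality. A minimal sketch under that reading (hypothetical function name, not the paper's implementation):

```python
import math

def pairwise_delta_loss(logp_chosen, logp_rejected, beta=0.1):
    # Logistic (Bradley-Terry style) loss on the *difference* between
    # a pair of weak responses: even if both are low quality, the
    # relative margin still carries a learning signal.
    margin = beta * (logp_chosen - logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the two responses are indistinguishable the loss sits at ln 2, and it shrinks as the policy assigns more probability to the (relatively) better response — the absolute scores never enter.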
Azalia Mirhoseini (@azaliamirh)'s Twitter Profile Photo

Looking forward to attending ICML!

Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!
Ang Cao (@angcao3)'s Twitter Profile Photo

Can we train a 3D-language multimodal Transformer using 2D VLMs and a rendering loss? Sasha (Alexander) Sax will present our new #icml25 paper on Wednesday at 2pm, Hall B2-B3 W200. Please come check it out! Project Page: liftgs.github.io

Jiahao Qiu (@jiahaoqiu99)'s Twitter Profile Photo

🚀 Just released: "A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence"!
We provide the first comprehensive review of agents capable of self-evolution—highlighting what, when, and how agents evolve, key benchmarks and applications, and future directions
Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

The paper shows that a small team can automate the full medical data pipeline with cooperating agents, slashing manual effort.

An agent in this framework is a small service that knows one task and chats with the others.

The pipeline begins when an Ingestion Identifier spots
Zhaopeng Tu (@tuzhaopeng)'s Twitter Profile Photo

Are RL agents truly learning to reason, or just finding lucky shortcuts? 🤔

Introducing RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards — a novel framework that rewards not just outcomes, but the quality of reasoning itself, creating more robust and
Jack Lindsey (@jack_w_lindsey)'s Twitter Profile Photo

Attention is all you need - but how does it work? In our new paper, we take a big step towards understanding it. We developed a way to integrate attention into our previous circuit-tracing framework (attribution graphs), and it's already turning up fascinating stuff! 🧵

Emmanuel Ameisen (@mlpowered)'s Twitter Profile Photo

Earlier this year, we showed a method to interpret the intermediate steps a model takes to produce an answer.

But we were missing a key bit of information: explaining why the model attends to specific concepts.

Today, we do just that 🧵
Raj Movva (@rajivmovva)'s Twitter Profile Photo

📢NEW POSITION PAPER: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts

Despite recent results, SAEs aren't dead! They can still be useful to mech interp, and also much more broadly: across FAccT, computational social science, and ML4H. 🧵
Graham Neubig (@gneubig)'s Twitter Profile Photo

Summary of GPT-OSS architectural innovations:
1. sliding window attention (ref: arxiv.org/abs/1901.02860)
2. mixture of experts (ref: arxiv.org/abs/2101.03961)
3. RoPE w/ YaRN (ref: arxiv.org/abs/2309.00071)
4. attention sinks (ref: StreamingLLM, arxiv.org/abs/2309.17453)
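Sliding window attention (item 1) is the easiest to picture: each token attends causally, but only within a fixed-size local window. A minimal NumPy sketch of the attention mask (illustration only, not GPT-OSS code):

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    # Causal mask restricted to a local window: token i may attend
    # only to tokens j with  i - window < j <= i.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(6, 3)  # 6 tokens, window of 3
```

Compared with full causal attention, the per-token cost drops from O(seq_len) to O(window); attention sinks (item 4) then re-add a few always-visible positions so long streams stay stable.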

Zihan Wang - on RAGEN (@wzihanw)'s Twitter Profile Photo

To those diving into fine-tuning open-source MoEs today: check out ESFT, our customized PEFT method for MoE models. Train with 90% fewer parameters while gaining 95%+ of task performance and keeping 98% of general performance :)
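A rough sketch of the expert-selection idea behind this kind of MoE PEFT (assumed mechanics for illustration, not the official ESFT code): rank experts by how often the router activates them on task data, then train only those and freeze the rest:

```python
def select_task_experts(activation_counts, total_tokens, threshold=0.1):
    # Keep only experts that the router activates on at least
    # `threshold` of the task tokens; everything else stays frozen,
    # so only a small fraction of parameters is trained.
    return [expert for expert, count in activation_counts.items()
            if count / total_tokens >= threshold]

# Hypothetical routing statistics gathered on a task dataset.
experts = select_task_experts({"e0": 90, "e1": 5, "e2": 40},
                              total_tokens=100)
```

Because MoE routers are sparse and task-specialized, a handful of experts typically dominate a given task, which is what makes this kind of selective fine-tuning cheap.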

YichuanWang (@yichuanm)'s Twitter Profile Photo

1/N 🚀 Launching LEANN — the tiniest vector index on Earth!

Fast, accurate, and 100% private RAG on your MacBook.
0% internet. 97% smaller. Semantic search on everything.
Your personal Jarvis, ready to dive into your emails, chats, and more.

🔗 Code: github.com/yichuan-w/LEANN
📄
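For intuition, here is the brute-force search that a vector index like this accelerates and compresses — a generic cosine-similarity sketch, not LEANN's actual method:

```python
import numpy as np

def cosine_search(query, corpus, k=2):
    # Brute-force nearest-neighbor retrieval by cosine similarity.
    # A vector index returns (approximately) the same neighbors
    # while storing far less and touching far fewer vectors.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(-sims)[:k]

# Toy 2-D "embeddings" of three documents.
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
top = cosine_search(np.array([1.0, 0.1]), corpus)
```

In a real RAG setup the rows would be text-embedding vectors of your emails or chats, and the returned indices feed the retrieved passages to the LLM.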
Peter Tong (@tongpetersb)'s Twitter Profile Photo

Want to add that even with language-assisted visual evaluations, we're seeing encouraging progress in vision-centric benchmarks like CV-Bench (arxiv.org/abs/2406.16860) and Blink (arxiv.org/abs/2404.12390), which repurpose core vision tasks into VQA format. These benchmarks do help

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

"we demonstrate that employing only two techniques, i.e., advantage normalization (group-level mean, batch-level std) and token-level loss aggregation, can unlock the learning capability of critic-free policies using
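The two techniques quoted above can be sketched as follows — my reading of the description, not the paper's code: advantages are centered per prompt group but scaled by a batch-wide std, and the loss averages over all tokens in the batch rather than per sequence:

```python
import numpy as np

def normalize_advantages(rewards, group_ids):
    # Group-level mean: subtract each prompt-group's mean reward.
    # Batch-level std: divide by one std computed over the whole batch.
    rewards = np.asarray(rewards, dtype=float)
    adv = rewards.copy()
    for g in np.unique(group_ids):
        m = group_ids == g
        adv[m] -= rewards[m].mean()
    return adv / (adv.std() + 1e-8)

def token_level_loss(per_token_losses):
    # Token-level aggregation: average over every token in the batch,
    # so long sequences are not down-weighted by a per-sequence mean.
    return np.concatenate(per_token_losses).mean()
```

The batch-level std avoids the degenerate case where a group with identical rewards would otherwise divide by (near) zero.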
Jean de Nyandwi (@jeande_d)'s Twitter Profile Photo

Current multimodal LLMs excel in English and Western contexts but struggle with cultural knowledge from underrepresented regions and languages. How can we build truly globally inclusive vision-language models?

We are introducing CulturalGround, a large-scale dataset with 22M
Felix Heide (@_felixheide_)'s Twitter Profile Photo

3D Object Tracking without Training Data? In our Nature Machine Intelligence paper (nature.com/articles/s4225…), we recast 3D tracking as an inverse neural rendering task, fitting a scene graph that best explains the observed image. The method generalizes to completely