Sharut Gupta (@sharut_gupta)'s Twitter Profile
Sharut Gupta

@sharut_gupta

PhD@MIT_CSAIL | Previously @GoogleDeepMind, @AIatMeta | Representation Learning, In-Context Learning, ML Robustness, Generalization | IIT Delhi’22

ID: 1297440922032955398

Website: https://www.mit.edu/~sharut/ | Joined: 23-08-2020 07:50:03

184 Tweets

1.1K Followers

852 Following

Chenyu Wang (@chenyuw64562111)'s Twitter Profile Photo

I'll be at the NeurIPS Conference next week, presenting:
- ContextSSL in Session 2 (#2301) (also an oral at the SSL workshop)
- DisentangledSSL at the UniReps workshop (Honorable Mention & Oral!)
- DRAKES at the MLSB & AIDrugX workshops
Please reach out if you’d like to chat!

UniReps (@unireps)'s Twitter Profile Photo

We continue with Chenyu Wang, presenting "An Information Criterion for Controlled Disentanglement of Multimodal Data" 👏

Mohammad Pezeshki (@mpezeshki91)'s Twitter Profile Photo

Memorization can block true learning in neural nets! Check out our latest work tomorrow at the workshop on Scientific Methods for Understanding Deep Learning, Sunday in West meeting rooms 205-207 at #NeurIPS2024!

Yifei Wang (@yifeiwang77)'s Twitter Profile Photo

Excited to share that 6 papers were accepted at ICLR 2025! ✨ #ICLR2025

We proposed long-context perplexity, invariant in-context learning, and constrained tool decoding for better training and usage of LLMs. We also looked into some fundamental questions, such as OOD…
Sharut Gupta (@sharut_gupta)'s Twitter Profile Photo

Thrilled to share that our work on disentangled multimodal representation is headed to #ICLR2025 🇸🇬! Disentanglement is key for tasks where modalities provide complementary insights, driving interpretability, cross-modal translation, and counterfactual generation.

Sharut Gupta (@sharut_gupta)'s Twitter Profile Photo

Grateful to MIT CSAIL Alliances for having me on their podcast! It was a joy sharing how our recent work trains machines to self-adapt to new tasks and scenarios! Paper: lnkd.in/g2g-RcDb | Podcast: bit.ly/40IVZ7v

Jeremy Bernstein (@jxbz)'s Twitter Profile Photo

I just wrote my first blog post in four years! It is called "Deriving Muon". It covers the theory that led to Muon and how, for me, Muon is a meaningful example of theory leading practice in deep learning

(1/11)
Ben Cohen-Wang (@bcohenwang)'s Twitter Profile Photo

It can be helpful to pinpoint the in-context information that a language model uses when generating content (is it using provided documents? or its own intermediate thoughts?). We present Attribution with Attention (AT2), a method for doing so efficiently and reliably! (1/8)

Hannah Lawrence (@hlawrencecs)'s Twitter Profile Photo

Equivariant functions (e.g. GNNs) can't break symmetries, which can be problematic for generative models and beyond. Come to poster #207 Saturday at 10AM to hear about our solution: SymPE, or symmetry-breaking positional encodings! 

w/ Vasco Portilheiro, Yan Zhang, Oumar Kaba
Mohammad Pezeshki (@mpezeshki91)'s Twitter Profile Photo

I'm presenting our recent work on "Pitfalls of Memorization" today at ICLR, #304 at 3pm. Come say hi!
iclr.cc/virtual/2025/p…
Derek Lim (@dereklim_lzh)'s Twitter Profile Photo

Check out our new paper on learning from LLM output signatures: the |tokens| × (|vocab|+1) matrix of predicted next-token probabilities plus the probability of the actual next token. It provably generalizes several existing approaches and is great at hallucination and data-contamination detection tasks!
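A minimal sketch of how such an output-signature matrix could be assembled with an off-the-shelf causal LM is below; the model choice (gpt2 via Hugging Face transformers) and the exact column layout are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): build an "output signature" for a
# text as a (num_tokens - 1) x (vocab_size + 1) matrix. The first vocab_size
# columns hold the model's predicted next-token probabilities; the last column
# holds the probability assigned to the token that actually came next.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed model, for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "The quick brown fox jumps over the lazy dog."
ids = tokenizer(text, return_tensors="pt").input_ids      # shape (1, T)

with torch.no_grad():
    logits = model(ids).logits                            # shape (1, T, V)

probs = logits[0, :-1].softmax(dim=-1)                    # predictions for positions 1..T-1
actual = probs.gather(1, ids[0, 1:].unsqueeze(1))         # prob of the observed next token
signature = torch.cat([probs, actual], dim=1)             # shape (T-1, V + 1)
print(signature.shape)
```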

Divyat Mahajan (@divyat09)'s Twitter Profile Photo

Happy to share that Compositional Risk Minimization has been accepted at #ICML2025

📌Extensive theoretical analysis along with a practical approach for extrapolating classifiers to novel compositions!

📜 arxiv.org/abs/2410.06303
Mohammad Pezeshki (@mpezeshki91)'s Twitter Profile Photo

"Compositional Risk Minimization" Tackling unseen attribute combinations with additive energy models at #ICML2025. 💡 Check out the excellent summary by Divyat Samadhiya

Akarsh Kumar (@akarshkumar0101)'s Twitter Profile Photo

Excited to share our position paper on the Fractured Entangled Representation (FER) Hypothesis! We hypothesize that the standard paradigm of training networks today, while producing impressive benchmark results, is still failing to create a well-organized internal…

Sharut Gupta (@sharut_gupta)'s Twitter Profile Photo

Excited to share our work on Transformer-PSMs: a neural sequence model with constant per-token inference time and log(seq-len) memory. It hits a sweet spot between transformers (whose KV cache grows linearly with sequence length) and RNNs/state-space models (constant memory).

Check the thread below 👇
Phillip Isola (@phillip_isola)'s Twitter Profile Photo

Our computer vision textbook is now available for free online at visionbook.mit.edu. We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful, and feel free to submit GitHub issues to help us improve the text!

Polina Kirichenko (@polkirichenko)'s Twitter Profile Photo

Excited to release AbstentionBench -- our paper and benchmark on evaluating LLMs’ *abstention*: the skill of knowing when NOT to answer!

Key finding: reasoning LLMs struggle with unanswerable questions and hallucinate!

Details and links to paper &amp; open source code below!
🧵1/9
Jenny Huang (@jennyhuang99)'s Twitter Profile Photo

🚨 Past work shows: dropping just 0.1% of the data can change the conclusions of important studies. We show: many approximations can fail to catch this. 📢 Check out our new TMLR paper (w/ David Burt, Yunyi Shen/申云逸 🐺, Tin Nguyen, and Tamara Broderick) 👇 openreview.net/forum?id=m6EQ6…

Shivam Duggal (@shivamduggal4)'s Twitter Profile Photo

Compression is the heart of intelligence.
From Occam to Kolmogorov: shorter programs = smarter representations.

Meet KARL: Kolmogorov-Approximating Representation Learning.

Given an image, token budget T &amp; target quality 𝜖 —KARL finds the smallest t≤T to reconstruct it within 𝜖🧵