Sharut Gupta (@sharut_gupta)'s Twitter Profile
Sharut Gupta

@sharut_gupta

PhD@MIT_CSAIL | Previously @GoogleDeepMind, @AIatMeta | Representation Learning, In-Context Learning, ML Robustness, Generalization | IIT Delhi’22

ID: 1297440922032955398

Website: https://www.mit.edu/~sharut/ | Joined: 23-08-2020 07:50:03

184 Tweets

1.1K Followers

852 Following

Chenyu Wang (@chenyuw64562111)'s Twitter Profile Photo

I'll be at the NeurIPS Conference next week, presenting:
- ContextSSL in Session 2 (#2301) (also an oral at the SSL workshop)
- DisentangledSSL at the UniReps workshop (Honorable Mention & Oral!)
- DRAKES at the MLSB & AIDrugX workshops
Please reach out if you’d like to chat!

UniReps (@unireps)'s Twitter Profile Photo

We continue with Chenyu Wang, presenting "An Information Criterion for Controlled Disentanglement of Multimodal Data" 👏

Mohammad Pezeshki (@mpezeshki91)'s Twitter Profile Photo

Memorization can block true learning in neural nets! Check out our latest work tomorrow at the workshop on Scientific Methods for Understanding Deep Learning, Sunday in West meeting rooms 205-207 at #NeurIPS2024!

Yifei Wang (@yifeiwang77)'s Twitter Profile Photo

Excited to share that 6 papers were accepted at ICLR 2025! ✨ #ICLR2025

We proposed long-context perplexity, invariant in-context learning, and constrained tool decoding for better training and usage of LLMs. We also looked into some fundamental questions, such as OOD…
Sharut Gupta (@sharut_gupta)'s Twitter Profile Photo

Thrilled to share that our work on disentangled multimodal representation is headed to #ICLR2025 🇸🇬! Disentanglement is key for tasks where modalities provide complementary insights, driving interpretability, cross-modal translation, and counterfactual generation.

Sharut Gupta (@sharut_gupta)'s Twitter Profile Photo

Grateful to MIT CSAIL Alliances for having me on their podcast! It was a joy sharing how our recent work trains machines to self-adapt to new tasks and scenarios! Paper: lnkd.in/g2g-RcDb | Podcast: bit.ly/40IVZ7v

Jeremy Bernstein (@jxbz)'s Twitter Profile Photo

I just wrote my first blog post in four years! It is called "Deriving Muon". It covers the theory that led to Muon and how, for me, Muon is a meaningful example of theory leading practice in deep learning

(1/11)
Ben Cohen-Wang (@bcohenwang)'s Twitter Profile Photo

It can be helpful to pinpoint the in-context information that a language model uses when generating content (is it using provided documents? or its own intermediate thoughts?). We present Attribution with Attention (AT2), a method for doing so efficiently and reliably! (1/8)

Hannah Lawrence (@hlawrencecs)'s Twitter Profile Photo

Equivariant functions (e.g. GNNs) can't break symmetries, which can be problematic for generative models and beyond. Come to poster #207 Saturday at 10AM to hear about our solution: SymPE, or symmetry-breaking positional encodings! 

w/ Vasco Portilheiro, Yan Zhang, Oumar Kaba
Mohammad Pezeshki (@mpezeshki91)'s Twitter Profile Photo

I'm presenting our recent work on "Pitfalls of Memorization" today at ICLR, #304 at 3pm. Come say hi!
iclr.cc/virtual/2025/p…
Derek Lim (@dereklim_lzh)'s Twitter Profile Photo

Check out our new paper on learning from LLM output signatures: the |tokens| × (|vocab|+1) matrix of predicted next-token probabilities plus the probability of the actual next token. It provably generalizes several existing approaches and is great at hallucination and data-contamination detection tasks!
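A minimal sketch of how such an output-signature matrix could be assembled with an off-the-shelf causal LM is below; the model choice (gpt2 via Hugging Face transformers) and the exact column layout are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): build an "output signature" for a
# text as a (num_tokens - 1) x (vocab_size + 1) matrix. The first vocab_size
# columns hold the model's predicted next-token probabilities; the last column
# holds the probability assigned to the token that actually came next.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed model, for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "The quick brown fox jumps over the lazy dog."
ids = tokenizer(text, return_tensors="pt").input_ids      # shape (1, T)

with torch.no_grad():
    logits = model(ids).logits                            # shape (1, T, V)

probs = logits[0, :-1].softmax(dim=-1)                    # predictions for positions 1..T-1
actual = probs.gather(1, ids[0, 1:].unsqueeze(1))         # prob of the observed next token
signature = torch.cat([probs, actual], dim=1)             # shape (T-1, V + 1)
print(signature.shape)
```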

Divyat Mahajan (@divyat09)'s Twitter Profile Photo

Happy to share that Compositional Risk Minimization has been accepted at #ICML2025

📌Extensive theoretical analysis along with a practical approach for extrapolating classifiers to novel compositions!

📜 arxiv.org/abs/2410.06303
Mohammad Pezeshki (@mpezeshki91)'s Twitter Profile Photo

"Compositional Risk Minimization" Tackling unseen attribute combinations with additive energy models at #ICML2025. 💡 Check out the excellent summary by Divyat Samadhiya

Akarsh Kumar (@akarshkumar0101)'s Twitter Profile Photo

Excited to share our position paper on the Fractured Entangled Representation (FER) Hypothesis! We hypothesize that the standard paradigm of training networks today, while producing impressive benchmark results, is still failing to create a well-organized internal…

Sharut Gupta (@sharut_gupta)'s Twitter Profile Photo

Excited to share our work on Transformer-PSMs: a neural sequence model with constant per-token inference time and log(seq-len) memory. It hits a sweet spot between transformers (whose KV cache grows linearly with sequence length) and RNNs/state-space models (constant memory).

Check the thread below 👇
Phillip Isola (@phillip_isola)'s Twitter Profile Photo

Our computer vision textbook is now available for free online at visionbook.mit.edu. We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful, and feel free to submit GitHub issues to help us improve the text!

Polina Kirichenko (@polkirichenko)'s Twitter Profile Photo

Excited to release AbstentionBench -- our paper and benchmark on evaluating LLMs’ *abstention*: the skill of knowing when NOT to answer!

Key finding: reasoning LLMs struggle with unanswerable questions and hallucinate!

Details and links to paper &amp; open source code below!
🧵1/9
Jenny Huang (@jennyhuang99)'s Twitter Profile Photo

🚨 Past work shows: dropping just 0.1% of the data can change the conclusions of important studies. We show: many approximations can fail to catch this. 📢 Check out our new TMLR paper (w/ David Burt, Yunyi Shen/申云逸 🐺, Tin Nguyen, and Tamara Broderick) 👇 openreview.net/forum?id=m6EQ6…

Shivam Duggal (@shivamduggal4)'s Twitter Profile Photo

Compression is the heart of intelligence.
From Occam to Kolmogorov: shorter programs = smarter representations.

Meet KARL: Kolmogorov-Approximating Representation Learning.

Given an image, token budget T &amp; target quality 𝜖 —KARL finds the smallest t≤T to reconstruct it within 𝜖🧵