Rishika Bhagwatkar (@rishika2110) 's Twitter Profile
Rishika Bhagwatkar

@rishika2110

MSc in CS at Mila Quebec and UdeM

ID: 1374388605490139147

Joined: 23-03-2021 15:52:47

62 Tweets

135 Followers

268 Following

Ekaterina Lobacheva (@katelobacheva) 's Twitter Profile Photo

Did you know that learning rate affects which examples are easy or hard for a model? And this difference is meaningful and relates to generalization. Stop by our poster at the SciForDL Workshop #NeurIPS2024 tomorrow to learn more!

Paper: openreview.net/forum?id=NeetG…

1/8
Sabyasachi Sahoo (@saby_tweets) 's Twitter Profile Photo

🚀 Test-Time Adaptation (TTA) & Layer Selection! 🧵 (1/N)

TTA helps LLMs like GPT-4o & DeepSeek-V2 via test-time compute scaling. But TTA fails on hard Out-of-Distribution (OOD) tasks! 😩

Our AAAI 2025 paper introduces GALA, a Gradient-Aligned Layer Adaptation framework to fix
Tejas Vaidhya (@imtejas13) 's Twitter Profile Photo

🎉 Thrilled to share that our paper "Surprising effectiveness of pretraining ternary language models at scale" earned a spotlight at #ICLR2024! We dive into Ternary Language Models (TriLMs), systematically studying their training feasibility and scaling laws against FloatLMs.

Benjamin Thérien (@benjamintherien) 's Twitter Profile Photo

How do MoE transformers, like DeepSeek, behave under distribution shifts? Do their routers collapse? Can they still match full re-training performance? Excited to present “Continual Pre-training of MoEs: How robust is your router?”!🧵arxiv.org/abs/2503.05029 1/N

Accepted papers at TMLR (@tmlrpub) 's Twitter Profile Photo

Interpreting Neurons in Deep Vision Networks with Language Models
Nicholas Bai, Rahul Ajay Iyer, Tuomas Oikarinen, Akshay R. Kulkarni, Tsui-Wei Weng. Action editor: Antoine Ledent.
openreview.net/forum?id=x1dXv… #neuron #neurons #deep

Benjamin Thérien (@benjamintherien) 's Twitter Profile Photo

Llama4 MoEs just dropped! Now you're planning to continually pre-train Scout or Maverick on your data. BUT, you're not sure how the distribution shift may affect the MoE's router? Our new paper has you covered! x.com/benjamintherie…

Aniket Didolkar (@aniket_d98) 's Twitter Profile Photo

✈️ I am travelling to Singapore 🇸🇬 for #ICLR 2025. I will be presenting 1 paper, details in 🧵

I will also be at the Meta booth on 24th and 25th from 10-12. Come chat about self supervised learning, the student/visiting researcher program at FAIR or anything in general!
Gopeshh Subbaraj (@gopeshh1) 's Twitter Profile Photo

1/ Most RL methods assume a turn-based setup: the agent acts, then the environment responds. But in the real world, the environment doesn't wait. In real-time RL, slow inference means actions arrive late or are missed entirely. This leads to two key challenges:
• Inaction Regret
• Delay Regret
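The turn-based vs. real-time distinction above can be illustrated with a toy interaction loop (a hypothetical sketch for intuition, not code from the paper): when one action computation spans several environment ticks, the environment keeps advancing and those ticks pass without a fresh action.

```python
# Toy illustration (hypothetical, not from the paper): in real-time RL,
# the environment advances every tick whether or not the agent has acted.

def realtime_rollout(total_ticks, inference_ticks):
    """Count ticks where a fresh action is applied vs. missed.

    inference_ticks: how many environment ticks one action computation takes.
    An action started at tick t only becomes available at t + inference_ticks;
    every tick in between, the environment moves on without a new action.
    """
    acted, missed = 0, 0
    next_ready = 0  # tick at which the current inference finishes
    for t in range(total_ticks):
        if t >= next_ready:   # agent is free: act, then start a new inference
            acted += 1
            next_ready = t + inference_ticks
        else:                 # still computing: the environment doesn't wait
            missed += 1
    return acted, missed

# With 1-tick inference all 100 ticks get a fresh action: (100, 0).
# With 4-tick inference only every 4th tick does: (25, 75).
```

In the turn-based abstraction the environment effectively pauses during inference (equivalent to `inference_ticks = 1` here); the missed ticks are what the thread's "inaction" and "delay" regret notions measure the cost of.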

Arnav Jain (@arnavkj95) 's Twitter Profile Photo

📢 Come say hi at our SFM poster at #ICLR2025, Poster Session 5 – #572!

We’re presenting a method for Inverse Reinforcement Learning via Successor Feature Matching — a non-adversarial approach that works without action labels.

Excited to share and chat!
Arjun Ashok (@arjunashok37) 's Twitter Profile Photo

Context is Key🗝️ is accepted at ICML 2025! 📈 

Let's catch up if you'll be at ICML 🛬

See the poster and tweet thread below for a preview of CiK 👇
x.com/arjunashok37/s…

And stay tuned for new results ;)
Divyat Mahajan (@divyat09) 's Twitter Profile Photo

Happy to share that Compositional Risk Minimization has been accepted at #ICML2025

📌Extensive theoretical analysis along with a practical approach for extrapolating classifiers to novel compositions!

📜 arxiv.org/abs/2410.06303
francesco croce (@fra__31) 's Twitter Profile Photo

📃 In our new paper, we introduce FuseLIP, an encoder for multimodal embedding. We use early fusion of modalities to train a single transformer with a contrastive + masked (multimodal) modeling loss. More details 👇

Sara Ghazanfari (@saraghznfri) 's Twitter Profile Photo

🚨How to incorporate temporal grounding into the reasoning steps of video LLMs?

📃We’re excited to introduce Chain-of-Frames (CoF), a simple method to improve reasoning via explicit frame references!  🧠✨

Big thanks to my co-authors, francesco croce, N. Flammarion, P. Krishnamurthy,
francesco croce (@fra__31) 's Twitter Profile Photo

We just released Chain-of-Frames: explicitly referencing relevant frames while reasoning improves the performance of video LLMs across benchmarks! Check it out 👇

Majdi Hassan (@majdi_has) 's Twitter Profile Photo

(1/n)🚨You can train a model solving DFT for any geometry almost without training data!🚨 Introducing Self-Refining Training for Amortized Density Functional Theory — a variational framework for learning a DFT solver that predicts the ground-state solutions for different

Emiliano Penaloza (@emilianopp_) 's Twitter Profile Photo

Excited that our paper "Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization" was accepted to ICML 2025! We show how Preference Optimization can reduce the impact of noisy concept labels in CBMs. 🧵/9

Akshay Kulkarni (@ak70000) 's Twitter Profile Photo

⚡Interested in making pretrained generative models interpretable with minimal training and annotations? I'll be presenting our paper, Interpretable Generative Models through Post-hoc Concept Bottlenecks, at #CVPR2025 today in Poster Session 2 (4 pm - 6 pm CDT) at poster #266!

Akshay Kulkarni (@ak70000) 's Twitter Profile Photo

🚀 Paper: arxiv.org/abs/2503.19377
🚀 Code: github.com/Trustworthy-ML…
🚀 Project site: lilywenglab.github.io/posthoc-genera…

Thanks to my co-authors, Ge Yan, Chung-En Sun, Tuomas Oikarinen, and my PhD advisor Lily Weng

Massimo Caccia (@masscaccia) 's Twitter Profile Photo

🎉 Our paper “𝐻𝑜𝑤 𝑡𝑜 𝑇𝑟𝑎𝑖𝑛 𝑌𝑜𝑢𝑟 𝐿𝐿𝑀 𝑊𝑒𝑏 𝐴𝑔𝑒𝑛𝑡: 𝐴 𝑆𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐𝑎𝑙 𝐷𝑖𝑎𝑔𝑛𝑜𝑠𝑖𝑠” got an 𝐨𝐫𝐚𝐥 at next week’s 𝗜𝗖𝗠𝗟 𝗪𝗼𝗿𝗸𝘀𝗵𝗼𝗽 𝗼𝗻 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗨𝘀𝗲 𝗔𝗴𝗲𝗻𝘁𝘀! 🖥️🧠

We present the 𝐟𝐢𝐫𝐬𝐭 𝐥𝐚𝐫𝐠𝐞-𝐬𝐜𝐚𝐥𝐞
P Shravan Nayak (@pshravannayak) 's Twitter Profile Photo

Excited to be at #ICML2025 presenting 3 papers!
📌 UI-Vision (Poster, July 15, Hall B2-B3)
📌 LIVS (Poster, July 16, Hall B2-B3)
📌 CulturalFrames @ MoFA Workshop (July 18)
If you're around and want to chat about agents, alignment, or cultural understanding, let's connect!