MIT NLP (@nlp_mit)'s Twitter Profile
MIT NLP

@nlp_mit

NLP Group at @MIT_CSAIL! PIs: @yoonrkim @jacobandreas @lateinteraction @pliang279 @david_sontag, Jim Glass, @roger_p_levy

ID: 1902072731954233344

Joined: 18-03-2025 19:01:42

16 Tweets

3.3K Followers

45 Following

MIT NLP (@nlp_mit)

MIT NLP @ ICLR 2025 - catch Mehul Damani at poster 219, Thursday 3PM to chat about "Learning How Hard to Think: Input Adaptive Allocation of LM Computation"!

Ben Cohen-Wang (@bcohenwang)

It can be helpful to pinpoint the in-context information that a language model uses when generating content (is it using provided documents? or its own intermediate thoughts?). We present Attribution with Attention (AT2), a method for doing so efficiently and reliably! (1/8)
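
As a rough illustration of the general idea (attention weights as a signal for which context tokens a generation relies on), here is a minimal sketch. It is not the AT2 method itself, which additionally learns how much to trust each attention head; the "gpt2" model name and the toy prompt are stand-ins.

```python
# Rough illustration of attention-as-attribution (NOT the AT2 method itself, which
# additionally learns how much to trust each attention head). "gpt2" and the toy
# prompt are stand-ins for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM that can return attentions
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

context = "The Eiffel Tower is in Paris. The Colosseum is in Rome."
question = " Q: Where is the Eiffel Tower? A:"
inputs = tok(context + question, return_tensors="pt")
ctx_len = len(tok(context)["input_ids"])

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, seq, seq) tensor per layer; average them all.
attn = torch.stack(out.attentions).mean(dim=(0, 1, 2))  # -> (seq, seq)

# Score each context token by how much the final position (about to produce the
# answer) attends to it.
scores = attn[-1, :ctx_len]
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0][:ctx_len].tolist())
for token, score in zip(tokens, scores):
    print(f"{token!r:>14}  {score.item():.3f}")
```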

João Loula (@joaoloula)

#ICLR2025 Oral

How can we control LMs using diverse signals such as static analyses, test cases, and simulations?
In our paper “Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo” we:
Cast controlled generation as an inference problem, with the LM
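
For readers unfamiliar with the framing, here is a toy sketch of the sequential Monte Carlo loop the title refers to: propose continuations, weight them by a constraint potential, resample. It is not the paper's system; the uniform character-level "LM" and the balanced-parentheses constraint are hypothetical stand-ins.

```python
# Toy sketch of sequential Monte Carlo steering for constrained generation. This is
# NOT the paper's system; it only shows the propose / weight / resample loop, with a
# hypothetical uniform character-level "LM" and a balanced-parentheses constraint.
import random

VOCAB = list("ab()")

def lm_proposal(prefix):
    """Stand-in for an LM's next-token distribution (uniform here)."""
    return {c: 1.0 / len(VOCAB) for c in VOCAB}

def potential(prefix):
    """Constraint potential: 1.0 while parentheses stay balanced so far, else 0.0."""
    depth = 0
    for c in prefix:
        depth += (c == "(") - (c == ")")
        if depth < 0:
            return 0.0
    return 1.0

def smc_generate(num_particles=64, steps=8):
    particles = [""] * num_particles
    for _ in range(steps):
        # Propose one token per particle from the (toy) LM.
        proposals = []
        for p in particles:
            probs = lm_proposal(p)
            tok = random.choices(list(probs), weights=list(probs.values()))[0]
            proposals.append(p + tok)
        # Weight by the constraint potential, then resample in proportion.
        weights = [potential(p) for p in proposals]
        if sum(weights) == 0:
            weights = [1.0] * len(proposals)  # degenerate fallback: keep everything
        particles = random.choices(proposals, weights=weights, k=num_particles)
    return particles

print(smc_generate()[:5])
```
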
Ben Lipkin (@ben_lipkin)

Many LM applications may be formulated as targeting some (Boolean) constraint. Generate a…
- Python program that passes a test suite
- PDDL plan that satisfies a goal
- CoT trajectory that yields a positive reward
The list goes on… How can we efficiently satisfy these? 🧵👇
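
A minimal sketch of that Boolean-constraint framing via the naive baseline: sample a candidate, check it, repeat (rejection sampling), which is what more efficient methods aim to beat. The candidate sampler below is a hypothetical stub standing in for an LM, and the test suite is illustrative.

```python
# Minimal sketch of the Boolean-constraint framing via the naive baseline: sample a
# candidate, check it, repeat (rejection sampling). The sampler is a hypothetical stub
# standing in for an LM, and the test suite is illustrative.
import random

def sample_candidate_program():
    """Hypothetical stand-in for an LM proposing a Python snippet."""
    return random.choice([
        "def add(a, b): return a - b",
        "def add(a, b): return a + b",
        "def add(a, b): return a * b",
    ])

def passes_tests(src):
    """Boolean constraint: the candidate must pass a tiny test suite."""
    namespace = {}
    try:
        exec(src, namespace)
        return namespace["add"](2, 3) == 5 and namespace["add"](-1, 1) == 0
    except Exception:
        return False

def rejection_sample(max_tries=100):
    for _ in range(max_tries):
        candidate = sample_candidate_program()
        if passes_tests(candidate):
            return candidate
    return None

print(rejection_sample())
```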

Andrew Rouditchenko 🇺🇦 (@arouditchenko)

Do you really need audio to fine-tune your Audio LLM? 🤔 Answer below: Introducing Omni-R1, a simple GRPO fine‑tuning method for Qwen2.5‑Omni on audio question answering. It sets new state‑of‑the‑art accuracies on the MMAU benchmark for Audio LLMs. arxiv.org/abs/2505.09439
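
For context on the GRPO part of the tweet, here is a small sketch of the group-relative advantage computation at its core: several responses are sampled per question and each is scored relative to its own group. The rewards are made-up numbers; this is not the Omni-R1 training code.

```python
# Small sketch of the group-relative advantage at the core of GRPO-style training
# (the "GRPO fine-tuning" mentioned above): sample several responses per question,
# then normalize each response's reward against its own group. Rewards are made up.
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# e.g. four sampled answers to one audio question, scored 1.0 if correct else 0.0
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # ≈ [1.0, -1.0, 1.0, -1.0]
```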

Songlin Yang (@songlinyang4)

📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381

Yung-Sung Chuang (@yungsungchuang)

🚨Do passage rerankers really need explicit reasoning?🤔—Maybe Not!

Our findings:
⚖️Standard rerankers outperform those w/ step-by-step reasoning!
🚫Disabling reasoning in a reasoning reranker actually improves reranking accuracy!🤯
👇But, why?

📰arxiv.org/abs/2505.16886

(1/6)
Tianyuan Zhang (@tianyuanzhang99)

Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper “Test-Time Training Done Right” proposes LaCT (Large Chunk Test-Time Training) — a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training

"we introduce QoQ-Med-7B/32B, the first open generalist clinical  foundation model that jointly reasons across medical images, time-series  signals, and text reports. QoQ-Med is trained with
dvd@dvd.chat (@ddvd233)

Thanks Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr) for posting about our recent work!

We're excited to introduce QoQ-Med, a multimodal medical foundation model that jointly reasons across medical images, videos, time series (ECG), and clinical texts. Beyond the model itself, we developed a novel training
Han Guo (@hanguo97)

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
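
For reference, here is a sketch of the two extremes the tweet contrasts: full softmax attention (quadratic in sequence length) and a causal linear-attention recurrence (linear, with a single fixed-size state). It does not implement Log-Linear Attention itself, which sits between the two; the ELU feature map and shapes are illustrative choices.

```python
# Sketch of the two known extremes the tweet contrasts: full softmax attention
# (quadratic in sequence length) and a causal linear-attention recurrence (linear,
# single fixed-size state). This does NOT implement Log-Linear Attention itself;
# the ELU feature map and shapes are illustrative choices.
import torch

def softmax_attention(q, k, v):
    # O(n^2): every query scores every key (non-causal for brevity).
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v):
    # O(n): a single (d x d) recurrent state replaces the full key/value history.
    phi = lambda x: torch.nn.functional.elu(x) + 1  # positive feature map
    q, k = phi(q), phi(k)
    state = torch.zeros(q.shape[-1], v.shape[-1])
    norm = torch.zeros(q.shape[-1])
    outs = []
    for t in range(q.shape[0]):
        state = state + k[t].unsqueeze(-1) @ v[t].unsqueeze(0)
        norm = norm + k[t]
        outs.append((q[t] @ state) / (q[t] @ norm + 1e-6))
    return torch.stack(outs)

n, d = 16, 8
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```
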
dvd@dvd.chat (@ddvd233)

🚀 QoQ-Med is now live on Hugging Face!
Load it in seconds with ddvd233/QoQ-Med-VL-7B in your favorite 🤗 Transformers pipeline.
No code? No problem: fire up LM Studio (or any llama.cpp GUI), search “QoQ”, and start chatting.

Weights + docs → github.com/DDVD233/QoQ_Med
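
A minimal loading sketch based on the checkpoint the tweet points to (ddvd233/QoQ-Med-VL-7B). The model class, dtype, and text-only prompt are assumptions (the "-VL" suffix suggests a standard vision-language interface), so check the linked repo's docs for the intended usage.

```python
# Minimal loading sketch for the checkpoint mentioned above (ddvd233/QoQ-Med-VL-7B).
# The model class, dtype, and text-only prompt are assumptions, not the authors'
# documented usage; see github.com/DDVD233/QoQ_Med for the intended interface.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ddvd233/QoQ-Med-VL-7B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "A 54-year-old patient reports chest pain on exertion. List likely differentials."
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```
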
Jyo Pari (@jyo_pari)

What if an LLM could update its own weights?

Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs.

Self-editing is learned via RL, using the updated model’s downstream performance as reward.
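
A schematic sketch of the loop the tweet describes: the model writes self-edits, a copy is updated on them, and the updated copy's downstream score becomes the reward for the self-edit policy. Every function below is a hypothetical placeholder, not the authors' API.

```python
# Schematic sketch of the SEAL loop described above. Every function here is a
# hypothetical placeholder (NOT the authors' API); it only shows the control flow:
# generate self-edits -> fine-tune a copy on them -> score the copy -> use the score
# as the RL reward for the self-edit policy.

def generate_self_edits(model, new_input):
    """Hypothetical: the model writes synthetic training examples about new_input."""
    return [f"Restated fact derived from: {new_input}"]

def finetune_copy(model, self_edits):
    """Hypothetical: return a copy of the model fine-tuned on the self-edits."""
    return model  # placeholder: no real weight update happens in this sketch

def evaluate_downstream(model, new_input):
    """Hypothetical: score the updated model on questions derived from new_input."""
    return 0.5  # placeholder reward

def seal_step(model, new_input, rl_update):
    self_edits = generate_self_edits(model, new_input)
    updated_model = finetune_copy(model, self_edits)
    reward = evaluate_downstream(updated_model, new_input)
    # Reinforce the self-edit generations in proportion to the reward they earned.
    rl_update(model, self_edits, reward)
    return updated_model, reward

if __name__ == "__main__":
    _, reward = seal_step("dummy-model", "a new document", lambda m, edits, r: None)
    print("reward:", reward)
```
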
Morris Yau (@morrisyau)

Transformers: ⚡️fast to train (compute-bound), 🐌slow to decode (memory-bound). Can Transformers be optimal in both? Yes! By exploiting sequential-parallel duality. We introduce Transformer-PSM with constant time per token decode. 🧐 arxiv.org/pdf/2506.10918
