MIT NLP (@nlp_mit)'s Twitter Profile
MIT NLP

@nlp_mit

NLP Group at @MIT_CSAIL! PIs: @yoonrkim @jacobandreas @lateinteraction @pliang279 @david_sontag, Jim Glass, @roger_p_levy

ID: 1902072731954233344

Joined: 18-03-2025 19:01:42

16 Tweets

3.3K Followers

45 Following

MIT NLP (@nlp_mit)

MIT NLP @ ICLR 2025 - catch Mehul Damani at poster 219, Thursday 3PM to chat about "Learning How Hard to Think: Input Adaptive Allocation of LM Computation"!

Ben Cohen-Wang (@bcohenwang)

It can be helpful to pinpoint the in-context information that a language model uses when generating content (is it using provided documents? or its own intermediate thoughts?). We present Attribution with Attention (AT2), a method for doing so efficiently and reliably! (1/8)
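
As a rough illustration of the general idea (attention weights as a signal for which context tokens a generation relies on), here is a minimal sketch. It is not the AT2 method itself, which additionally learns how much to trust each attention head; the "gpt2" model name and the toy prompt are stand-ins.

```python
# Rough illustration of attention-as-attribution (NOT the AT2 method itself, which
# additionally learns how much to trust each attention head). "gpt2" and the toy
# prompt are stand-ins for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM that can return attentions
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

context = "The Eiffel Tower is in Paris. The Colosseum is in Rome."
question = " Q: Where is the Eiffel Tower? A:"
inputs = tok(context + question, return_tensors="pt")
ctx_len = len(tok(context)["input_ids"])

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, seq, seq) tensor per layer; average them all.
attn = torch.stack(out.attentions).mean(dim=(0, 1, 2))  # -> (seq, seq)

# Score each context token by how much the final position (about to produce the
# answer) attends to it.
scores = attn[-1, :ctx_len]
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0][:ctx_len].tolist())
for token, score in zip(tokens, scores):
    print(f"{token!r:>14}  {score.item():.3f}")
```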

João Loula (@joaoloula)

#ICLR2025 Oral

How can we control LMs using diverse signals such as static analyses, test cases, and simulations?
In our paper “Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo” we:
Cast controlled generation as an inference problem, with the LM
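
For readers unfamiliar with the framing, here is a toy sketch of the sequential Monte Carlo loop the title refers to: propose continuations, weight them by a constraint potential, resample. It is not the paper's system; the uniform character-level "LM" and the balanced-parentheses constraint are hypothetical stand-ins.

```python
# Toy sketch of sequential Monte Carlo steering for constrained generation. This is
# NOT the paper's system; it only shows the propose / weight / resample loop, with a
# hypothetical uniform character-level "LM" and a balanced-parentheses constraint.
import random

VOCAB = list("ab()")

def lm_proposal(prefix):
    """Stand-in for an LM's next-token distribution (uniform here)."""
    return {c: 1.0 / len(VOCAB) for c in VOCAB}

def potential(prefix):
    """Constraint potential: 1.0 while parentheses stay balanced so far, else 0.0."""
    depth = 0
    for c in prefix:
        depth += (c == "(") - (c == ")")
        if depth < 0:
            return 0.0
    return 1.0

def smc_generate(num_particles=64, steps=8):
    particles = [""] * num_particles
    for _ in range(steps):
        # Propose one token per particle from the (toy) LM.
        proposals = []
        for p in particles:
            probs = lm_proposal(p)
            tok = random.choices(list(probs), weights=list(probs.values()))[0]
            proposals.append(p + tok)
        # Weight by the constraint potential, then resample in proportion.
        weights = [potential(p) for p in proposals]
        if sum(weights) == 0:
            weights = [1.0] * len(proposals)  # degenerate fallback: keep everything
        particles = random.choices(proposals, weights=weights, k=num_particles)
    return particles

print(smc_generate()[:5])
```
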
Ben Lipkin (@ben_lipkin)

Many LM applications may be formulated as targeting some (Boolean) constraint. Generate a…
- Python program that passes a test suite
- PDDL plan that satisfies a goal
- CoT trajectory that yields a positive reward
The list goes on… How can we efficiently satisfy these? 🧵👇
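
A minimal sketch of that Boolean-constraint framing via the naive baseline: sample a candidate, check it, repeat (rejection sampling), which is what more efficient methods aim to beat. The candidate sampler below is a hypothetical stub standing in for an LM, and the test suite is illustrative.

```python
# Minimal sketch of the Boolean-constraint framing via the naive baseline: sample a
# candidate, check it, repeat (rejection sampling). The sampler is a hypothetical stub
# standing in for an LM, and the test suite is illustrative.
import random

def sample_candidate_program():
    """Hypothetical stand-in for an LM proposing a Python snippet."""
    return random.choice([
        "def add(a, b): return a - b",
        "def add(a, b): return a + b",
        "def add(a, b): return a * b",
    ])

def passes_tests(src):
    """Boolean constraint: the candidate must pass a tiny test suite."""
    namespace = {}
    try:
        exec(src, namespace)
        return namespace["add"](2, 3) == 5 and namespace["add"](-1, 1) == 0
    except Exception:
        return False

def rejection_sample(max_tries=100):
    for _ in range(max_tries):
        candidate = sample_candidate_program()
        if passes_tests(candidate):
            return candidate
    return None

print(rejection_sample())
```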

Andrew Rouditchenko 🇺🇦 (@arouditchenko)

Do you really need audio to fine-tune your Audio LLM? 🤔 Answer below: Introducing Omni-R1, a simple GRPO fine‑tuning method for Qwen2.5‑Omni on audio question answering. It sets new state‑of‑the‑art accuracies on the MMAU benchmark for Audio LLMs. arxiv.org/abs/2505.09439
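
For context on the GRPO part of the tweet, here is a small sketch of the group-relative advantage computation at its core: several responses are sampled per question and each is scored relative to its own group. The rewards are made-up numbers; this is not the Omni-R1 training code.

```python
# Small sketch of the group-relative advantage at the core of GRPO-style training
# (the "GRPO fine-tuning" mentioned above): sample several responses per question,
# then normalize each response's reward against its own group. Rewards are made up.
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# e.g. four sampled answers to one audio question, scored 1.0 if correct else 0.0
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # ≈ [1.0, -1.0, 1.0, -1.0]
```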

Songlin Yang (@songlinyang4)

📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381

Yung-Sung Chuang (@yungsungchuang)

🚨Do passage rerankers really need explicit reasoning?🤔—Maybe Not!

Our findings:
⚖️Standard rerankers outperform those w/ step-by-step reasoning!
🚫Disabling reasoning in a reasoning reranker actually improves reranking accuracy!🤯
👇But, why?

📰arxiv.org/abs/2505.16886

(1/6)
Tianyuan Zhang (@tianyuanzhang99)

Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper “Test-Time Training Done Right” proposes LaCT (Large Chunk Test-Time Training) — a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training

"we introduce QoQ-Med-7B/32B, the first open generalist clinical  foundation model that jointly reasons across medical images, time-series  signals, and text reports. QoQ-Med is trained with
dvd@dvd.chat (@ddvd233)

Thanks Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr) for posting about our recent work!

We're excited to introduce QoQ-Med, a multimodal medical foundation model that jointly reasons across medical images, videos, time series (ECG), and clinical texts. Beyond the model itself, we developed a novel training
Han Guo (@hanguo97)

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
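
For reference, here is a sketch of the two extremes the tweet contrasts: full softmax attention (quadratic in sequence length) and a causal linear-attention recurrence (linear, with a single fixed-size state). It does not implement Log-Linear Attention itself, which sits between the two; the ELU feature map and shapes are illustrative choices.

```python
# Sketch of the two known extremes the tweet contrasts: full softmax attention
# (quadratic in sequence length) and a causal linear-attention recurrence (linear,
# single fixed-size state). This does NOT implement Log-Linear Attention itself;
# the ELU feature map and shapes are illustrative choices.
import torch

def softmax_attention(q, k, v):
    # O(n^2): every query scores every key (non-causal for brevity).
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v):
    # O(n): a single (d x d) recurrent state replaces the full key/value history.
    phi = lambda x: torch.nn.functional.elu(x) + 1  # positive feature map
    q, k = phi(q), phi(k)
    state = torch.zeros(q.shape[-1], v.shape[-1])
    norm = torch.zeros(q.shape[-1])
    outs = []
    for t in range(q.shape[0]):
        state = state + k[t].unsqueeze(-1) @ v[t].unsqueeze(0)
        norm = norm + k[t]
        outs.append((q[t] @ state) / (q[t] @ norm + 1e-6))
    return torch.stack(outs)

n, d = 16, 8
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```
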
dvd@dvd.chat (@ddvd233)

🚀 QoQ-Med is now live on Hugging Face!
Load it in seconds with ddvd233/QoQ-Med-VL-7B in your favorite 🤗 Transformers pipeline.
No code? No problem: fire up LM Studio (or any llama.cpp GUI), search “QoQ”, and start chatting.

Weights + docs → github.com/DDVD233/QoQ_Med
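
A minimal loading sketch based on the checkpoint the tweet points to (ddvd233/QoQ-Med-VL-7B). The model class, dtype, and text-only prompt are assumptions (the "-VL" suffix suggests a standard vision-language interface), so check the linked repo's docs for the intended usage.

```python
# Minimal loading sketch for the checkpoint mentioned above (ddvd233/QoQ-Med-VL-7B).
# The model class, dtype, and text-only prompt are assumptions, not the authors'
# documented usage; see github.com/DDVD233/QoQ_Med for the intended interface.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ddvd233/QoQ-Med-VL-7B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "A 54-year-old patient reports chest pain on exertion. List likely differentials."
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```
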
Jyo Pari (@jyo_pari)

What if an LLM could update its own weights?

Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs.

Self-editing is learned via RL, using the updated model’s downstream performance as reward.
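
A schematic sketch of the loop the tweet describes: the model writes self-edits, a copy is updated on them, and the updated copy's downstream score becomes the reward for the self-edit policy. Every function below is a hypothetical placeholder, not the authors' API.

```python
# Schematic sketch of the SEAL loop described above. Every function here is a
# hypothetical placeholder (NOT the authors' API); it only shows the control flow:
# generate self-edits -> fine-tune a copy on them -> score the copy -> use the score
# as the RL reward for the self-edit policy.

def generate_self_edits(model, new_input):
    """Hypothetical: the model writes synthetic training examples about new_input."""
    return [f"Restated fact derived from: {new_input}"]

def finetune_copy(model, self_edits):
    """Hypothetical: return a copy of the model fine-tuned on the self-edits."""
    return model  # placeholder: no real weight update happens in this sketch

def evaluate_downstream(model, new_input):
    """Hypothetical: score the updated model on questions derived from new_input."""
    return 0.5  # placeholder reward

def seal_step(model, new_input, rl_update):
    self_edits = generate_self_edits(model, new_input)
    updated_model = finetune_copy(model, self_edits)
    reward = evaluate_downstream(updated_model, new_input)
    # Reinforce the self-edit generations in proportion to the reward they earned.
    rl_update(model, self_edits, reward)
    return updated_model, reward

if __name__ == "__main__":
    _, reward = seal_step("dummy-model", "a new document", lambda m, edits, r: None)
    print("reward:", reward)
```
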
Morris Yau (@morrisyau)

Transformers: ⚡️fast to train (compute-bound), 🐌slow to decode (memory-bound). Can Transformers be optimal in both? Yes! By exploiting sequential-parallel duality. We introduce Transformer-PSM with constant time per token decode. 🧐 arxiv.org/pdf/2506.10918
