Shannon Shen (@shannonzshen) 's Twitter Profile
Shannon Shen

@shannonzshen

PhD Student @MIT_CSAIL | previously @allen_ai @semanticscholar @harvard @brownuniversity

ID: 1153833084166582272

Link: https://www.szj.io · Joined: 24-07-2019 01:03:39

430 Tweets

1.1K Followers

1.1K Following

Lucy Li (@lucy3_li) 's Twitter Profile Photo

I'm joining UW–Madison Computer Sciences / UW School of Computer, Data & Information Sciences as an assistant professor in fall 2026!! There, I'll continue working on language models, computational social science, & responsible AI. 🌲🧀🚣🏻‍♀️ Apply to be my PhD student!

Before then, I'll postdoc for a year at another UW🏔️ -- UW NLP Allen School.
Stella Li (@stellalisy) 's Twitter Profile Photo

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: + 28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
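As a toy illustration of the three reward variants compared above (the function names and the answer-matching rule are hypothetical stand-ins, not the paper's code):

```python
import random

def ground_truth_reward(completion: str, answer: str) -> float:
    """Standard RLVR reward: 1.0 iff the completion ends with the reference answer."""
    return 1.0 if completion.strip().endswith(answer) else 0.0

def incorrect_reward(completion: str, answer: str) -> float:
    """Spurious reward: 1.0 only when the answer is WRONG."""
    return 1.0 - ground_truth_reward(completion, answer)

def random_reward(_completion: str, rng: random.Random) -> float:
    """Spurious reward: a coin flip that ignores the completion entirely."""
    return float(rng.random() < 0.5)
```

The surprising claim is that even the last two signals, which carry no information about correctness, still improve Qwen2.5-Math-7B on MATH-500.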
Yung-Sung Chuang (@yungsungchuang) 's Twitter Profile Photo

🚨Do passage rerankers really need explicit reasoning?🤔—Maybe Not!

Our findings:
⚖️Standard rerankers outperform those w/ step-by-step reasoning!
🚫Disabling reasoning in a reasoning reranker actually improves reranking accuracy!🤯
👇But, why?

📰arxiv.org/abs/2505.16886

(1/6)
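A miniature of what "reranking without explicit reasoning" means: emit a relevance score per passage in one direct pass, with no intermediate chain of thought. The overlap "model" below is purely illustrative, not the rerankers studied in the paper:

```python
def direct_rerank(query: str, passages: list[str]) -> list[str]:
    """Direct (non-reasoning) reranker sketch: one score per passage, no
    step-by-step rationale. Here the scorer is just query-token overlap."""
    q = set(query.lower().split())

    def score(p: str) -> float:
        return len(q & set(p.lower().split())) / max(len(q), 1)

    return sorted(passages, key=score, reverse=True)
```

A reasoning reranker would instead generate a rationale before scoring; the thread's finding is that the extra generation step does not help and can hurt.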
Rulin Shao (@rulinshao) 's Twitter Profile Photo

One more fun thing! 
RLVR can elicit existing behaviors like code reasoning. But what if your model is not good at code yet thinks it is?

- RLVR w/ spurious rewards led Olmo to use more code, but perf decreased (Fig 6)
- When we discourage code use, perf goes up!🤣 (Fig 9)
CLS (@chengleisi) 's Twitter Profile Photo

This year, various pieces of evidence have shown that AI agents are starting to conduct scientific research and produce papers end-to-end, to the point that some of these generated papers have already been accepted at top-tier conferences/workshops. Intology’s

Han Guo (@hanguo97) 's Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
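One plausible way to get O(log T) state is a Fenwick-tree-style partition of the prefix into power-of-two segments, one summary state per segment. The sketch below illustrates only that counting argument; it is an assumption for intuition, not the paper's exact construction:

```python
def fenwick_segments(t: int) -> list[int]:
    """Partition positions 1..t into power-of-two segments, Fenwick-tree style.
    A model keeping one summary state per segment stores popcount(t) = O(log t)
    states: between linear attention's O(1) and softmax attention's O(t)."""
    sizes, rest = [], t
    while rest:
        low = rest & -rest          # lowest set bit = smallest segment
        sizes.append(low)
        rest -= low
    return sizes[::-1]              # largest (oldest) segment first
```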
Songlin Yang (@songlinyang4) 's Twitter Profile Photo

Flash Linear Attention (github.com/fla-org/flash-…) will no longer maintain support for the RWKV series (existing code will remain available). Here’s why:

Jyo Pari (@jyo_pari) 's Twitter Profile Photo

What if an LLM could update its own weights?

Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs.

Self-editing is learned via RL, using the updated model’s downstream performance as reward.
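The outer loop can be caricatured in a few lines; the dict merge stands in for a real weight update, and every name here is a hypothetical toy, not SEAL's implementation:

```python
def seal_step(model: dict, context: str, generate_edit, evaluate):
    """One SEAL-style outer-loop step, as a toy sketch:
    1) the model proposes a 'self-edit' (its own training data),
    2) the edit is applied as a weight update (here: a dict merge),
    3) the updated model's downstream performance is the RL reward
       used to train the self-editing policy."""
    edit = generate_edit(model, context)   # self-generated training data
    updated = {**model, **edit}            # stand-in for a gradient step
    reward = evaluate(updated)             # downstream performance
    return updated, reward
```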
Han Guo (@hanguo97) 's Twitter Profile Photo

One key takeaway from recent work on test-time compute: even a small weight update can make a big difference. So, what happens if we meta-learn those updates (and not necessarily at test time)? Excited to share this new work led by Adam Zweiger and Jyo Pari!

Rulin Shao (@rulinshao) 's Twitter Profile Photo

🎉Our Spurious Rewards paper is available on arXiv! We added experiments on
- More prompts/steps/models/analysis...
- Spurious Prompts!
Surprisingly, we obtained 19.4% gains when replacing prompts with LaTeX placeholder text (\lipsum) 😶‍🌫️

Check out our 2nd blog: tinyurl.com/spurious-prompt
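A sketch of the two prompt conditions being compared; the boxed-answer instruction below is a common math-eval template and an assumption on my part, not a quote from the blog:

```python
# Stand-in for LaTeX's \lipsum filler text.
LIPSUM = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit, "
          "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.")

def make_prompt(question: str, spurious: bool = False) -> str:
    """Build a math prompt. In the 'spurious' condition the actual question is
    replaced by placeholder filler, yet the model is still asked to reason and
    produce a boxed answer."""
    body = LIPSUM if spurious else question
    return body + "\nPlease reason step by step, and put your final answer within \\boxed{}."
```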
Shannon Shen (@shannonzshen) 's Twitter Profile Photo

Also for the CS audience — Melissa is definitely the deep learning guru in Econ and it was wonderful to work with her! It’s a really exciting opportunity to work on projects that can both advance the underlying algorithms and solve large scale real world problems!

LM4SCI @ COLM2025 (@lm4sci) 's Twitter Profile Photo

🚨 Call for Papers: LM4Sci at the Conference on Language Modeling (COLM) 2025 🚨

Excited to announce the Large Language Modeling for Scientific Discovery (LM4Sci) workshop at COLM 2025 in Montreal, Canada!

Submission Deadline: June 23
Notification: July 24
Workshop: October 10, 2025
CLS (@chengleisi) 's Twitter Profile Photo

Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.
Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Can data owners & LM developers collaborate to build a strong shared model while each retaining data control? Introducing FlexOlmo💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data
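The opt-in/out idea can be sketched as masked softmax gating over experts: an opted-out owner's expert gets zero weight and the remaining weights are renormalized. This is an illustrative guess at the mechanism, not FlexOlmo's actual router:

```python
import math

def gate_with_optout(logits: list[float], opted_in: list[bool]) -> list[float]:
    """Softmax gating over MoE experts where each data owner's expert can be
    opted out at inference: masked experts get zero weight, the rest are
    renormalized so the mixture weights still sum to 1."""
    exps = [math.exp(l) if ok else 0.0 for l, ok in zip(logits, opted_in)]
    z = sum(exps)
    if z == 0.0:
        raise ValueError("at least one expert must be opted in")
    return [e / z for e in exps]
```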

Shannon Shen (@shannonzshen) 's Twitter Profile Photo

Check out the super interesting paper led by Monica Agrawal and Lio Wong! As RAG/Deep Research systems become popular, they might misinterpret people’s information-seeking goals and cause harm! We elaborate using the medical domain as an example and propose ways to mitigate.

Stella Li (@stellalisy) 's Twitter Profile Photo

WHY do you prefer something over another?

Reward models treat preference as a black box😶‍🌫️ but human brains🧠 decompose decisions into hidden attributes

We built the first system to mirror how people really make decisions in our #COLM2025 paper🎨PrefPalette✨

Why it matters👉🏻🧵
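Attribute decomposition in miniature: score each response on named attributes, then weight the attributes by context. Purely illustrative; the attribute names and linear weighting are my assumptions, not PrefPalette's architecture:

```python
def attribute_preference(scores_a: dict, scores_b: dict, weights: dict) -> float:
    """Decomposed preference sketch: instead of one black-box score, each
    response is rated on interpretable attributes (e.g. helpfulness, humor),
    and preference is the context-weighted margin. Positive => A preferred."""
    return sum(w * (scores_a[k] - scores_b[k]) for k, w in weights.items())
```

Because the attributes are explicit, you can see *why* A beats B in one context but loses in another.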
Yung-Sung Chuang (@yungsungchuang) 's Twitter Profile Photo

Scaling CLIP on English-only data is outdated now…

🌍We built a CLIP data curation pipeline for 300+ languages
🇬🇧We train MetaCLIP 2 without compromising English-task performance (it actually improves!)
🥳It’s time to drop the language filter!

📝arxiv.org/abs/2507.22062

[1/5] 🧵
Jyo Pari (@jyo_pari) 's Twitter Profile Photo

For agents to improve over time, they can’t afford to forget what they’ve already mastered.

We found that supervised fine-tuning forgets more than RL when training on a new task! 

Want to find out why? 👇