Shannon Shen (@shannonzshen) 's Twitter Profile
Shannon Shen

@shannonzshen

PhD Student @MIT_CSAIL | previously @allen_ai @semanticscholar @harvard @brownuniversity

ID: 1153833084166582272

Link: https://www.szj.io · Joined: 24-07-2019 01:03:39

430 Tweets

1.1K Followers

1.1K Following

Lucy Li (@lucy3_li) 's Twitter Profile Photo

I'm joining UW–Madison Computer Sciences / UW School of Computer, Data & Information Sciences as an assistant professor in fall 2026!! There, I'll continue working on language models, computational social science, & responsible AI. 🌲🧀🚣🏻‍♀️ Apply to be my PhD student!

Before then, I'll postdoc for a year at another UW🏔️ -- UW NLP Allen School.
Stella Li (@stellalisy) 's Twitter Profile Photo

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: + 28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
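As a toy illustration of the three reward variants compared above (the function names and the answer-matching rule are hypothetical stand-ins, not the paper's code):

```python
import random

def ground_truth_reward(completion: str, answer: str) -> float:
    """Standard RLVR reward: 1.0 iff the completion ends with the reference answer."""
    return 1.0 if completion.strip().endswith(answer) else 0.0

def incorrect_reward(completion: str, answer: str) -> float:
    """Spurious reward: 1.0 only when the answer is WRONG."""
    return 1.0 - ground_truth_reward(completion, answer)

def random_reward(_completion: str, rng: random.Random) -> float:
    """Spurious reward: a coin flip that ignores the completion entirely."""
    return float(rng.random() < 0.5)
```

The surprising claim is that even the last two signals, which carry no information about correctness, still improve Qwen2.5-Math-7B on MATH-500.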
Yung-Sung Chuang (@yungsungchuang) 's Twitter Profile Photo

🚨Do passage rerankers really need explicit reasoning?🤔—Maybe Not!

Our findings:
⚖️Standard rerankers outperform those w/ step-by-step reasoning!
🚫Disabling reasoning in a reasoning reranker actually improves reranking accuracy!🤯
👇But, why?

📰arxiv.org/abs/2505.16886

(1/6)
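A miniature of what "reranking without explicit reasoning" means: emit a relevance score per passage in one direct pass, with no intermediate chain of thought. The overlap "model" below is purely illustrative, not the rerankers studied in the paper:

```python
def direct_rerank(query: str, passages: list[str]) -> list[str]:
    """Direct (non-reasoning) reranker sketch: one score per passage, no
    step-by-step rationale. Here the scorer is just query-token overlap."""
    q = set(query.lower().split())

    def score(p: str) -> float:
        return len(q & set(p.lower().split())) / max(len(q), 1)

    return sorted(passages, key=score, reverse=True)
```

A reasoning reranker would instead generate a rationale before scoring; the thread's finding is that the extra generation step does not help and can hurt.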
Rulin Shao (@rulinshao) 's Twitter Profile Photo

One more fun thing! 
RLVR can elicit existing behaviors like code reasoning. But what if your model is not good at code yet thinks it is?

- RLVR w/ spurious rewards led Olmo to use more code, but perf decreased (Fig 6)
- When we discourage code use, perf goes up!🤣 (Fig 9)
CLS (@chengleisi) 's Twitter Profile Photo

This year, various pieces of evidence have shown that AI agents are starting to conduct scientific research and produce papers end-to-end, to the point that some of these generated papers have already been accepted at top-tier conferences/workshops. Intology’s

Han Guo (@hanguo97) 's Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
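One plausible way to get O(log T) state is a Fenwick-tree-style partition of the prefix into power-of-two segments, one summary state per segment. The sketch below illustrates only that counting argument; it is an assumption for intuition, not the paper's exact construction:

```python
def fenwick_segments(t: int) -> list[int]:
    """Partition positions 1..t into power-of-two segments, Fenwick-tree style.
    A model keeping one summary state per segment stores popcount(t) = O(log t)
    states: between linear attention's O(1) and softmax attention's O(t)."""
    sizes, rest = [], t
    while rest:
        low = rest & -rest          # lowest set bit = smallest segment
        sizes.append(low)
        rest -= low
    return sizes[::-1]              # largest (oldest) segment first
```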
Songlin Yang (@songlinyang4) 's Twitter Profile Photo

Flash Linear Attention (github.com/fla-org/flash-…) will no longer maintain support for the RWKV series (existing code will remain available). Here’s why:

Jyo Pari (@jyo_pari) 's Twitter Profile Photo

What if an LLM could update its own weights?

Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs.

Self-editing is learned via RL, using the updated model’s downstream performance as reward.
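The outer loop can be caricatured in a few lines; the dict merge stands in for a real weight update, and every name here is a hypothetical toy, not SEAL's implementation:

```python
def seal_step(model: dict, context: str, generate_edit, evaluate):
    """One SEAL-style outer-loop step, as a toy sketch:
    1) the model proposes a 'self-edit' (its own training data),
    2) the edit is applied as a weight update (here: a dict merge),
    3) the updated model's downstream performance is the RL reward
       used to train the self-editing policy."""
    edit = generate_edit(model, context)   # self-generated training data
    updated = {**model, **edit}            # stand-in for a gradient step
    reward = evaluate(updated)             # downstream performance
    return updated, reward
```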
Han Guo (@hanguo97) 's Twitter Profile Photo

One key takeaway from recent work on test-time compute: even a small weight update can make a big difference. So, what happens if we meta-learn those updates (and not necessarily at test time)? Excited to share this new work led by Adam Zweiger and Jyo Pari!

Rulin Shao (@rulinshao) 's Twitter Profile Photo

🎉Our Spurious Rewards paper is available on arXiv! We added experiments on
- More prompts/steps/models/analysis...
- Spurious Prompts!
Surprisingly, we obtained 19.4% gains when replacing prompts with LaTeX placeholder text (\lipsum) 😶‍🌫️

Check out our 2nd blog: tinyurl.com/spurious-prompt
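A sketch of the two prompt conditions being compared; the boxed-answer instruction below is a common math-eval template and an assumption on my part, not a quote from the blog:

```python
# Stand-in for LaTeX's \lipsum filler text.
LIPSUM = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit, "
          "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.")

def make_prompt(question: str, spurious: bool = False) -> str:
    """Build a math prompt. In the 'spurious' condition the actual question is
    replaced by placeholder filler, yet the model is still asked to reason and
    produce a boxed answer."""
    body = LIPSUM if spurious else question
    return body + "\nPlease reason step by step, and put your final answer within \\boxed{}."
```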
Shannon Shen (@shannonzshen) 's Twitter Profile Photo

Also for the CS audience — Melissa is definitely the deep learning guru in Econ and it was wonderful to work with her! It’s a really exciting opportunity to work on projects that can both advance the underlying algorithms and solve large scale real world problems!

LM4SCI @ COLM2025 (@lm4sci) 's Twitter Profile Photo

🚨 Call for Papers: LM4Sci at the Conference on Language Modeling (COLM) 2025 🚨

Excited to announce the Large Language Modeling for Scientific Discovery (LM4Sci) workshop at COLM 2025 in Montreal, Canada!

Submission Deadline: June 23
Notification: July 24
Workshop: October 10, 2025
CLS (@chengleisi) 's Twitter Profile Photo

Are AI scientists already better than human researchers?

We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.

Main finding: LLM ideas result in worse projects than human ideas.
Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Can data owners & LM developers collaborate to build a strong shared model while each retaining data control? Introducing FlexOlmo💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data
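The opt-in/out idea can be sketched as masked softmax gating over experts: an opted-out owner's expert gets zero weight and the remaining weights are renormalized. This is an illustrative guess at the mechanism, not FlexOlmo's actual router:

```python
import math

def gate_with_optout(logits: list[float], opted_in: list[bool]) -> list[float]:
    """Softmax gating over MoE experts where each data owner's expert can be
    opted out at inference: masked experts get zero weight, the rest are
    renormalized so the mixture weights still sum to 1."""
    exps = [math.exp(l) if ok else 0.0 for l, ok in zip(logits, opted_in)]
    z = sum(exps)
    if z == 0.0:
        raise ValueError("at least one expert must be opted in")
    return [e / z for e in exps]
```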

Shannon Shen (@shannonzshen) 's Twitter Profile Photo

Check out the super interesting paper led by Monica Agrawal and Lio Wong! As RAG/Deep Research systems become popular, they might misinterpret people’s information-seeking goals and cause harm! We elaborate using the medical domain as an example and propose ways to mitigate.

Stella Li (@stellalisy) 's Twitter Profile Photo

WHY do you prefer something over another?

Reward models treat preference as a black box😶‍🌫️ but human brains🧠 decompose decisions into hidden attributes

We built the first system to mirror how people really make decisions in our #COLM2025 paper🎨PrefPalette✨

Why it matters👉🏻🧵
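Attribute decomposition in miniature: score each response on named attributes, then weight the attributes by context. Purely illustrative; the attribute names and linear weighting are my assumptions, not PrefPalette's architecture:

```python
def attribute_preference(scores_a: dict, scores_b: dict, weights: dict) -> float:
    """Decomposed preference sketch: instead of one black-box score, each
    response is rated on interpretable attributes (e.g. helpfulness, humor),
    and preference is the context-weighted margin. Positive => A preferred."""
    return sum(w * (scores_a[k] - scores_b[k]) for k, w in weights.items())
```

Because the attributes are explicit, you can see *why* A beats B in one context but loses in another.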
Yung-Sung Chuang (@yungsungchuang) 's Twitter Profile Photo

Scaling CLIP on English-only data is outdated now…

🌍We built a CLIP data curation pipeline for 300+ languages
🇬🇧We train MetaCLIP 2 without compromising English-task performance (it actually improves!)
🥳It’s time to drop the language filter!

📝arxiv.org/abs/2507.22062

[1/5] 🧵
Jyo Pari (@jyo_pari) 's Twitter Profile Photo

For agents to improve over time, they can’t afford to forget what they’ve already mastered.

We found that supervised fine-tuning forgets more than RL when training on a new task! 

Want to find out why? 👇