Weiting (Steven) Tan (@weiting_nlp) 's Twitter Profile
Weiting (Steven) Tan

@weiting_nlp

Ph.D. student at @jhuclsp, Student Researcher @AIatMeta | Prev @AIatMeta @Amazon Alexa AI

ID: 1414244140544573442

Link: https://steventan0110.github.io/ | Joined: 11-07-2021 15:24:25

65 Tweets

173 Followers

269 Following

Saining Xie (@sainingxie) 's Twitter Profile Photo

Representation matters. 
Representation matters. 
Representation matters, even for generative models.

We might've been training our diffusion models the wrong way this whole time. Meet REPA: Training Diffusion Transformers is easier than you think! sihyun.me/REPA/ (🧵1/n)
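
A minimal sketch of the idea as the thread describes it: regularize diffusion-transformer training by aligning intermediate DiT features with a frozen pretrained visual encoder. The projection head, the encoder interface, and the cosine loss below are illustrative assumptions, not the paper's exact recipe (see sihyun.me/REPA/ for the real method):

```python
# Hedged sketch of a REPA-style regularizer: align diffusion-transformer
# tokens with features from a frozen pretrained encoder (e.g., a ViT).
# `encoder` and `proj` are assumed interfaces, not the paper's code.
import torch
import torch.nn.functional as F

def repa_regularizer(dit_hidden, clean_images, encoder, proj):
    """dit_hidden: (B, N, D) intermediate DiT tokens.
    encoder: frozen model returning (B, N, D_enc) patch features.
    proj: small trainable MLP mapping D -> D_enc."""
    with torch.no_grad():
        target = encoder(clean_images)   # teacher features, no gradient
    pred = proj(dit_hidden)              # project DiT tokens into teacher space
    # negative cosine similarity per token, averaged over batch and tokens
    return -F.cosine_similarity(pred, target, dim=-1).mean()

# total_loss = denoising_loss + lambda_repa * repa_regularizer(...)
```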
Weiting (Steven) Tan (@weiting_nlp) 's Twitter Profile Photo

Excited to see that SpiritLM is fully open-sourced now. It supports speech and text as both input and output. Please consider trying it at: github.com/facebookresear…

Sherjil Ozair (@sherjilozair) 's Twitter Profile Photo

Very happy to hear that GANs are receiving the Test of Time Award at NeurIPS 2024. The NeurIPS Test of Time Award is given to papers that have stood the test of time for a decade. I took some time to reminisce about how GANs came about and how AI has evolved in the last decade.

Tianjian Li (@tli104) 's Twitter Profile Photo

I have written a blogpost explaining why both the chosen and the rejected log-probabilities decrease during DPO, and, more interestingly, why this is to some extent a desired phenomenon. Link: tianjianl.github.io/blog/2024/dpo/
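
For context, the objective being analyzed is the standard DPO loss (Rafailov et al., 2023; restated from the literature, not quoted from the blog post):

L_DPO = −log σ( β [ (log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x)) ] )

Because the loss depends only on the margin between the chosen (y_w) and rejected (y_l) log-ratios, it can keep shrinking while both log π_θ(y_w|x) and log π_θ(y_l|x) decrease, provided the rejected log-probability decreases faster.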

Weiting (Steven) Tan (@weiting_nlp) 's Twitter Profile Photo

I had a great time helping host MASC-SLL at Hopkins last year. MASC-SLL is a great opportunity to connect with fellow AI/NLP/Speech researchers. If your organization is in the Mid-Atlantic region and is interested in hosting the event, please reach out!

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🛠️ DeepSeek-R1: Technical Highlights

📈 Large-scale RL in post-training
🏆 Significant performance boost with minimal labeled data
🔢 Math, code, and reasoning tasks on par with OpenAI-o1
📄 More details: github.com/deepseek-ai/De…

🐋 4/n
Jacob Austin (@jacobaustin132) 's Twitter Profile Photo

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n

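In that spirit, a taste of the napkin math (my example, not the book's): a common rule of thumb is that training a dense transformer costs about 6 FLOPs per parameter per token, which is enough to sanity-check a training budget.

```python
# Back-of-envelope training cost via the ~6 * params * tokens rule of thumb.
# All numbers below are hypothetical, chosen only for illustration.
params = 70e9        # 70B-parameter dense model
tokens = 2e12        # 2T training tokens
train_flops = 6 * params * tokens            # ~8.4e23 FLOPs
chip_flops = 1e15    # assumed ~1 PFLOP/s sustained per accelerator
n_chips = 1024
days = train_flops / (chip_flops * n_chips * 86400)
print(f"{train_flops:.1e} FLOPs -> ~{days:.0f} days on {n_chips} chips")
```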
Benjamin Van Durme (@ben_vandurme) 's Twitter Profile Photo

Our latest on compressed representations: Key-Value Distillation (KVD). Query-independent transformer compression, with offline supervised distillation.

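The tweet gives only the headline, so here is a hedged sketch of what query-independent KV compression with offline supervised distillation could look like; the pooling module, loss, and training setup are my assumptions, not the paper's architecture:

```python
# Hedged sketch: compress a length-T KV cache into M << T slots using learned
# pooling that never sees the query ("query-independent"), then distill
# offline by matching the teacher's attention outputs on held-out queries.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KVCompressor(nn.Module):
    def __init__(self, d, m):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(m, d))  # learned pooling queries

    def forward(self, k, v):  # k, v: (T, d)
        attn = torch.softmax(self.slots @ k.T / k.shape[-1] ** 0.5, dim=-1)
        return attn @ k, attn @ v                     # compressed (M, d) keys/values

def attend(q, k, v):  # standard scaled dot-product attention
    w = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return w @ v

d, T, M = 64, 512, 32
k, v, q = torch.randn(T, d), torch.randn(T, d), torch.randn(16, d)
comp = KVCompressor(d, M)
ck, cv = comp(k, v)
# Queries appear only as supervision targets; the compressor never conditions
# on them, so the compressed cache works for any future query.
loss = F.mse_loss(attend(q, ck, cv), attend(q, k, v))
```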
Dongfu Jiang (@dongfujiang) 's Twitter Profile Photo

🚀 Excited to finally share our paper on VerlTool, released today after months of work since the initial release in late May!

VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of
Jason Weston (@jaseweston) 's Twitter Profile Photo

🌀Diversity Aware RL (DARLING)🌀
📝: arxiv.org/abs/2509.02534
- Jointly optimizes for quality & diversity using a learned partition function
- Outperforms standard RL on both quality AND diversity metrics, e.g. higher pass@1 / pass@k
- Works for both non-verifiable & verifiable tasks
🧵1/5
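
A hedged sketch of the reward shaping those bullets describe: partition a batch of sampled responses with a learned equivalence function and scale each response's quality reward by a diversity bonus. The `same_class` API and the multiplicative combination are my reading of the thread, not the paper's exact formulation:

```python
# Hedged DARLING-style reward shaping: quality reward scaled by a bonus that
# favors responses landing in rare semantic classes. Illustrative only.
from typing import Callable, List

def diversity_shaped_rewards(
    responses: List[str],
    quality: Callable[[str], float],        # reward model or verifier (assumed)
    same_class: Callable[[str, str], bool], # learned partition function (assumed)
) -> List[float]:
    # Greedily partition responses into semantic equivalence classes.
    classes: List[List[int]] = []
    for i, r in enumerate(responses):
        for c in classes:
            if same_class(responses[c[0]], r):
                c.append(i)
                break
        else:
            classes.append([i])
    # Responses in smaller (rarer) classes get a larger bonus.
    n = len(responses)
    rewards = [0.0] * n
    for c in classes:
        bonus = 1.0 - (len(c) - 1) / n       # illustrative choice, not the paper's
        for i in c:
            rewards[i] = quality(responses[i]) * bonus
    return rewards
```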
Sanxing Chen (@sanxing_chen) 's Twitter Profile Photo

Most RL for LLMs today is single-step optimization on a given state (e.g., an instruction), which is essentially a bandit setup. But to learn a meta-policy that can solve various bandit problems via in-context trial and error, you need true multi-turn RL over a long horizon. So,

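A toy illustration of the contrast (mine, not from the thread): a bandit-style update scores one response with one immediate reward, while multi-turn RL weights each action by the return that follows it, so early actions get credit for later payoffs.

```python
# Single-step ("bandit") REINFORCE objective: one state, one action, one reward.
def bandit_objective(logprob_response: float, reward: float) -> float:
    return reward * logprob_response

# Multi-turn objective over a horizon: each action is weighted by the return
# that follows it, which is what lets in-context trial and error pay off.
def multiturn_objective(logprobs: list, rewards: list, gamma: float = 1.0) -> float:
    g, objective = 0.0, 0.0
    for lp, r in zip(reversed(logprobs), reversed(rewards)):
        g = r + gamma * g                    # return-to-go from this step
        objective += g * lp
    return objective
```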
Jubayer Ibn Hamid (@jubayer_hamid) 's Twitter Profile Photo

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks
