Weiting (Steven) Tan (@weiting_nlp) 's Twitter Profile
Weiting (Steven) Tan

@weiting_nlp

Ph.D. student at @jhuclsp, Student Researcher @AIatMeta | Prev @AIatMeta @Amazon Alexa AI

ID: 1414244140544573442

Link: https://steventan0110.github.io/ | Joined: 11-07-2021 15:24:25

65 Tweets

173 Followers

269 Following

Saining Xie (@sainingxie) 's Twitter Profile Photo

Representation matters. 
Representation matters. 
Representation matters, even for generative models.

We might've been training our diffusion models the wrong way this whole time. Meet REPA: Training Diffusion Transformers is easier than you think! sihyun.me/REPA/ (🧵1/n)
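
A minimal sketch of the idea as the thread describes it: regularize diffusion-transformer training by aligning intermediate DiT features with a frozen pretrained visual encoder. The projection head, the encoder interface, and the cosine loss below are illustrative assumptions, not the paper's exact recipe (see sihyun.me/REPA/ for the real method):

```python
# Hedged sketch of a REPA-style regularizer: align diffusion-transformer
# tokens with features from a frozen pretrained encoder (e.g., a ViT).
# `encoder` and `proj` are assumed interfaces, not the paper's code.
import torch
import torch.nn.functional as F

def repa_regularizer(dit_hidden, clean_images, encoder, proj):
    """dit_hidden: (B, N, D) intermediate DiT tokens.
    encoder: frozen model returning (B, N, D_enc) patch features.
    proj: small trainable MLP mapping D -> D_enc."""
    with torch.no_grad():
        target = encoder(clean_images)   # teacher features, no gradient
    pred = proj(dit_hidden)              # project DiT tokens into teacher space
    # negative cosine similarity per token, averaged over batch and tokens
    return -F.cosine_similarity(pred, target, dim=-1).mean()

# total_loss = denoising_loss + lambda_repa * repa_regularizer(...)
```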
Weiting (Steven) Tan (@weiting_nlp) 's Twitter Profile Photo

Excited to see that SpiritLM is fully open-sourced now. It supports speech and text as both input and output. Please consider trying it at: github.com/facebookresear…

Sherjil Ozair (@sherjilozair) 's Twitter Profile Photo

Very happy to hear that GANs are receiving the Test of Time Award at NeurIPS 2024. The NeurIPS Test of Time Award is given to papers that have stood the test of time for a decade. I took some time to reminisce about how GANs came about and how AI has evolved in the last decade.

Tianjian Li (@tli104) 's Twitter Profile Photo

I have written a blogpost explaining why both the chosen and the rejected log-probabilities decrease during DPO, and, more interestingly, why this is to some extent a desired phenomenon. Link: tianjianl.github.io/blog/2024/dpo/
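
For context, the objective being analyzed is the standard DPO loss (Rafailov et al., 2023; restated from the literature, not quoted from the blog post):

L_DPO = −log σ( β [ (log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x)) ] )

Because the loss depends only on the margin between the chosen (y_w) and rejected (y_l) log-ratios, it can keep shrinking while both log π_θ(y_w|x) and log π_θ(y_l|x) decrease, provided the rejected log-probability decreases faster.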

Weiting (Steven) Tan (@weiting_nlp) 's Twitter Profile Photo

I had a great time helping host MASC-SLL at Hopkins last year. MASC-SLL is a great opportunity to connect with fellow AI/NLP/Speech researchers. If your organization is in the Mid-Atlantic region and is interested in hosting the event, please reach out!

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🛠️ DeepSeek-R1: Technical Highlights

📈 Large-scale RL in post-training
🏆 Significant performance boost with minimal labeled data
🔢 Math, code, and reasoning tasks on par with OpenAI-o1
📄 More details: github.com/deepseek-ai/De…

🐋 4/n
Jacob Austin (@jacobaustin132) 's Twitter Profile Photo

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n

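In that spirit, a taste of the napkin math (my example, not the book's): a common rule of thumb is that training a dense transformer costs about 6 FLOPs per parameter per token, which is enough to sanity-check a training budget.

```python
# Back-of-envelope training cost via the ~6 * params * tokens rule of thumb.
# All numbers below are hypothetical, chosen only for illustration.
params = 70e9        # 70B-parameter dense model
tokens = 2e12        # 2T training tokens
train_flops = 6 * params * tokens            # ~8.4e23 FLOPs
chip_flops = 1e15    # assumed ~1 PFLOP/s sustained per accelerator
n_chips = 1024
days = train_flops / (chip_flops * n_chips * 86400)
print(f"{train_flops:.1e} FLOPs -> ~{days:.0f} days on {n_chips} chips")
```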
Benjamin Van Durme (@ben_vandurme) 's Twitter Profile Photo

Our latest on compressed representations: Key-Value Distillation (KVD). Query-independent transformer compression, with offline supervised distillation.

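The tweet gives only the headline, so here is a hedged sketch of what query-independent KV compression with offline supervised distillation could look like; the pooling module, loss, and training setup are my assumptions, not the paper's architecture:

```python
# Hedged sketch: compress a length-T KV cache into M << T slots using learned
# pooling that never sees the query ("query-independent"), then distill
# offline by matching the teacher's attention outputs on held-out queries.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KVCompressor(nn.Module):
    def __init__(self, d, m):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(m, d))  # learned pooling queries

    def forward(self, k, v):  # k, v: (T, d)
        attn = torch.softmax(self.slots @ k.T / k.shape[-1] ** 0.5, dim=-1)
        return attn @ k, attn @ v                     # compressed (M, d) keys/values

def attend(q, k, v):  # standard scaled dot-product attention
    w = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return w @ v

d, T, M = 64, 512, 32
k, v, q = torch.randn(T, d), torch.randn(T, d), torch.randn(16, d)
comp = KVCompressor(d, M)
ck, cv = comp(k, v)
# Queries appear only as supervision targets; the compressor never conditions
# on them, so the compressed cache works for any future query.
loss = F.mse_loss(attend(q, ck, cv), attend(q, k, v))
```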
Dongfu Jiang (@dongfujiang) 's Twitter Profile Photo

🚀 Excited to finally share our paper on VerlTool, released today after months of work since the initial release in late May!

VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of
Jason Weston (@jaseweston) 's Twitter Profile Photo

🌀Diversity Aware RL (DARLING)🌀
📝: arxiv.org/abs/2509.02534
- Jointly optimizes for quality & diversity using a learned partition function
- Outperforms standard RL on both quality AND diversity metrics, e.g. higher pass@1 / pass@k
- Works for both non-verifiable & verifiable tasks
🧵1/5
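
A hedged sketch of the reward shaping those bullets describe: partition a batch of sampled responses with a learned equivalence function and scale each response's quality reward by a diversity bonus. The `same_class` API and the multiplicative combination are my reading of the thread, not the paper's exact formulation:

```python
# Hedged DARLING-style reward shaping: quality reward scaled by a bonus that
# favors responses landing in rare semantic classes. Illustrative only.
from typing import Callable, List

def diversity_shaped_rewards(
    responses: List[str],
    quality: Callable[[str], float],        # reward model or verifier (assumed)
    same_class: Callable[[str, str], bool], # learned partition function (assumed)
) -> List[float]:
    # Greedily partition responses into semantic equivalence classes.
    classes: List[List[int]] = []
    for i, r in enumerate(responses):
        for c in classes:
            if same_class(responses[c[0]], r):
                c.append(i)
                break
        else:
            classes.append([i])
    # Responses in smaller (rarer) classes get a larger bonus.
    n = len(responses)
    rewards = [0.0] * n
    for c in classes:
        bonus = 1.0 - (len(c) - 1) / n       # illustrative choice, not the paper's
        for i in c:
            rewards[i] = quality(responses[i]) * bonus
    return rewards
```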
Sanxing Chen (@sanxing_chen) 's Twitter Profile Photo

Most RL for LLMs today is single-step optimization on a given state (e.g., an instruction), which is essentially a bandit setup. But to learn a meta-policy that can solve various bandit problems via in-context trial and error, you need true multi-turn RL over a long horizon. So,

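A toy illustration of the contrast (mine, not from the thread): a bandit-style update scores one response with one immediate reward, while multi-turn RL weights each action by the return that follows it, so early actions get credit for later payoffs.

```python
# Single-step ("bandit") REINFORCE objective: one state, one action, one reward.
def bandit_objective(logprob_response: float, reward: float) -> float:
    return reward * logprob_response

# Multi-turn objective over a horizon: each action is weighted by the return
# that follows it, which is what lets in-context trial and error pay off.
def multiturn_objective(logprobs: list, rewards: list, gamma: float = 1.0) -> float:
    g, objective = 0.0, 0.0
    for lp, r in zip(reversed(logprobs), reversed(rewards)):
        g = r + gamma * g                    # return-to-go from this step
        objective += g * lp
    return objective
```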
Jubayer Ibn Hamid (@jubayer_hamid) 's Twitter Profile Photo

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks
