Huayu Chen (@chenhuay17)'s Twitter Profile
Huayu Chen

@chenhuay17

PhD candidate at Tsinghua SAIL

ID: 1847135162636947456

Link: http://chendrag.github.io
Joined: 18-10-2024 04:38:48

7 Tweets

22 Followers

57 Following

Huayu Chen (@chenhuay17):

Introducing our recent work:

Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment

arxiv.org/abs/2410.09347

CCA vastly improves the FID/IS performance of pretrained autoregressive visual models with only 1 epoch of fine-tuning, and it requires only the pretraining data.
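For a rough sense of what condition-contrastive fine-tuning can look like, here is a minimal sketch assuming a DPO/NCA-style objective that contrasts an image's likelihood under its true condition against a mismatched (in-batch shuffled) condition, relative to the frozen pretrained model. The `model(tokens, condition)` interface, the negative construction, and `beta` are illustrative assumptions, not the exact CCA loss from the paper.

```python
# Minimal sketch only (assumed interfaces): a DPO/NCA-style condition-contrastive
# objective. `model(tokens, condition)` is assumed to return next-token logits of
# an autoregressive image-token model; the exact CCA loss in the paper may differ.
import torch
import torch.nn.functional as F

def sequence_logprob(model, image_tokens, condition):
    """Summed log-probability of image_tokens under the model, given a condition."""
    logits = model(image_tokens[:, :-1], condition)            # (B, T-1, V)
    logp = F.log_softmax(logits, dim=-1)
    targets = image_tokens[:, 1:].unsqueeze(-1)                # next-token targets
    return logp.gather(-1, targets).squeeze(-1).sum(dim=-1)    # (B,)

def cca_style_loss(model, ref_model, image_tokens, condition, beta=0.02):
    # Positive pair: an image with its own condition.
    # Negative pair: the same image with a mismatched (shuffled) condition.
    neg_condition = condition[torch.randperm(condition.size(0))]
    logp_pos = sequence_logprob(model, image_tokens, condition)
    logp_neg = sequence_logprob(model, image_tokens, neg_condition)
    with torch.no_grad():                                      # frozen pretrained reference
        ref_pos = sequence_logprob(ref_model, image_tokens, condition)
        ref_neg = sequence_logprob(ref_model, image_tokens, neg_condition)
    # Prefer matched conditions over mismatched ones, relative to the reference model.
    margin = beta * ((logp_pos - ref_pos) - (logp_neg - ref_neg))
    return -F.logsigmoid(margin).mean()
```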
Huayu Chen (@chenhuay17):

Guidance-Free Training

Add <10 lines of code to achieve guidance-free visual sampling.

Same or even better performance compared with CFG on 5 distinct models across diffusion/AR/masked generation. Supports from-scratch training. Requires no extra GPU memory.

Check out arxiv.org/abs/2501.15420
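To make concrete what guidance-free sampling removes, here is a minimal sketch contrasting a CFG denoising step (two forward passes plus extrapolation) with the plain conditional step a guidance-free model can use at inference. The `model(x_t, t, cond)` denoiser interface is an assumption; the GFT training recipe itself is in the paper and is not reproduced here.

```python
# Minimal sketch (assumed `model(x_t, t, cond)` denoiser interface) contrasting a
# CFG sampling step with the single-pass step a guidance-free model can use.
import torch

@torch.no_grad()
def cfg_step(model, x_t, t, cond, null_cond, guidance_scale=4.0):
    # Classifier-free guidance: two forward passes per step, then extrapolation.
    eps_cond = model(x_t, t, cond)
    eps_uncond = model(x_t, t, null_cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

@torch.no_grad()
def guidance_free_step(model, x_t, t, cond):
    # After guidance-free training, a single conditional pass suffices,
    # roughly halving per-step compute and removing the guidance-scale knob.
    return model(x_t, t, cond)
```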
Yin Cui (@yincuicv):

Is self-improvement exclusive to RL?

Can we use supervised learning to match LLMs trained with SOTA RL algorithms?

In Negative-aware Fine-Tuning (NFT), we introduce a purely supervised learning method to enhance LLMs' math reasoning with no external teachers. NFT matches or …
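As a rough illustration of the negative-aware idea, the sketch below splits the model's own sampled solutions by answer correctness and combines a standard SFT likelihood on correct ones with an unlikelihood-style penalty on incorrect ones. This is a crude stand-in, not NFT's actual objective (which is built around an implicit negative policy); the HuggingFace-style `model(input_ids).logits` interface and `neg_weight` are assumptions.

```python
# Crude stand-in only: correct/incorrect self-generated solutions enter a weighted
# likelihood objective. NFT's real objective (built on an implicit negative policy)
# is defined in the paper; the interfaces below follow HuggingFace conventions.
import torch
import torch.nn.functional as F

def response_logprob(model, input_ids, labels):
    """Summed log-prob of response tokens; positions with label -100 are ignored."""
    logits = model(input_ids).logits[:, :-1]                   # (B, T-1, V)
    labels = labels[:, 1:]                                     # shift targets
    logp = F.log_softmax(logits, dim=-1)
    mask = labels.ne(-100).float()
    picked = logp.gather(-1, labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    return (picked * mask).sum(dim=-1)                         # (B,)

def negative_aware_loss(model, pos_batch, neg_batch, neg_weight=0.25):
    # Maximum likelihood on self-generated *correct* solutions (plain SFT) ...
    pos_nll = -response_logprob(model, pos_batch["input_ids"], pos_batch["labels"]).mean()
    # ... plus an unlikelihood-style term that lowers the probability of *incorrect*
    # solutions (a simplification of how negative data is exploited).
    neg_logp = response_logprob(model, neg_batch["input_ids"], neg_batch["labels"])
    return pos_nll + neg_weight * neg_logp.mean()
```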
Haoxiang Wang (@haoxiang__wang):

🚀 Excited to share our new LLM math reasoning work! 🔥 Supervised learning (as a replacement for RL) can reach SoTA performance on LLM math reasoning! 📊

Ganqu Cui (@charlesfornlp):

So many works talk about entropy, but what is the **mechanism** of entropy in RL for LLMs? 🤔

Our work gives a principled understanding, as well as two tricks that keep entropy **controlled** 🧵
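For context, the quantity under discussion is the policy's token-level entropy during RL training. A generic way to measure it from the logits (this only pins down the quantity, not the control tricks proposed in the paper):

```python
# Generic definition of per-token policy entropy from LLM logits; this only defines
# the quantity being discussed, not the entropy-control tricks from the paper.
import torch
import torch.nn.functional as F

def mean_token_entropy(logits: torch.Tensor, response_mask: torch.Tensor) -> torch.Tensor:
    """logits: (B, T, V); response_mask: (B, T), 1 on generated (response) tokens."""
    logp = F.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)        # H_t = -sum_v p_v * log p_v
    mask = response_mask.float()
    return (entropy * mask).sum() / mask.sum()        # average over response tokens
```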
Aran Komatsuzaki (@arankomatsuzaki):

DiffusionNFT: RL for diffusion models via the forward process

• Contrastive fine-tuning: positives vs negatives → implicit policy improvement
• Works with any solver, no CFG, no trajectory storage
• 25× more efficient than FlowGRPO
• Boosts SD3.5-M: GenEval 0.24 → 0.98 in …
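As a simpler stand-in from the same family, the sketch below shows reward-weighted flow matching on the forward (noising) process: the model's own generations are re-trained with a standard flow-matching loss weighted by reward, using no CFG and no stored sampling trajectories. DiffusionNFT's actual loss is a positive-vs-negative contrastive objective with implicit policy improvement; see the paper for that. The model interface, the linear noising path, and the `beta` temperature here are assumptions.

```python
# Simpler stand-in (reward-weighted flow matching on the forward process); the
# model interface, linear noising path, and `beta` temperature are assumptions.
import torch

def flow_matching_loss(model, x0, cond, weights):
    """Per-sample rectified-flow loss on clean images x0 (B, C, H, W), weighted."""
    t = torch.rand(x0.size(0), device=x0.device).view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = (1 - t) * x0 + t * noise                    # forward (noising) interpolation
    target_v = noise - x0                             # velocity target
    pred_v = model(x_t, t.flatten(), cond)
    per_sample = ((pred_v - target_v) ** 2).mean(dim=(1, 2, 3))
    return (weights * per_sample).mean()

def reward_weighted_step(model, images, cond, rewards, beta=1.0):
    # `images` are the model's own generations scored by a reward model; only the
    # final images and their rewards are needed (no sampling trajectories, no CFG).
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    weights = torch.exp(adv / beta).clamp(max=10.0)   # up-weight high-reward samples
    return flow_matching_loss(model, images, cond, weights)
```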
Ming-Yu Liu (@liu_mingyu):

Looking for an RL algorithm for improving your diffusion models? DiffusionNFT might be able to help. Check it out.
github.com/NVlabs/Diffusi…

25x more efficient than FlowGRPO and gives you SOTA results on various benchmarks.

with Huayu Chen (@chenhuay17), Kaiwen Zheng (@zkwthu), Qinsheng Zhang (@qsh_zh), and Haoxiang Wang (@Haoxiang__Wang)
Huayu Chen (@chenhuay17):

Check out Haotian Ye @ NeurIPS25's work on regularized video diffusion RL. It is amazing how simple data regularization turns out to be so effective at preventing reward hacking and boosting quality.
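A common way to implement such data regularization in RL fine-tuning is to mix a supervised (pretraining-style) loss on real data into the reward-driven objective, anchoring the policy to the data distribution. The sketch below is a generic illustration of this pattern, not the specific recipe of the cited work; `rl_loss`, `diffusion_loss`, and `reg_weight` are placeholders.

```python
# Generic pattern only: mix a supervised loss on real data into the RL objective.
# `rl_loss` and `diffusion_loss` are placeholder callables, not a specific API.
def regularized_rl_step(model, rl_batch, data_batch, rl_loss, diffusion_loss, reg_weight=0.1):
    loss_rl = rl_loss(model, rl_batch)                # reward-maximizing term
    loss_data = diffusion_loss(model, data_batch)     # anchor to real / pretraining data
    return loss_rl + reg_weight * loss_data
```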