Yaofang Liu (@stephenajason)'s Twitter Profile
Yaofang Liu

@stephenajason

Ph.D. candidate, CityUHK; research intern at Noah's Ark Lab, HK; prev. Tencent AI Lab. Working on diffusion models and video generation.

ID: 892327075759013888

Link: https://scholar.google.com/citations?user=WWb7Y7AAAAAJ&hl=en
Joined: 01-08-2017 10:12:08

105 Tweets

88 Followers

288 Following

Yaofang Liu (@stephenajason):

A nice project to check: it's a free, open-source vibe coding app that supports the new Kimi K2 model, built entirely with Gradio. huggingface.co/spaces/akhaliq…
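The Space itself isn't reproduced here, but as a rough illustration, below is a minimal sketch of a Gradio chat app wired to Kimi K2 through an OpenAI-compatible endpoint. The environment variable, base URL, and model id are assumptions for illustration, not taken from the linked project.

```python
# Minimal sketch: a Gradio chat UI backed by Kimi K2 via an
# OpenAI-compatible API. The env var, base_url, and model id below
# are assumed placeholders, not details from the linked Space.
import os
import gradio as gr
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],  # assumed env var
    base_url="https://api.moonshot.ai/v1",   # assumed endpoint
)

def chat(message, history):
    # Rebuild the conversation in OpenAI chat format from Gradio's
    # (user, assistant) history pairs.
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    resp = client.chat.completions.create(
        model="kimi-k2-0711-preview",  # assumed model id
        messages=messages,
    )
    return resp.choices[0].message.content

gr.ChatInterface(chat, title="Kimi K2 chat (sketch)").launch()
```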

Yaofang Liu (@stephenajason):

Thanks for sharing, AK. Pusa V1.0 is truly an exceptional model, surprising us with unprecedented low cost. Our approach fine-tunes the Wan-T2V-14B model using only $500 and 4K training samples, yet outperforms Wan’s official Image-to-Video model on VBench-I2V with 10 inference

Emad (@emostaque):

Excellent work taking Wan T2V to SOTA I2V on just 4K samples and $500 of compute. The key is really the right data and care for the underlying model structure, which we currently brute-force. There are some interesting things in the approach, check it out!

Rohan Paul (@rohanpaul_ai):

Training a language model with raw rewards collapses when every hard question hands back zero.

Guided Hybrid Policy Optimization (GHPO) fixes that by dropping small hints only on those stubborn items.

RL with verifiable rewards is a game of 1 for correct, 0 for wrong.

If the
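The tweet is cut off mid-sentence, but the mechanism it describes (binary verifiable rewards, with hints injected only on prompts whose rollouts all score zero) can be sketched. This is a loose illustration based solely on the tweet's description; the function names, `policy.sample` API, and hint schedule are illustrative stand-ins, not the paper's actual implementation.

```python
# Loose sketch of the GHPO idea as described above: RL with a
# verifiable 1/0 reward, where a prompt whose rollouts all score 0
# gets a small hint (a prefix of a reference solution) before
# resampling. All names here are illustrative stand-ins.

def verify(answer: str, reference: str) -> float:
    """Verifiable reward: 1 for a correct final answer, 0 otherwise."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def with_hint(prompt: str, solution: str, frac: float) -> str:
    """Prepend the first `frac` of a reference solution as a hint."""
    cut = int(len(solution) * frac)
    return f"{prompt}\nHint (partial solution): {solution[:cut]}"

def ghpo_step(policy, prompt, reference_answer, reference_solution,
              n_rollouts=8, hint_fracs=(0.25, 0.5, 0.75)):
    # `policy.sample(prompt, n)` is an assumed interface returning
    # n completion strings for the prompt.
    rollouts = policy.sample(prompt, n=n_rollouts)
    rewards = [verify(r, reference_answer) for r in rollouts]

    # Tractable prompt: at least one rollout succeeds, learn as usual.
    if any(rewards):
        return prompt, rollouts, rewards

    # Stubborn prompt: all rewards are 0, so add progressively
    # stronger hints until some rollout earns a nonzero reward.
    for frac in hint_fracs:
        hinted = with_hint(prompt, reference_solution, frac)
        rollouts = policy.sample(hinted, n=n_rollouts)
        rewards = [verify(r, reference_answer) for r in rollouts]
        if any(rewards):
            return hinted, rollouts, rewards

    # Still all-zero: return as-is (e.g., skip this batch item).
    return prompt, rollouts, rewards
```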
Yaofang Liu (@stephenajason):

Since early last year, our group has been thinking about the boundary between SFT and RL. Is SFT truly useless w.r.t. developing reasoning ability? Our intuition back then was no. And now we present new work that really reveals how SFT can benefit reasoning!! Please check our