Zhen Liu (@itsthezhen)'s Twitter Profile
Zhen Liu

@itsthezhen

Assistant Prof @ CUHK-SZ. PhD at @Mila_Quebec & @UMontreal, MS & BS at @GeorgiaTech and past visitor at @MPI_IS. Deep Learning / 3D vision / 3D Content Gen.

ID: 4157036115

Link: http://itszhen.com

Joined: 10-11-2015 14:43:10

84 Tweets

329 Followers

301 Following

Dinghuai Zhang 张鼎怀 (@zdhnarsil)

Your verl & vllm are secretly giving you off-policy data (when using quantized rollout), and you should treat it as an off-policy problem! How? As a probabilistic guy, we should say (truncated) importance sampling 😀 Check Feng Yao's tweet here!

Zhen Liu (@itsthezhen)

TL;DR: Meet BesiegeField—a playground where LLMs build, test, and refine machines from standard parts in real time. We tested agentic workflows and RLVR with top LLMs: even the strongest still show limits in compositional machine design. 🔗 besiegefield.github.io 🧵 below

Weiyang Liu (@besteuler)

🤯 Merging many finetuned LLMs into one model, effectively? Introducing Functional Dual Anchor (FDA), a new framework for model merging. 🚀 Current merging works poorly due to the underlying parameter conflicts. FDA shifts knowledge integration to the input-representation space

Weiyang Liu (@besteuler)

🤩 This is awesome. When we were doing the agentic design project (besiegefield.github.io) using the Besiege game environment, we had to hack the game to get as much feedback as possible to do RL and stuff. However, I started to think differently after seeing the Genshin agent.