Kunhao Zheng @ ICLR 2025 (@kunhaoz)'s Twitter Profile
Kunhao Zheng @ ICLR 2025

@kunhaoz

École Polytechnique X18, SJTU. Now in the amazing FAIR CodeGen @AIatMeta. Alumni: @Huggingface, Sea AI Lab, intern @openai

ID: 1087607633823952898

Joined: 22-01-2019 07:07:21

131 Tweets

538 Followers

529 Following

Delong Chen (陈德龙) (@delong0_0)

This is my first paper done at FAIR. We show that adaptive visual token segmentation, especially at the subobject level (i.e., the subwords of images), enables VLMs to learn image understanding better and faster!
arxiv.org/pdf/2402.14327
Krunoslav Lehman Pavasovic (@krunolehman)

1/ Happy to share my first accepted paper as a PhD student at Meta and École normale supérieure | PSL, which I will present at ICLR 2025: 📚 Our work proposes difFOCI, a novel rank-based objective for ✨better feature learning✨. In collab with David Lopez-Paz, Giulio Biroli and Levent Sagun!
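If I read the acronym right, difFOCI differentiates through FOCI (Azadkia and Chatterjee's Feature Ordering by Conditional Independence), whose core ingredient is Chatterjee's rank correlation. For orientation, here is the classic non-differentiable coefficient (standard formula; the piecewise-constant ranks are what a differentiable relaxation would need to soften):

```python
import numpy as np

def chatterjee_xi(x: np.ndarray, y: np.ndarray) -> float:
    """Chatterjee's rank correlation (no-ties form): near 0 under
    independence, tends to 1 when y is a noiseless function of x."""
    order = np.argsort(x)                      # sort the pairs by x
    r = np.argsort(np.argsort(y[order])) + 1   # 1-based ranks of y, in x-order
    n = len(x)
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
print(chatterjee_xi(x, np.sin(3 * x)))          # nonlinear but deterministic: ~1
print(chatterjee_xi(x, rng.normal(size=2000)))  # independent noise: ~0
```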

AI at Meta (@aiatmeta)

📷 Hello Singapore! Meta is at #ICLR2025 EXPO 📷
Meta will be in Singapore this week for #ICLR25! Stop by our booth to chat with our team or learn more about our latest research.

Things to know:
📷 Find us @ Booth #L03 (Rows 3-4, Columns L-M) in Hall 2.
📷 We're sharing 50+
Kunhao Zheng @ ICLR 2025 (@kunhaoz)

#ICLR2025
Come say hi at our Fri 25 Apr poster session:

"What Makes Large Language Models Reason in (Multi-Turn) Code Generation?" 📝✨

📍 Hall 3 + Hall 2B #263
🕙 Fri 25 Apr, 10 a.m.–12:30 p.m. +08
link: iclr.cc/virtual/2025/p…
paper: arxiv.org/abs/2410.08105
Kunhao Zheng @ ICLR 2025 (@kunhaoz)

#ICLR2025
Come say hi at our Sat 26 Apr poster session:

"The KoLMogorov Test: Compression by Code Generation" 📝✨

📍 Hall 3 + Hall 2B #557
🕙 Sat 26 Apr, 10 a.m.–12:30 p.m. +08

repo: github.com/facebookresear…
paper: arxiv.org/abs/2503.13992…
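The premise of the benchmark, in one line: compression as code generation. The shortest program that reproduces a sequence is its best compression (Kolmogorov complexity), and a model is scored on the programs it writes. A toy sketch of such a scorer (hypothetical helpers, not the paper's evaluation harness):

```python
import io
import contextlib

def run_program(program: str) -> bytes:
    """Execute a candidate program and capture its stdout as bytes.
    (Sandboxing is assumed; exec'ing untrusted code is unsafe.)"""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(program, {})
    return buf.getvalue().encode()

def compression_ratio(program: str, data: bytes):
    """Program length over data length if the program reproduces
    `data` exactly, else None; below 1.0 means real compression."""
    if run_program(program) != data:
        return None
    return len(program.encode()) / len(data)

data = b"ab" * 512                       # 1024 bytes of regular structure
program = "print('ab' * 512, end='')"    # a 25-byte program regenerating it
print(compression_ratio(program, data))  # ~0.024
```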
Yunzhen Feng (@feeelix_feng)

Check out our poster tomorrow at 10 a.m. at the ICLR Bidirectional Human-AI Alignment workshop! We cover how on-policy preference sampling can be biased and our optimal response sampling for human labeling.
NYU Center for Data Science, AI at Meta, Julia Kempe, Yaqi Duan
x.com/feeelix_feng/s…
Kunhao Zheng @ ICLR 2025 (@kunhaoz)

🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨

That's not a bug, it's a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you're optimizing.

You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time.

🧵 How?
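For reference, pass@k here is the usual unbiased estimator from the Codex paper (Chen et al., 2021): sample n completions, count the c that pass, and compute 1 - C(n-c, k)/C(n, k). A direct implementation of that standard formula:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are
    correct, passes the tests."""
    if n - c < k:   # fewer than k incorrect samples: success guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 16 samples, 4 correct: pass@1 looks weak while pass@8 is already high,
# which is why the two metrics can move independently under RL.
print(pass_at_k(16, 4, 1))  # 0.25
print(pass_at_k(16, 4, 8))  # ~0.962
```

Optimizing pass@k at training time then presumably means rewarding a group of k samples jointly rather than scoring each sample on its own, which is the knob the thread goes on to describe.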
Kunhao Zheng @ ICLR 2025 (@kunhaoz)

โ„๏ธAndrew Zhaoโ„๏ธ Yeah we are doing it and itโ€™s called Soft Policy Optimization: arxiv.org/abs/2503.05453 It can learn from arbitrary on/off policy samples. TLDR it reparametrizes Q function by your LLM. An elegant property: Belleman equation satisfied by construction so not separate TD loss.

Kunhao Zheng @ ICLR 2025 (@kunhaoz)

Let's be clear: if you use the Schulman k3 estimator, y'all are optimizing for another thing, the reverse KL. Funnily, people think the Schulman k2 estimator is biased, but it gives the right gradient.
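Context: these are the estimators from John Schulman's "Approximating KL Divergence" note. With x ~ q and r = p(x)/q(x), k1 = -log r, k2 = (log r)²/2, and k3 = (r - 1) - log r all estimate KL(q‖p); k1 and k3 are unbiased for its value while k2 is not, and the tweet's point is about which gradients they actually yield. A quick Monte Carlo comparison (standard formulas, my example distributions):

```python
import numpy as np

rng = np.random.default_rng(0)

# q = N(0, 1), p = N(0.5, 1); the true KL(q||p) = 0.5 * 0.5**2 = 0.125
x = rng.normal(0.0, 1.0, size=1_000_000)       # samples from q
logr = -(x - 0.5) ** 2 / 2 + x**2 / 2          # log p(x) - log q(x)
r = np.exp(logr)

k1 = -logr             # unbiased for the KL value, high variance
k2 = logr**2 / 2       # biased for the value, low variance
k3 = (r - 1) - logr    # unbiased, low variance, always non-negative

for name, k in (("k1", k1), ("k2", k2), ("k3", k3)):
    print(name, round(k.mean(), 4), round(k.std(), 3))
# k1 and k3 average to ~0.125; k2 comes out slightly high (~0.133).
```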

Mathurin Videau (@mathuvu_)

We present an Autoregressive U-Net that incorporates tokenization inside the model, pooling raw bytes into words, then word-groups. AU-Net focuses most of its compute on building latent vectors that correspond to larger units of meaning.
Joint work with Badr Youbi Idrissi 1/8
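A minimal sketch of the first pooling stage as described here (illustrative only, not the AU-Net code): embed raw bytes, split at whitespace, and pool each span into one word-level vector for the deeper, coarser stages to process.

```python
import numpy as np

rng = np.random.default_rng(0)
text = b"pooling raw bytes into words"

byte_emb = rng.normal(size=(256, 16))        # one 16-d embedding per byte value
vecs = byte_emb[np.frombuffer(text, dtype=np.uint8)]

# Cut at whitespace so each word (plus its trailing space) becomes one span,
# then mean-pool each span into a single word-level latent vector.
cuts = [0] + [i + 1 for i, b in enumerate(text) if b == ord(" ")] + [len(text)]
words = [vecs[s:e].mean(axis=0) for s, e in zip(cuts, cuts[1:]) if e > s]

print(len(text), "byte tokens ->", len(words), "word vectors")  # 28 -> 5
```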