Kunhao Zheng @ ICLR 2025 (@kunhaoz) Twitter Tweets • TwiCopy

Kunhao Zheng @ ICLR 2025

@kunhaoz


École Polytechnique X18, SJTU. Now in the amazing FAIR CodeGen @AIatMeta. Alumni: @Huggingface, Sea AI Lab, intern @openai

ID: 1087607633823952898

Joined: 22-01-2019 07:07:21

131 Tweets

538 Followers

529 Following

Delong Chen (陈德龙)

@delong0_0

9 months ago

This is my first paper done at FAIR. We show that adaptive visual token segmentation, especially at the subobject level (i.e., subwords in images), enables VLMs to learn image understanding better and faster! arxiv.org/pdf/2402.14327

Likes: 59 · Replies: 1 · Reposts: 13

Krunoslav Lehman Pavasovic

@krunolehman

8 months ago

1/ Happy to share my first accepted paper as a PhD student at Meta and École normale supérieure | PSL, which I will present at ICLR 2025: 📚 Our work proposes difFOCI, a novel rank-based objective for ✨better feature learning✨ In collab with David Lopez-Paz, Giulio Biroli and Levent Sagun!

Likes: 41 · Replies: 1 · Reposts: 13

AI at Meta

@aiatmeta

8 months ago

📷 Hello Singapore! Meta is at #ICLR2025 EXPO 📷 Meta will be in Singapore this week for #ICLR25! Stop by our booth to chat with our team or learn more about our latest research. Things to know: 📷 Find us @ Booth #L03 (Rows 3-4, Columns L-M) in Hall 2. 📷 We're sharing 50+

Likes: 168 · Replies: 5 · Reposts: 69

Kunhao Zheng @ ICLR 2025

@kunhaoz

8 months ago

#ICLR2025 Come say hi at our Fri 25 Apr poster session: "What Makes Large Language Models Reason in (Multi-Turn) Code Generation?" 📝✨ 📍 Hall 3 + Hall 2B #263 🕙 Fri 25 Apr, 10 a.m.–12:30 p.m. +08 link: iclr.cc/virtual/2025/p… paper: arxiv.org/abs/2410.08105

Likes: 19 · Replies: 0 · Reposts: 0

Kunhao Zheng @ ICLR 2025

@kunhaoz

8 months ago

#ICLR2025 Come say hi at our Sat 26 Apr poster session: "The KoLMogorov Test: Compression by Code Generation" 📝✨ 📍Hall 3 + Hall 2B #557 🕙Sat 26 Apr 10 a.m. - 12:30 p.m. +08 repo: github.com/facebookresear… paper: arxiv.org/abs/2503.13992…

Likes: 16 · Replies: 0 · Reposts: 2

Yunzhen Feng

@feeelix_feng

8 months ago

Check out our poster tmr at 10am at the ICLR Bidirectional Human-AI Alignment workshop! We cover how on-policy preference sampling can be biased and our optimal response sampling for human labeling. NYU Center for Data Science AI at Meta Julia Kempe Yaqi Duan x.com/feeelix_feng/s…

Likes: 20 · Replies: 1 · Reposts: 7

Kunhao Zheng @ ICLR 2025

@kunhaoz

8 months ago

🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨 That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?

Likes: 823 · Replies: 12 · Reposts: 141
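To make the pass@1 versus pass@k distinction concrete, here is a minimal sketch using the standard unbiased pass@k estimator from code-generation evaluation (n samples per problem, c of them correct). The tweet does not say how pass@k is computed or trained for, so the estimator choice, the function name `pass_at_k`, and the two toy policies are illustrative assumptions, not the author's method.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k samples,
    drawn without replacement from n samples with c correct, is correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over problems, given the correct count per problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)

# Hypothetical toy data: two problems, 10 samples each.
policy_a = [1, 1]    # occasionally solves both problems
policy_b = [10, 0]   # always solves one problem, never the other

for k in (1, 10):
    a = mean_pass_at_k(policy_a, n=10, k=k)
    b = mean_pass_at_k(policy_b, n=10, k=k)
    print(f"pass@{k}: policy A = {a:.2f}, policy B = {b:.2f}")
```

On this toy data, policy B wins on pass@1 (about 0.5 versus 0.1) while policy A wins on pass@10 (1.0 versus 0.5), so the two metrics can rank the same pair of policies in opposite order.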

Kunhao Zheng @ ICLR 2025

@kunhaoz

8 months ago

Come meet us at the Human-AI Alignment workshop and talk about this fancy sampling scheme

Likes: 4 · Replies: 0 · Reposts: 0

Kunhao Zheng @ ICLR 2025

@kunhaoz

7 months ago

❄️Andrew Zhao❄️ Yeah, we are doing it, and it's called Soft Policy Optimization: arxiv.org/abs/2503.05453 It can learn from arbitrary on/off-policy samples. TL;DR: it reparametrizes the Q function with your LLM. An elegant property: the Bellman equation is satisfied by construction, so there is no separate TD loss.

Likes: 12 · Replies: 0 · Reposts: 3

Kunhao Zheng @ ICLR 2025

@kunhaoz

7 months ago

Emotional Replica.

Likes: 0 · Replies: 0 · Reposts: 0

Kunhao Zheng @ ICLR 2025

@kunhaoz

7 months ago

Likes: 4 · Replies: 1 · Reposts: 0

yobibyte

@y0b1byte

6 months ago

another good one!

Likes: 436 · Replies: 1 · Reposts: 42

Kunhao Zheng @ ICLR 2025

@kunhaoz

6 months ago

Let’s be clear. If you use Schulman's k3 estimator as a loss, y’all are optimizing for another thing: the forward KL, not the reverse KL. Funnily, people think Schulman's k2 estimator is biased, but it gives the right gradient.

Likes: 8 · Replies: 0 · Reposts: 0
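The gradient claim about the k2 and k3 estimators can be checked numerically on a small categorical policy. The sketch below rests on assumptions not stated in the tweet: the policy is a softmax over three outcomes, samples come from the current policy, each estimator is differentiated as a per-sample loss with the sampling distribution held fixed, and r = ref(x) / pi(x) follows the usual definitions k2 = (log r)^2 / 2 and k3 = (r - 1) - log r. All names (`expected_loss_grad`, `theta`, `ref`) are hypothetical.

```python
import math

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """Exact KL(p || q) for two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def finite_diff_grad(f, theta, eps=1e-6):
    """Central finite-difference gradient of a scalar function of theta."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def expected_loss_grad(theta, ref, weight):
    """Exact E_{x ~ pi}[ weight(pi(x), ref(x)) * grad_theta log pi(x) ].

    `weight` is d(estimator)/d(log pi(x)); for a softmax policy,
    d log pi(x) / d theta_i = 1[i == x] - pi_i.
    """
    pi = softmax(theta)
    grad = [0.0] * len(theta)
    for x, (px, qx) in enumerate(zip(pi, ref)):
        w = weight(px, qx)
        for i in range(len(theta)):
            score = (1.0 if i == x else 0.0) - pi[i]
            grad[i] += px * w * score
    return grad

def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

theta = [0.3, -1.2, 0.8]   # hypothetical policy logits
ref = [0.5, 0.3, 0.2]      # hypothetical reference distribution

# d k2 / d log pi = -log r = log(pi / ref);  d k3 / d log pi = 1 - r.
k2_grad = expected_loss_grad(theta, ref, lambda px, qx: math.log(px / qx))
k3_grad = expected_loss_grad(theta, ref, lambda px, qx: 1.0 - qx / px)

reverse_grad = finite_diff_grad(lambda t: kl(softmax(t), ref), theta)  # KL(pi || ref)
forward_grad = finite_diff_grad(lambda t: kl(ref, softmax(t)), theta)  # KL(ref || pi)

print("k2 vs reverse KL:", max_abs_diff(k2_grad, reverse_grad))
print("k3 vs forward KL:", max_abs_diff(k3_grad, forward_grad))
print("k3 vs reverse KL:", max_abs_diff(k3_grad, reverse_grad))
```

Under these assumptions, the expected k2 gradient agrees with the reverse KL gradient up to finite-difference error, while the expected k3 gradient agrees with the forward KL gradient and visibly differs from the reverse one.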

Mathurin Videau

@mathuvu_

6 months ago

We present an Autoregressive U-Net that incorporates tokenization inside the model, pooling raw bytes into words then word-groups. AU-Net focuses most of its compute on building latent vectors that correspond to larger units of meaning. Joint work with Badr Youbi Idrissi 1/8

Likes: 189 · Replies: 14 · Reposts: 47