Young (Agentic/acc)🤖 (@0x_cryptoyang) 's Twitter Profile
Young (Agentic/acc)🤖

@0x_cryptoyang

Cooking Crypto🤝AI | Working on #ZKP #LLMs #Agentic etc.
Recap of the previous Day on AI Everything🌟

ID: 1342351121063366657

Joined: 25-12-2020 06:07:08

2.2K Tweets

5.5K Followers

2.2K Following

Chenfeng_X (@chenfeng_x) 's Twitter Profile Photo

Happy to share that two of our papers were accepted to <a href="/NeurIPSConf/">NeurIPS Conference</a> 2025 as #Spotlight papers!

1. 👼Angles Don’t Lie: Unlocking Training-Efficient RL from a Model’s Own Signals

TL;DR: Token angles—the model’s self-generated signals—can reveal how well it grasps the data. By
Xinyu Yang (@xinyu2ml) 's Twitter Profile Photo

These days, LoRA seems less prominent in mainstream discussions compared to full FT. However, the post from <a href="/thinkymachines/">Thinking Machines</a> highlights that LoRA can actually match full FT in real-world customization scenarios!

One year ago, one of my previous works (arxiv.org/pdf/2412.06289)
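For context on what the tweet is comparing: LoRA freezes the pretrained weight matrix and trains only a low-rank update on top of it, which is why it can approach full fine-tuning at a fraction of the trainable parameters. A minimal NumPy sketch of the idea (sizes, rank, and the `alpha` scaling here are illustrative, not taken from the linked paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4  # model width and LoRA rank (illustrative sizes)
W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # zero-init so the adapter starts as a no-op

def lora_forward(x, alpha=8.0):
    """y = x W^T + (alpha/r) * x A^T B^T.
    The base path through W is frozen; only A and B are trained,
    i.e. 2*d*r parameters instead of d*d."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((1, d))
# With B zero-initialized, the adapted model reproduces the base model exactly.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because the adapter starts as an exact no-op, training can only move the model away from the pretrained behavior as far as the low-rank update allows, which is the customization regime the Thinking Machines post discusses.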
Yifan Zhang (@yifan_zhang_) 's Twitter Profile Photo

Excellent to see the new Tinker-docs from <a href="/thinkymachines/">Thinking Machines</a>, which confirm an inconsistency in the GRPO loss.

We explored this issue in our prior work (complex-reasoning.github.io/RPG) and developed a more robust method with substantial performance improvements: 
• +12 absolute points
Shizhe Diao (@shizhediao) 's Twitter Profile Photo

🚀 Introducing BroRL: Scaling Reinforcement Learning via Broadened Exploration

When step-scaling hits a plateau, scale rollouts, not steps.
BroRL takes reinforcement learning beyond saturation—reviving stalled models by expanding exploration with large-N rollouts.
👇 (1/n)
Chenfeng_X (@chenfeng_x) 's Twitter Profile Photo

🥳We’re releasing StreamDiffusionV2 for the live-stream community—from individual creators with one GPU to enterprise platforms with many. StreamDiffusionV2 is our follow-up to StreamDiffusion: #StreamDiffusion powered real products, but temporal consistency still bugged us.

Young (Agentic/acc)🤖 (@0x_cryptoyang) 's Twitter Profile Photo

GRPO is like PPO, but instead of chasing absolute rewards, it learns from relative performance within a group of samples. For each prompt, the model generates several outputs → scores them → and optimizes based on who did better relative to others, not the raw reward.
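The "relative performance within a group" step the tweet describes can be sketched in a few lines: rewards for a group of completions are standardized against the group's own mean and standard deviation, so only relative ranking matters, not the raw reward scale.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each sample's advantage
    is its reward standardized against the group mean and std, so
    the raw reward scale cancels out."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against a uniform group
    return [(r - mu) / sigma for r in rewards]

# One prompt, four sampled completions with scalar rewards:
rewards = [1.0, 0.0, 0.5, 0.5]
advs = grpo_advantages(rewards)
# Completions above the group mean get positive advantage,
# those below get negative; advantages sum to zero across the group.
```

These advantages then replace the value-function baseline used in PPO's policy-gradient objective, which is why GRPO needs no separate critic.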

Zhepeng Cen (@zhepengcen) 's Twitter Profile Photo

🚀 Scaling RL to Pretraining Levels with Webscale-RL

RL for LLMs has been bottlenecked by tiny datasets (<10B tokens) vs pretraining (>1T).
Our Webscale-RL pipeline converts pretraining text into diverse RL-ready QA data — scaling RL to pretraining levels!

All codes and
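The tweet does not show the pipeline itself, but the shape of the output it describes (declarative pretraining text converted into QA pairs with verifiable answers) can be illustrated with a hypothetical cloze-style converter. Everything below is an assumption for illustration only, not the actual Webscale-RL method:

```python
import re

def passage_to_qa(passage):
    """Hypothetical converter: turn each declarative sentence into a
    cloze-style QA pair whose answer is a verifiable span (the masked
    final word). Illustrates the shape of RL-ready data, not the
    actual Webscale-RL pipeline."""
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    qa_pairs = []
    for s in sentences:
        words = s.rstrip(".!?").split()
        if len(words) < 4:  # skip fragments too short to mask
            continue
        answer = words[-1]                        # mask the final span
        question = " ".join(words[:-1]) + " ___?"
        qa_pairs.append({"question": question, "answer": answer})
    return qa_pairs

qas = passage_to_qa("The transformer was introduced in 2017. Attention is all you need.")
```

The point of such a conversion is that each QA pair carries a checkable ground-truth answer, which is what makes ordinary web text usable as an RL reward signal.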