Young (Agentic/acc)🤖 (@0x_cryptoyang) 's Twitter Profile
Young (Agentic/acc)🤖

@0x_cryptoyang

Cooking Crypto🤝AI | Working on #ZKP #LLMs #Agentic etc.
Recap of the previous Day on AI Everything🌟

ID: 1342351121063366657

Joined: 25-12-2020 06:07:08

2.2K Tweets

5.5K Followers

2.2K Following

Chenfeng_X (@chenfeng_x) 's Twitter Profile Photo

Happy to share that two of our papers were accepted to <a href="/NeurIPSConf/">NeurIPS Conference</a> 2025 as #Spotlight papers!

1. 👼Angles Don’t Lie: Unlocking Training-Efficient RL from a Model’s Own Signals

TL;DR: Token angles—the model’s self-generated signals—can reveal how well it grasps the data. By
Xinyu Yang (@xinyu2ml) 's Twitter Profile Photo

These days, LoRA seems less prominent in mainstream discussions compared to full FT. However, the post from <a href="/thinkymachines/">Thinking Machines</a> highlights that LoRA can actually match full FT in real-world customization scenarios!

One year ago, one of my previous works (arxiv.org/pdf/2412.06289)
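For context on what the tweet is comparing: LoRA freezes the pretrained weight matrix and trains only a low-rank update on top of it, which is why it can approach full fine-tuning at a fraction of the trainable parameters. A minimal NumPy sketch of the idea (sizes, rank, and the `alpha` scaling here are illustrative, not taken from the linked paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 64, 4  # model width and LoRA rank (illustrative sizes)
W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # zero-init so the adapter starts as a no-op

def lora_forward(x, alpha=8.0):
    """y = x W^T + (alpha/r) * x A^T B^T.
    The base path through W is frozen; only A and B are trained,
    i.e. 2*d*r parameters instead of d*d."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((1, d))
# With B zero-initialized, the adapted model reproduces the base model exactly.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because the adapter starts as an exact no-op, training can only move the model away from the pretrained behavior as far as the low-rank update allows, which is the customization regime the Thinking Machines post discusses.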
Yifan Zhang (@yifan_zhang_) 's Twitter Profile Photo

Excellent to see the new Tinker-docs from <a href="/thinkymachines/">Thinking Machines</a>, which confirm an inconsistency in the GRPO loss.

We explored this issue in our prior work (complex-reasoning.github.io/RPG) and developed a more robust method with substantial performance improvements: 
• +12 absolute points
Shizhe Diao (@shizhediao) 's Twitter Profile Photo

🚀 Introducing BroRL: Scaling Reinforcement Learning via Broadened Exploration

When step-scaling hits a plateau, scale rollouts, not steps.
BroRL takes reinforcement learning beyond saturation—reviving stalled models by expanding exploration with large-N rollouts.
👇 (1/n)
Chenfeng_X (@chenfeng_x) 's Twitter Profile Photo

🥳We’re releasing StreamDiffusionV2 for the live-stream community—from individual creators with one GPU to enterprise platforms with many. StreamDiffusionV2 is our follow-up to StreamDiffusion: #StreamDiffusion powered real products, but temporal consistency still bugged us.

Young (Agentic/acc)🤖 (@0x_cryptoyang) 's Twitter Profile Photo

GRPO is like PPO, but instead of chasing absolute rewards, it learns from relative performance within a group of samples. For each prompt, the model generates several outputs → scores them → and optimizes based on who did better relative to others, not the raw reward.
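The "relative performance within a group" step the tweet describes can be sketched in a few lines: rewards for a group of completions are standardized against the group's own mean and standard deviation, so only relative ranking matters, not the raw reward scale.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each sample's advantage
    is its reward standardized against the group mean and std, so
    the raw reward scale cancels out."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against a uniform group
    return [(r - mu) / sigma for r in rewards]

# One prompt, four sampled completions with scalar rewards:
rewards = [1.0, 0.0, 0.5, 0.5]
advs = grpo_advantages(rewards)
# Completions above the group mean get positive advantage,
# those below get negative; advantages sum to zero across the group.
```

These advantages then replace the value-function baseline used in PPO's policy-gradient objective, which is why GRPO needs no separate critic.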

Zhepeng Cen (@zhepengcen) 's Twitter Profile Photo

🚀 Scaling RL to Pretraining Levels with Webscale-RL

RL for LLMs has been bottlenecked by tiny datasets (<10B tokens) vs pretraining (>1T).
Our Webscale-RL pipeline converts pretraining text into diverse RL-ready QA data — scaling RL to pretraining levels!

All codes and
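The tweet does not show the pipeline itself, but the shape of the output it describes (declarative pretraining text converted into QA pairs with verifiable answers) can be illustrated with a hypothetical cloze-style converter. Everything below is an assumption for illustration only, not the actual Webscale-RL method:

```python
import re

def passage_to_qa(passage):
    """Hypothetical converter: turn each declarative sentence into a
    cloze-style QA pair whose answer is a verifiable span (the masked
    final word). Illustrates the shape of RL-ready data, not the
    actual Webscale-RL pipeline."""
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    qa_pairs = []
    for s in sentences:
        words = s.rstrip(".!?").split()
        if len(words) < 4:  # skip fragments too short to mask
            continue
        answer = words[-1]                        # mask the final span
        question = " ".join(words[:-1]) + " ___?"
        qa_pairs.append({"question": question, "answer": answer})
    return qa_pairs

qas = passage_to_qa("The transformer was introduced in 2017. Attention is all you need.")
```

The point of such a conversion is that each QA pair carries a checkable ground-truth answer, which is what makes ordinary web text usable as an RL reward signal.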