amar (@amarproduct)'s Twitter Profile
amar

@amarproduct

product @supermodel_ai. prev @microsoft

ID: 1879612550746406912

Joined 15-01-2025 19:56:15

37 Tweets

31 Followers

269 Following

amar (@amarproduct):

real-time learning is a crucial component in building agents that autonomously improve at their intended task. I'm excited to see how your product evolves. congrats Andy Kasey Zhang

amar (@amarproduct):

I think they’re addressing different parts of the value chain. Uber's human-driven cabs might become obsolete, but Uber will likely continue as a distribution platform for companies like Waymo and Tesla to deploy their vehicles.

amar (@amarproduct):

Vending-Bench by Andon Labs is a great example of practical eval design. Grounding model performance in real-world UX flows is what we need more of. andonlabs.com/evals/vending-…

amar (@amarproduct):

This is a great way to frame AI: not as magic, but as new leverage. When a new form of leverage emerges (like AI agents today), there’s a brief window where output far exceeds input, before the crowd catches up and margins compress.

Shizhe Diao (@shizhediao):

RLVR is powerful — but how do you train with multiple rewards effectively? 🤔 🎯GDPO (not GRPO) is coming. We introduce Group reward-Decoupled Normalization Policy Optimization (GDPO), a new multi-reward RL algorithm that consistently improves per-reward convergence over GRPO

amar (@amarproduct):

this explains a lot. multi-reward GRPO kind of felt unstable, which makes sense considering it was designed for single-objective optimization. basically, summing rewards before normalization collapses distinct reward combinations into identical advantage signals. time to update the default :)
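To make the normalization point concrete, here is a minimal Python sketch comparing the two orderings for a single group of rollouts. It assumes binary rewards, group-relative normalization of the form (r - mean) / (std + eps), and a hypothetical epsilon of 1e-6; the function names are illustrative only. The "normalized then summed" variant is a simplified reading of the decoupling idea in the tweets (normalize each reward within the group, then combine), not the GDPO authors' implementation, which may include additional steps.

```python
from statistics import mean, pstdev

EPS = 1e-6  # assumed stabilizer to avoid dividing by zero when all rewards match


def normalize(values):
    """Group-relative normalization: (x - mean) / (std + eps) within one group."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + EPS) for v in values]


def summed_then_normalized(rewards):
    """GRPO-style multi-reward: add up each rollout's rewards, then normalize the totals."""
    totals = [sum(r) for r in rewards]
    return normalize(totals)


def normalized_then_summed(rewards):
    """Decoupled sketch (hypothetical): normalize each reward across the group, then add."""
    num_rewards = len(rewards[0])
    per_reward = [normalize([r[k] for r in rewards]) for k in range(num_rewards)]
    return [sum(col[i] for col in per_reward) for i in range(len(rewards))]


if __name__ == "__main__":
    # Hypothetical group of 4 rollouts, each scored by 2 binary rewards (r0, r1).
    # Rollouts 0 and 1 satisfy only r0; rollout 2 satisfies only the rarer r1.
    group = [(1, 0), (1, 0), (0, 1), (0, 0)]
    print("summed then normalized:", [round(a, 3) for a in summed_then_normalized(group)])
    print("normalized then summed:", [round(a, 3) for a in normalized_then_summed(group)])
```

In this group, summing first gives rollouts 0, 1 and 2 the same advantage even though rollout 2 satisfied a different, rarer reward. Normalizing each reward first keeps them apart and gives rollout 2 a larger advantage, which is the extra signal resolution the thread is pointing at.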