subho ghosh (@subhoghosh02) 's Twitter Profile
subho ghosh

@subhoghosh02

22. seeking local optima in ML & global maxima in life. training on life

ID: 1201157604212236288

Link: http://Github.com/iGhoshSubho · Joined: 01-12-2019 15:14:32

6.6K Tweets

1.1K Followers

407 Following

Yunhao (Robin) Tang (@robinphysics) 's Twitter Profile Photo

Maybe to one's surprise, taking KL estimates as `kl_loss` to minimize does *not* enforce the KL. This implementation, however, is quite common in open source RL repos and recent research papers. In short: grad of an unbiased KL estimate is not an unbiased estimate of KL grad.

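A tiny numeric sketch of the point (my own Gaussian example, not from the thread): take pi = N(mu, 1) and ref = N(0, 1), so KL(pi || ref) = mu^2/2 and the true gradient w.r.t. mu is mu. The per-sample k1 estimate `log pi(x) - log ref(x)` is unbiased for the KL *value*, but differentiating it while treating the samples as constants (which is what autodiff does to a `kl_loss` built from sampled tensors) yields a gradient whose expectation is 0, not mu:

```python
import numpy as np

# Assumed toy setup (not from the tweet): pi = N(mu, 1), ref = N(0, 1).
# Closed form: KL(pi || ref) = mu^2 / 2, so d(KL)/d(mu) = mu.
rng = np.random.default_rng(0)
mu = 2.0
x = rng.normal(mu, 1.0, size=1_000_000)  # samples drawn from pi

# Per-sample k1 estimate: log pi(x) - log ref(x) = x^2/2 - (x - mu)^2/2.
# Its mean is an unbiased estimate of the KL value (~ mu^2/2 = 2.0).
kl_est = (x**2 / 2 - (x - mu) ** 2 / 2).mean()

# Naive "kl_loss" gradient: differentiate the same expression w.r.t. mu
# with x held fixed: d/dmu [x^2/2 - (x - mu)^2/2] = (x - mu).
# Its expectation under x ~ N(mu, 1) is 0, not the true gradient mu = 2,
# because the score-function term from the sampling distribution is missing.
naive_grad = (x - mu).mean()

print(kl_est, naive_grad)
```

With a million samples, `kl_est` lands near 2.0 while `naive_grad` hovers near 0: the value estimate is unbiased, but its pathwise gradient tells the optimizer nothing about the true KL gradient.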
Naga/Abhi (@nagasaiabhinay) 's Twitter Profile Photo

Too early to be sure, but I'm trying to use optimal control à la RB Modulation, replacing the consistency loss with a reward signal, as a kind of test-time scaling technique. Baseline vs with reward model.

Mathurin Massias (@mathusmassias) 's Twitter Profile Photo

New paper on the generalization of Flow Matching arxiv.org/abs/2506.03719 🤯 Why does flow matching generalize? Did you know that the flow matching target you're trying to learn **can only generate training points**? With Quentin Bertrand, Anne Gagneux & Rémi Emonet 👇👇👇

Naga/Abhi (@nagasaiabhinay) 's Twitter Profile Photo

Hmm, still using the LAION Aesthetic reward on Flux. You can tell the difference, but it doesn't feel quite there yet. Will play around with a couple of other reward models and their combinations. This is relative reward, by the way. Much more stable than absolute reward.

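A minimal sketch of what "relative reward" could mean here (my own reading, the tweet doesn't specify the exact scheme): instead of training on each sample's raw (absolute) reward-model score, normalize the scores within each sampled batch so the signal becomes comparison-based and invariant to the reward model's offset and scale, similar to the group-normalized advantages used in GRPO-style RL:

```python
import numpy as np

def relative_reward(scores: np.ndarray) -> np.ndarray:
    """Hypothetical helper: center and scale raw reward-model scores
    within a batch, so only the ranking among samples matters."""
    return (scores - scores.mean()) / (scores.std() + 1e-8)

# Example: absolute aesthetic scores for four generated images.
raw = np.array([5.1, 5.4, 4.9, 5.6])
rel = relative_reward(raw)
print(rel)  # zero-mean, unit-variance training signal
```

The usual argument for the relative form is stability: a reward model whose scores drift or sit on an arbitrary scale still yields a well-conditioned, zero-centered signal after per-batch normalization.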
Zhihao Jia (@jiazhihao) 's Twitter Profile Photo

One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard. 🚀Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized
