Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile
Tanishq Mathew Abraham, Ph.D.

@iscienceluvr

CEO @SophontAI |
PhD at 19 (2023) |
Founder, ex CEO @MedARC_AI |
ex Research Director Stability AI |
Biomed. engineer @ 14 |
TEDx talk➡bit.ly/3tpAuan

ID: 441465751

https://tanishq.ai | Joined 20-12-2011 03:45:50

16,16K Tweets

75,75K Followers

1,1K Following

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

"we demonstrate that employing only two techniques, i.e., advantage normalization (group-level mean, batch-level std) and token-level loss aggregation, can unlock the learning capability of critic-free policies using
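The two techniques named in the quote can be sketched roughly as follows. This is one reading of the truncated excerpt, not the paper's implementation: the function names `normalize_advantages` and `token_level_loss`, the `eps` term, and the nested-list data layout are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def normalize_advantages(rewards, eps=1e-8):
    """Center each reward on its own group's mean, scale by the batch-wide std.

    rewards: list of groups; each group holds the scalar rewards of the
    responses sampled for one prompt (hypothetical layout).
    """
    flat = [r for group in rewards for r in group]
    batch_std = pstdev(flat)  # a single std over every reward in the batch
    return [
        [(r - mean(group)) / (batch_std + eps) for r in group]
        for group in rewards
    ]


def token_level_loss(token_losses):
    """Average over all tokens in the batch, so long responses are not
    down-weighted by a per-sequence mean taken first.

    token_losses: list of sequences, each a list of per-token losses.
    """
    total = sum(sum(seq) for seq in token_losses)
    count = sum(len(seq) for seq in token_losses)
    return total / count


advantages = normalize_advantages([[1.0, 0.0], [1.0, 1.0]])
loss = token_level_loss([[1.0, 1.0, 1.0, 1.0], [4.0]])
```

In this toy batch the second group has identical rewards, so its advantages are zero, while the batch-level std still gives the first group a finite scale. Under token-level aggregation the loss is 8 / 5 rather than the (1 + 4) / 2 that a per-sequence mean followed by a batch mean would give.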