Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile
Cameron R. Wolfe, Ph.D.

@cwolferesearch

Research @Netflix • Writer @ Deep (Learning) Focus • PhD @optimalab1 • I make AI understandable

ID: 1425585940542763010

Link: https://cameronrwolfe.me · Joined: 11-08-2021 22:32:35

3.3K Tweets

26.26K Followers

674 Following

Sairam Sundaresan (@dsaience) 's Twitter Profile Photo

I can't believe I'm saying this - I'm officially a published author :D

After three years, my first book is out.

"AI for the Rest of Us" with <a href="/BloomsburyAcad/">Bloomsbury Academic</a>  is finally in the world.

I wrote it because I watched too many people get left behind in AI conversations.

The gap
Bloomsbury Academic (@bloomsburyacad) 's Twitter Profile Photo

"Through clever storytelling and illustration, [Sundaresan] brings technical concepts to life[.]" — Dr. Cameron R. Wolfe, Senior Research Scientist at Netflix (Cameron R. Wolfe, Ph.D.) Learn more: bit.ly/42ZCs4z Sairam Sundaresan

"Through clever storytelling and illustration, [Sundaresan] brings technical concepts to life[.]" — Dr. Cameron R. Wolfe, Senior Research Scientist at Netflix (<a href="/cwolferesearch/">Cameron R. Wolfe, Ph.D.</a>)

Learn more: bit.ly/42ZCs4z Sairam Sundaresan
Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile Photo

The value of RL is very clearly / nicely articulated by DeepSeekMath…

- RL enhances maj@k (majority vote), but not pass@k.
- RL boosts the probability of correct completions that are already in top-k.
- RL does NOT clearly enhance model capabilities.
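For intuition, here is a minimal sketch of how the two metrics are typically computed over k sampled answers; the answer-extraction step and the unbiased pass@k estimator used in real evals are more involved:

```python
from collections import Counter

def pass_at_k(answers, correct):
    """pass@k: is the correct answer produced by ANY of the k samples?"""
    return any(a == correct for a in answers)

def maj_at_k(answers, correct):
    """maj@k: does the majority-vote answer across the k samples match?"""
    top_answer, _ = Counter(answers).most_common(1)[0]
    return top_answer == correct

# Toy example: k = 5 sampled completions for one math problem.
samples = ["42", "17", "42", "42", "9"]
print(pass_at_k(samples, "42"))  # True -> the correct answer is already reachable
print(maj_at_k(samples, "42"))   # True -> RL mainly makes this outcome more frequent
```

The point above follows directly: shifting probability mass toward answers the base model can already produce helps the majority vote win more often (maj@k) without making new answers reachable (pass@k).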
Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile Photo

assistive coding tools definitely make me more productive, but the pattern isn't uniform. biggest productivity boost comes later in the day / at night when I'm mentally exhausted. LLMs lower the barrier to entry for getting extra work done. validating or iterating on code with an

Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile Photo

The memory folding mechanism proposed in this paper is great. It makes sense that agents should spend time explicitly compressing their memory into a semantic / organized format to avoid context explosion.

Worth mentioning though that memory compression / retention in agents
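Not the paper's exact mechanism, but a minimal sketch of the general idea behind folding memory: once the raw interaction log exceeds a token budget, older entries are compressed into one organized summary and only that summary plus the most recent turns are kept. The `summarize` and `count_tokens` helpers are assumed stand-ins (e.g., an LLM call and a tokenizer), not anything from the paper.

```python
def fold_memory(log, token_budget, summarize, count_tokens, keep_recent=5):
    """Compress older log entries into one structured note when over budget.

    `summarize` and `count_tokens` are assumed stand-ins (an LLM call and a
    tokenizer); this sketches the control flow only, not the paper's mechanism.
    """
    if sum(count_tokens(entry) for entry in log) <= token_budget:
        return log  # still within budget, nothing to fold

    old, recent = log[:-keep_recent], log[-keep_recent:]
    # Distill the older entries into an organized, semantic summary.
    folded = summarize(
        "Compress these interactions into key facts, decisions, and open tasks:\n"
        + "\n".join(old)
    )
    return [folded] + recent
```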
Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile Photo

The next AI Agents in Production conference is on November 18th. For those interested in the practical side of LLMs / agents, this is a good event to attend. Some highlights:

- Completely free.
- Everything can be viewed online.
- Good talks from top companies (OAI, GDM, Meta,
Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile Photo

This is (in my opinion) one of the top-3 most useful books to be written on LLMs. I highly recommend reading / buying it. I've personally read it >10 times since Nathan started writing it.

Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile Photo

Generalized Advantage Estimation (GAE), used in PPO, is one of the most complicated aspects of reinforcement learning (RL). Here’s how it works and how we can implement it…

The advantage tells us how much better a given action is compared to the average action in a given state:
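In symbols, the advantage is A(s_t, a_t) = Q(s_t, a_t) - V(s_t). GAE estimates it as an exponentially weighted sum of TD errors, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), with A_t = sum_l (gamma * lambda)^l * delta_{t+l}. A minimal NumPy sketch of the standard backward-pass implementation (not tied to any particular PPO codebase, and ignoring episode-termination masking):

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation via a single backward pass.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}
    `values` holds one extra entry: the (bootstrap) value of the final state.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Toy rollout: reward only at the last step, values include a final bootstrap of 0.
rewards = [0.0, 0.0, 0.0, 1.0]
values  = [0.5, 0.6, 0.7, 0.8, 0.0]
print(compute_gae(rewards, values))
```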
Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile Photo

The Olmo technical reports / artifacts are by far the most useful resource for those working on LLMs outside of closed frontier labs. You can read the papers, read the code, look at the data, and even train the models yourself. No other resource provides this level of detail, and

Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile Photo

Interesting note from Olmo-3 that KL divergence is excluded from the GRPO loss. This is becoming a standard choice for reasoning / RL training pipelines, and it doesn't seem to cause training instability. Yet another reminder that RL for LLMs is very different from traditional deep RL.
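A rough sketch of what this choice looks like in a GRPO-style objective (not Olmo-3's actual code): the KL penalty toward a frozen reference model is just an extra term with coefficient `kl_coef`, and dropping it means setting that coefficient to zero and never loading the reference model at all.

```python
import torch

def grpo_loss(logp, logp_old, advantages, logp_ref=None, clip_eps=0.2, kl_coef=0.0):
    """Clipped policy loss with an optional KL penalty toward a reference model.

    A simplified GRPO-style sketch (not Olmo-3's implementation). All inputs are
    per-token tensors of shape (batch, seq_len); with kl_coef=0.0 the KL term
    (and the reference model that produces logp_ref) disappears entirely.
    """
    ratio = torch.exp(logp - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()

    if kl_coef > 0.0 and logp_ref is not None:
        # Low-variance KL estimate (exp(x) - x - 1) toward the frozen reference.
        kl = torch.exp(logp_ref - logp) - (logp_ref - logp) - 1.0
        policy_loss = policy_loss + kl_coef * kl.mean()
    return policy_loss
```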
Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile Photo

The original PPO-based RLHF pipeline had 4 model copies:

1. Policy
2. Reference
3. Critic
4. Reward Model

Recent GRPO-based RLVR pipelines have eliminated all of these models except for the policy.

- The critic is no longer needed because values are estimated from group
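For the critic in particular, the replacement is simple: sample a group of completions for each prompt, score them with the (verifiable) reward, and normalize each reward against the group's statistics. A minimal sketch of that group-relative advantage:

```python
import torch

def group_relative_advantages(group_rewards, eps=1e-6):
    """Advantages from a group of completions for the same prompt (GRPO-style).

    Each completion's advantage is its reward normalized by the group's mean and
    standard deviation, so no learned critic / value model is needed.
    """
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)

# Example: 4 completions for one prompt with verifiable 0/1 rewards.
print(group_relative_advantages(torch.tensor([1.0, 0.0, 0.0, 1.0])))
```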
Cameron R. Wolfe, Ph.D. (@cwolferesearch) 's Twitter Profile Photo

The swarm optimization approach used by Olmo-3 to discover good pretraining data mixtures is super cool. 

It runs a guided search over possible data mixtures by:

1. Randomly sampling a bunch of mixtures.
2. Training small-scale proxy models on these mixtures.
3. Evaluating the
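Not Olmo-3's actual algorithm, just a minimal sketch of the loop described above: repeatedly sample candidate mixtures, train cheap proxy models, score them, and bias the next round of sampling toward the best mixture found so far. The `train_proxy` and `evaluate` callables are assumed stand-ins for the expensive parts.

```python
import numpy as np

def search_data_mixtures(num_sources, train_proxy, evaluate,
                         rounds=3, candidates_per_round=8, seed=0):
    """Guided random search over pretraining data mixtures (simplified sketch).

    A mixture is a probability vector over `num_sources` data sources.
    `train_proxy(mixture)` trains a small proxy model; `evaluate(model)` scores it.
    Both are assumed stand-ins; this is not Olmo-3's exact swarm method.
    """
    rng = np.random.default_rng(seed)
    concentration = np.ones(num_sources)      # uniform Dirichlet prior to start
    best_mixture, best_score = None, -np.inf

    for _ in range(rounds):
        # 1. Randomly sample a bunch of candidate mixtures.
        for mixture in rng.dirichlet(concentration, size=candidates_per_round):
            # 2. Train a small-scale proxy model on this mixture.
            proxy = train_proxy(mixture)
            # 3. Evaluate the proxy and track the best mixture seen so far.
            score = evaluate(proxy)
            if score > best_score:
                best_mixture, best_score = mixture, score
        # Bias the next round of sampling toward the current best mixture.
        concentration = 1.0 + 10.0 * best_mixture
    return best_mixture, best_score
```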