Joan Cabezas (@josancamon19)'s Twitter Profile
Joan Cabezas

@josancamon19

Co-founder cifrato.ai (YC W25) | prev built @omedotme, tripplanner.ai

ID: 354608817

14-08-2011 00:52:33

179 Tweets

203 Followers

240 Following

Joan Cabezas (@josancamon19)

gsm8k in fact might be too easy or potentially contaminated, regarding less steps for bigger models, 8B seems to saturate at 12/16 steps, whereas 14B continues to get gains (marginal) up to 30 steps, and smaller models peak at 40/50 steps, on lr/hp, didn't see any meaningful

Andrej Karpathy (@karpathy)

Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,

Joan Cabezas (@josancamon19)

next steps to get this right: 1. explore more complex (e.g. tool calling) RL behaviors, ditch gsm8k. 2. qwen contamination issues, use Gemma 1B 4B 12B 27B -pt. 3. use David Hall marin checkpoints on 8B to figure task X% = a*C-pt + b*C-RL, (a, b being the optimal ratios to %).

Joan Cabezas (@josancamon19)

interpretability folks should spend some time hiring cracked frontend engineers, tools available look like software from the 90's, what if it just looked like an interactive fMRI? so many ways to make this cool, cc Neel Nanda

Devvrit (@devvrit_khatri)

Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs

Joan Cabezas (@josancamon19)

"we estimate that your P(breaking flow) geometrically increases 10% every second that passes while you wait for agent response, with the exact threshold varying based on perceived complexity of the request. The arbitrary “flow window” we hold ourselves to is 5 seconds."

finally
Felipe Chávez (@felipekiwi90)

Six months ago, we introduced Robot.com. Today, we launch it 🎉 Over the past few years, we’ve quietly scaled from 300K to 1.7 million+ robotic tasks. 500+ real robots. Doing real work every day: delivering, moving, inspecting, and more. Here's the lineup: 1.