Jinuk Kim (@jusjinuk1)'s Twitter Profile
Jinuk Kim

@jusjinuk1

CS PhD student @SeoulNatlUni. Previously Research Intern @Google & @Samsung

ID: 1482603102545285120

Link: https://jinukkim.me · Joined: 16-01-2022 06:38:37

220 Tweets

39 Followers

268 Following

Jinuk Kim (@jusjinuk1)'s Twitter Profile Photo

This matters a lot for making long-horizon RL tuning for LLMs scalable, especially if you're GPU-poor. In the long run, I agree with Dr. Jack Morris (see x.com/jxmnop/status/…): moving from post-training to training-aware methods is a proven recipe (e.g., GQA, QAT, linear attention).
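
As a rough illustration of what "training-aware" means here, below is a minimal quantization-aware training (QAT) sketch in PyTorch: the forward pass uses fake-quantized weights, and a straight-through estimator routes gradients to the full-precision master weights. All class and function names are illustrative, not from any cited codebase.

```python
import torch
import torch.nn as nn

class FakeQuant(torch.autograd.Function):
    """Round weights to a low-bit grid in the forward pass; pass gradients
    straight through in the backward pass (straight-through estimator)."""

    @staticmethod
    def forward(ctx, w, n_bits):
        qmax = 2 ** (n_bits - 1) - 1
        scale = w.abs().max() / qmax + 1e-12
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None  # gradient w.r.t. w; none for n_bits

class QATLinear(nn.Linear):
    """Linear layer trained against its own 4-bit fake-quantized weights,
    so quantization error is absorbed during training rather than after."""

    def forward(self, x):
        w_q = FakeQuant.apply(self.weight, 4)
        return nn.functional.linear(x, w_q, self.bias)
```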

Jinuk Kim (@jusjinuk1)'s Twitter Profile Photo

An important discussion thread clarifying the use of importance sampling in GSPO versus GRPO; it would have been helpful if the Qwen team had included these explanations in their paper.
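
For context, the difference under discussion can be sketched numerically: GRPO weights each token by its own importance ratio, while GSPO applies a single length-normalized, sequence-level ratio to the whole response. This is a toy illustration of the two weighting schemes, not the full clipped objectives from either paper.

```python
import torch

def grpo_ratios(logp_new, logp_old):
    # GRPO: one importance ratio per token; token t is reweighted by
    # pi_new(y_t | ...) / pi_old(y_t | ...) independently.
    return torch.exp(logp_new - logp_old)

def gspo_ratio(logp_new, logp_old):
    # GSPO: a single sequence-level ratio, the length-normalized
    # (geometric-mean) likelihood ratio, applied to every token.
    return torch.exp((logp_new - logp_old).mean())

logp_old = torch.tensor([-1.2, -0.7, -2.1])  # per-token log-probs, old policy
logp_new = torch.tensor([-1.0, -0.9, -1.5])  # per-token log-probs, new policy
print(grpo_ratios(logp_new, logp_old))  # per-token weights, higher variance
print(gspo_ratio(logp_new, logp_old))   # one smoothed weight per sequence
```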

Agrim Gupta (@agrimgupta92)'s Twitter Profile Photo

3/ One emergent capability I find remarkable is long-term consistency, especially because we don’t use any explicit 3D representations or priors. Simply training the model to generate the next frame auto-regressively teaches it to maintain physical consistency across time.

Cursor (@cursor_ai)'s Twitter Profile Photo

We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.
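
Cursor describes the approach only at a high level; as a hedged sketch of what online RL from accept/reject signals can look like, here is a toy REINFORCE-style update where a scorer learns when a suggestion is worth showing. All names, dimensions, and reward values are assumptions, not Cursor's implementation.

```python
import torch

policy = torch.nn.Linear(128, 1)  # hypothetical suggestion scorer
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def online_step(features, accepted):
    """One online update from live feedback on a suggestion that was shown.

    Accepts earn +1; rejections cost a small penalty, so the scorer learns
    to fire less often but with a higher accept rate.
    """
    p_show = torch.sigmoid(policy(features))   # P(show suggestion)
    reward = 1.0 if accepted else -0.25        # accept/reject signal
    loss = -reward * torch.log(p_show + 1e-8)  # REINFORCE on the "show" action
    opt.zero_grad()
    loss.backward()
    opt.step()

# e.g. online_step(torch.randn(128), accepted=True)
```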

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea…

Jinuk Kim (@jusjinuk1)'s Twitter Profile Photo

An implementation that performs fast QLoRA-style RL training. The next step would be training the model with QAT based on the RL objective, where the rollout uses the quantized model (thus, arguably on-policy). Inference optimization is becoming crucial for frontier model training.
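
A sketch of that next step, under loud assumptions: `quantize`, `rl_loss`, and `generate` are hypothetical placeholders, and the point is only the structure, where rollouts and gradient updates both go through the same fake-quantized view of the weights, so the sampling policy matches the policy being optimized.

```python
import torch

def qat_rl_step(model, quantize, rl_loss, optimizer, prompts):
    # Keep one set of full-precision master weights, but let both the
    # rollout and the loss see the fake-quantized weights.
    with torch.no_grad():
        q_model = quantize(model)             # fake-quantized view of weights
        rollouts = q_model.generate(prompts)  # rollout from the quantized policy

    loss = rl_loss(quantize(model), rollouts)  # same quantized forward in training
    optimizer.zero_grad()
    loss.backward()  # a straight-through estimator routes grads to master weights
    optimizer.step()
```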

Andrew Ng (@andrewyng)'s Twitter Profile Photo

Readers responded with both surprise and agreement last week when I wrote that the single biggest predictor of how rapidly a team makes progress building an AI agent lay in their ability to drive a disciplined process for evals (measuring the system’s performance) and error…

George Grigorev (@iamgrigorev)'s Twitter Profile Photo

bro Andrej Karpathy literally re-implemented the entire lm-eval-harness in 2 Python files. It's been very useful for my own repo and easy to adapt for the SuperBPE case.
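
The core of such a harness is small. Below is a schematic of the multiple-choice path (my own sketch, not Karpathy's code), assuming an HF-style model that returns `.logits` and a tokenizer with `.encode`: score each answer choice by the log-probability the model assigns to it given the question, and count the example correct when the gold choice scores highest.

```python
import torch

@torch.no_grad()
def loglikelihood(model, tok, context, continuation):
    """Sum of token log-probs the model assigns to `continuation` given `context`."""
    ctx = tok.encode(context)
    cont = tok.encode(continuation)
    ids = torch.tensor([ctx + cont])
    logits = model(ids).logits.log_softmax(-1)
    # log-prob of each continuation token, given everything before it
    lp = logits[0, len(ctx) - 1 : -1].gather(-1, torch.tensor(cont).unsqueeze(-1))
    return lp.sum().item()

def accuracy(model, tok, examples):
    """examples: dicts with "question" (str), "choices" (list[str]), "gold" (int)."""
    hits = 0
    for ex in examples:
        scores = [loglikelihood(model, tok, ex["question"], c) for c in ex["choices"]]
        hits += int(max(range(len(scores)), key=scores.__getitem__) == ex["gold"])
    return hits / len(examples)
```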

Jinuk Kim (@jusjinuk1)'s Twitter Profile Photo

Kimi K2 Thinking model; gpt-oss was probably trained the same way too. Frontier labs’ moat in foundation model training is fast systems engineering, which expands the surface of what can be tried and opens up more ideas.

Ricursive Intelligence (@ricursiveai)'s Twitter Profile Photo

Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com

Sam Altman (@sama)'s Twitter Profile Photo

Peter Steinberger is joining OpenAI to drive the next generation of personal agents. He is a genius with a lot of amazing ideas about the future of very smart agents interacting with each other to do very useful things for people. We expect this will quickly become core to our…