Jinuk Kim (@jusjinuk1)'s Twitter Profile
Jinuk Kim

@jusjinuk1

CS PhD student @SeoulNatlUni. Previously Research Intern @Google & @Samsung

ID: 1482603102545285120

Link: https://jinukkim.me · Joined: 16-01-2022 06:38:37

220 Tweets

39 Followers

268 Following

Jinuk Kim (@jusjinuk1)'s Twitter Profile Photo

This matters a lot for making long-horizon RL tuning for LLMs scalable, especially if you're GPU-poor. In the long run, I agree with Dr. Jack Morris (see x.com/jxmnop/status/…): moving from post-training to training-aware methods is a proven recipe (e.g., GQA, QAT, linear attention).
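
As a rough illustration of what "training-aware" means here, below is a minimal quantization-aware training (QAT) sketch in PyTorch: the forward pass uses fake-quantized weights, and a straight-through estimator routes gradients to the full-precision master weights. All class and function names are illustrative, not from any cited codebase.

```python
import torch
import torch.nn as nn

class FakeQuant(torch.autograd.Function):
    """Round weights to a low-bit grid in the forward pass; pass gradients
    straight through in the backward pass (straight-through estimator)."""

    @staticmethod
    def forward(ctx, w, n_bits):
        qmax = 2 ** (n_bits - 1) - 1
        scale = w.abs().max() / qmax + 1e-12
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None  # gradient w.r.t. w; none for n_bits

class QATLinear(nn.Linear):
    """Linear layer trained against its own 4-bit fake-quantized weights,
    so quantization error is absorbed during training rather than after."""

    def forward(self, x):
        w_q = FakeQuant.apply(self.weight, 4)
        return nn.functional.linear(x, w_q, self.bias)
```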

Jinuk Kim (@jusjinuk1)'s Twitter Profile Photo

An important discussion thread clarifying the use of importance sampling in GSPO versus GRPO; it would have been helpful if the Qwen team had included these explanations in their paper.
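
For context, the difference under discussion can be sketched numerically: GRPO weights each token by its own importance ratio, while GSPO applies a single length-normalized, sequence-level ratio to the whole response. This is a toy illustration of the two weighting schemes, not the full clipped objectives from either paper.

```python
import torch

def grpo_ratios(logp_new, logp_old):
    # GRPO: one importance ratio per token; token t is reweighted by
    # pi_new(y_t | ...) / pi_old(y_t | ...) independently.
    return torch.exp(logp_new - logp_old)

def gspo_ratio(logp_new, logp_old):
    # GSPO: a single sequence-level ratio, the length-normalized
    # (geometric-mean) likelihood ratio, applied to every token.
    return torch.exp((logp_new - logp_old).mean())

logp_old = torch.tensor([-1.2, -0.7, -2.1])  # per-token log-probs, old policy
logp_new = torch.tensor([-1.0, -0.9, -1.5])  # per-token log-probs, new policy
print(grpo_ratios(logp_new, logp_old))  # per-token weights, higher variance
print(gspo_ratio(logp_new, logp_old))   # one smoothed weight per sequence
```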

Agrim Gupta (@agrimgupta92)'s Twitter Profile Photo

3/ One emergent capability I find remarkable is long-term consistency, especially because we don’t use any explicit 3D representations or priors. Simply training the model to generate the next frame auto-regressively teaches it to maintain physical consistency across time.

Cursor (@cursor_ai)'s Twitter Profile Photo

We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.
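
Cursor describes the approach only at a high level; as a hedged sketch of what online RL from accept/reject signals can look like, here is a toy REINFORCE-style update where a scorer learns when a suggestion is worth showing. All names, dimensions, and reward values are assumptions, not Cursor's implementation.

```python
import torch

policy = torch.nn.Linear(128, 1)  # hypothetical suggestion scorer
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def online_step(features, accepted):
    """One online update from live feedback on a suggestion that was shown.

    Accepts earn +1; rejections cost a small penalty, so the scorer learns
    to fire less often but with a higher accept rate.
    """
    p_show = torch.sigmoid(policy(features))   # P(show suggestion)
    reward = 1.0 if accepted else -0.25        # accept/reject signal
    loss = -reward * torch.log(p_show + 1e-8)  # REINFORCE on the "show" action
    opt.zero_grad()
    loss.backward()
    opt.step()

# e.g. online_step(torch.randn(128), accepted=True)
```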

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea…

Jinuk Kim (@jusjinuk1)'s Twitter Profile Photo

An implementation that performs fast QLoRA-style RL training. The next step would be training the model with QAT based on the RL objective, where the rollout uses the quantized model (thus, arguably on-policy). Inference optimization is becoming crucial for frontier model training.
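
A sketch of that next step, under loud assumptions: `quantize`, `rl_loss`, and `generate` are hypothetical placeholders, and the point is only the structure, where rollouts and gradient updates both go through the same fake-quantized view of the weights, so the sampling policy matches the policy being optimized.

```python
import torch

def qat_rl_step(model, quantize, rl_loss, optimizer, prompts):
    # Keep one set of full-precision master weights, but let both the
    # rollout and the loss see the fake-quantized weights.
    with torch.no_grad():
        q_model = quantize(model)             # fake-quantized view of weights
        rollouts = q_model.generate(prompts)  # rollout from the quantized policy

    loss = rl_loss(quantize(model), rollouts)  # same quantized forward in training
    optimizer.zero_grad()
    loss.backward()  # a straight-through estimator routes grads to master weights
    optimizer.step()
```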

Andrew Ng (@andrewyng)'s Twitter Profile Photo

Readers responded with both surprise and agreement last week when I wrote that the single biggest predictor of how rapidly a team makes progress building an AI agent lay in their ability to drive a disciplined process for evals (measuring the system’s performance) and error…

George Grigorev (@iamgrigorev)'s Twitter Profile Photo

bro Andrej Karpathy literally re-implemented the entire lm-eval-harness in 2 Python files. It's been very useful for my own repo and easy to adapt for the SuperBPE case.
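
The core of such a harness is small. Below is a schematic of the multiple-choice path (my own sketch, not Karpathy's code), assuming an HF-style model that returns `.logits` and a tokenizer with `.encode`: score each answer choice by the log-probability the model assigns to it given the question, and count the example correct when the gold choice scores highest.

```python
import torch

@torch.no_grad()
def loglikelihood(model, tok, context, continuation):
    """Sum of token log-probs the model assigns to `continuation` given `context`."""
    ctx = tok.encode(context)
    cont = tok.encode(continuation)
    ids = torch.tensor([ctx + cont])
    logits = model(ids).logits.log_softmax(-1)
    # log-prob of each continuation token, given everything before it
    lp = logits[0, len(ctx) - 1 : -1].gather(-1, torch.tensor(cont).unsqueeze(-1))
    return lp.sum().item()

def accuracy(model, tok, examples):
    """examples: dicts with "question" (str), "choices" (list[str]), "gold" (int)."""
    hits = 0
    for ex in examples:
        scores = [loglikelihood(model, tok, ex["question"], c) for c in ex["choices"]]
        hits += int(max(range(len(scores)), key=scores.__getitem__) == ex["gold"])
    return hits / len(examples)
```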

Jinuk Kim (@jusjinuk1)'s Twitter Profile Photo

Kimi K2 Thinking model; gpt-oss was probably trained the same way too. Frontier labs’ moat in foundation model training is fast systems engineering, which expands the surface of what can be tried and opens up more ideas.

Ricursive Intelligence (@ricursiveai)'s Twitter Profile Photo

Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com

Sam Altman (@sama)'s Twitter Profile Photo

Peter Steinberger is joining OpenAI to drive the next generation of personal agents. He is a genius with a lot of amazing ideas about the future of very smart agents interacting with each other to do very useful things for people. We expect this will quickly become core to our…