Kyle Corbitt (@corbtt)'s Twitter Profile
Kyle Corbitt

@corbtt

Currently building @OpenPipeAI. Formerly @ycombinator, @google. I am always down to go on a quest.

ID: 823506858

Joined: 14-09-2012 15:44:30

761 Tweets

6.1K Followers

134 Following

Kyle Corbitt (@corbtt):

Finished evaluating the new GPT-4 on 5 real customer tasks (not benchmarks!).

Conclusion: The GPT-4 April release is pretty comparable on most things, but much worse on guided summarization.

Definitely worth running your own evals before adopting!

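A minimal sketch of what running your own evals can look like, assuming the OpenAI Python client; the eval set and scorer below are hypothetical placeholders, not OpenPipe's actual harness.

```python
# Minimal model-comparison eval sketch (hypothetical tasks and scorer).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Hypothetical eval set: (prompt, reference output) pairs drawn from real tasks.
EVAL_SET = [
    ("Summarize this ticket in one sentence: ...", "Customer cannot reset password."),
    # ... more task examples ...
]

def score(output: str, reference: str) -> float:
    """Placeholder scorer; swap in an LLM judge or a task-specific metric."""
    return float(reference.lower() in output.lower())

def run_eval(model: str) -> float:
    total = 0.0
    for prompt, reference in EVAL_SET:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        total += score(resp.choices[0].message.content, reference)
    return total / len(EVAL_SET)

for model in ["gpt-4-0125-preview", "gpt-4-turbo-2024-04-09"]:
    print(model, run_eval(model))
```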
Kyle Corbitt (@corbtt):

Intentionally not bothering to refactor/fix any tech debt for the next 2 months. GPT-5 will be able to just cleanly rewrite my codebase in one fell swoop right?

Kyle Corbitt (@corbtt):

Still pulling the data together but seems like newest GPT-4-turbo is a bit worse on average on our evals than the previous gpt-4-0125-preview. Will post data tomorrow probably.

Kyle Corbitt (@corbtt):

Making tool calls a first-class citizen in the OpenAI API was a mistake. JSON mode can do everything tool calls can and more, and is conceptually simpler.
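A hedged sketch of that claim, showing JSON mode standing in for a tool call via the OpenAI API; the get_weather tool and its schema are made up for illustration.

```python
# Emulating a "tool call" with JSON mode (hypothetical get_weather tool).
import json
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You can call one tool: get_weather(city: str). "
    'Respond only with JSON like {"tool": "get_weather", "arguments": {"city": "..."}}.'
)

resp = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={"type": "json_object"},  # JSON mode
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "What's the weather in Seattle?"},
    ],
)

call = json.loads(resp.choices[0].message.content)
if call["tool"] == "get_weather":
    city = call["arguments"]["city"]
    # dispatch to your own get_weather(city) implementation here
    print("would call get_weather with", city)
```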

Kyle Corbitt (@corbtt):

If you want to try out the new Llama 3 models when they drop next week, the best way to do so is to get your dataset uploaded and ready to go on OpenPipe. We will have fine-tuning and inference live ASAP after the release.

Kyle Corbitt (@corbtt):

The Information reports that the timeline for releasing the smallest Llama 3 variants has moved up to next week!

This prob means Meta has found their 7B beats Mistral 7B on benchmarks (otherwise they wouldn't give it a dedicated release). Let's go! 🚀

theinformation.com/articles/meta-…

Kyle Corbitt (@corbtt):

Potentially important development in parameter-efficient-fine-tuning. Much smaller parameter count than LoRA, which translates directly to smaller overhead at serving time and shorter training time. Coupled with (claimed) higher perf!

As always, need to verify it reproduces and…
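For scale on the LoRA baseline being compared against, a back-of-envelope count of adapter parameters using illustrative Llama-7B-style dimensions and rank (not numbers from the linked work):

```python
# Rough LoRA adapter parameter count for a Llama-7B-style model (illustrative).
d_model = 4096         # hidden size
n_layers = 32          # transformer layers
rank = 8               # LoRA rank
targets_per_layer = 4  # e.g. q/k/v/o projections, each roughly d_model x d_model

# Each adapted d x d linear gets two low-rank factors: A (r x d) and B (d x r).
params_per_linear = rank * d_model * 2
lora_params = params_per_linear * targets_per_layer * n_layers
print(f"~{lora_params / 1e6:.1f}M adapter params")  # ~8.4M, vs ~7B base weights
```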

Kyle Corbitt (@corbtt):

I run a fine-tuning company and this is what I tell everyone.

Prompting is for 0 to 1. Fine tuning is for 1 to 100.

Kyle Corbitt (@corbtt):

2025: it's now considered good manners to add subtle typos and grammar errors to your emails. signals a real human spent time on it.

2026: all frontier models are now RLHF'd to add typos and grammar errors.

Kyle Corbitt (@corbtt):

This new optimizer is potentially a major breakthrough for fine-tuning in production.

Why?

You no longer need to choose a learning rate schedule that tails off to 0. That makes continual fine-tuning on new production data much less fraught, since you don't need to re-warm your…
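A small illustration of the underlying issue (this does not reproduce the new optimizer; it just shows how a cosine schedule decays to zero, which is what forces the re-warming when you resume on new data):

```python
# Why decay-to-zero schedules make continual fine-tuning awkward (illustrative).
import torch

param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.AdamW([param], lr=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=1000)

for _ in range(1000):
    opt.step()
    sched.step()

print(sched.get_last_lr())  # ~0.0: resuming training needs a fresh warmup,
                            # whereas a constant-LR setup can just keep going.
```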

Kyle Corbitt (@corbtt):

'Anything worth doing is worth doing r̶i̶g̶h̶t̶ in a half-a** way so you can get to market fast and iterate from there'
-- any actually successful SaaS founder in a competitive industry

Kyle Corbitt (@corbtt):

An AI-empowered employee has a vastly higher skill floor than a non-AI-empowered employee.

Just asked an engineer with 0 marketing experience to set up our Hubspot to send a product newsletter to all our current and future users. Pre-AI, would not have been worth the ramp to…

Jeremy Howard (@jeremyphoward):

Replying to Kyle Corbitt: The reason people do most things in model training is because that's what everyone else does. Everyone else does it because the first paper in the sequence did it. That paper did it because a PhD student had some code for that handy.

Kyle Corbitt (@corbtt):

Hard to overstate how big a deal this is for fine-tuning: with existing methods, you have to know *ahead of time* how many epochs you want to train for.

This training technique, if true, would let you just keep evaling checkpoints at epoch 1, 3, 5, etc. until it's good enough!…

Kyle Corbitt (@corbtt):

Really enjoyed this conversation with the Cerebral Valley team. They've done a fantastic job of cultivating the AI community in SF. 🙂

Kyle Corbitt (@corbtt):

Anyone know how much perf you gain from quantization on optimized hardware? Like on an H100 do you get higher throughput with a 13B model in FP8 or a 7B model in BF16?
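One back-of-envelope way to frame it for the memory-bandwidth-bound decode case, with rough numbers rather than measured throughput:

```python
# Back-of-envelope: weight bytes read per decoded token (bandwidth-bound regime).
params_13b, params_7b = 13e9, 7e9
fp8_bytes, bf16_bytes = 1, 2

weights_13b_fp8 = params_13b * fp8_bytes / 1e9   # ~13 GB of weights per token
weights_7b_bf16 = params_7b * bf16_bytes / 1e9   # ~14 GB of weights per token

print(f"13B FP8:  {weights_13b_fp8:.0f} GB of weights per token")
print(f"7B BF16:  {weights_7b_bf16:.0f} GB of weights per token")
# Similar weight traffic, so decode throughput should be in the same ballpark;
# FP8 also gets roughly 2x the tensor-core FLOPs on H100, which matters more
# for prefill and large batches. Real numbers depend on kernels, so benchmark both.
```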

Kyle Corbitt (@corbtt):

imo the industry lost a lot when we switched from few-shot prompting with GPT-3 to instruction-prompting with GPT-3.5 and 4. A few good examples can guide a model much more strongly than written instructions.

Expecting a big comeback here.
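A minimal sketch of the few-shot style being described, with made-up classification examples and the OpenAI chat API:

```python
# Few-shot prompting: guide the model with examples instead of long instructions.
from openai import OpenAI

client = OpenAI()

few_shot = [  # made-up demonstrations of the target behavior
    {"role": "user", "content": "Ticket: App crashes when I upload a photo."},
    {"role": "assistant", "content": "category: bug, severity: high"},
    {"role": "user", "content": "Ticket: Please add a dark mode."},
    {"role": "assistant", "content": "category: feature_request, severity: low"},
]

resp = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=few_shot + [
        {"role": "user", "content": "Ticket: I was charged twice this month."},
    ],
)
print(resp.choices[0].message.content)  # e.g. "category: billing, severity: high"
```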

Kyle Corbitt (@corbtt):

For a long time I tried to come up with some clever new insight every time I was asked to give a conference/meetup talk.

Eventually I realized that just giving the same talk over and over (as long as the audience doesn't fully overlap) is the boring-but-optimal solution.
