Sebastian Raschka (@rasbt)'s Twitter Profile
Sebastian Raschka

@rasbt

ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (amzn.to/4fqvn0D).

ID: 865622395

https://sebastianraschka.com · Joined: 07-10-2012 02:06:16

17K Tweets

326K Followers

1K Following

Louie_Artifex (@virosa)'s Twitter Profile Photo

I'm 3 weeks into a book study on Sebastian Raschka's "Build a Large Language Model From Scratch", and this book is incredible. The GitHub repo contains Jupyter notebooks, and there are (free!) videos that augment the book. I am super impressed with the quality all around.

Sebastian Raschka (@rasbt)'s Twitter Profile Photo

Wrote up a more detailed analysis of gpt-oss, how far we've come since GPT-2, and how it compares to Qwen3: magazine.sebastianraschka.com/p/from-gpt-2-t…

Percy Liang (@percyliang)'s Twitter Profile Photo

GPT-5 and GPT-5 mini added to HELM capabilities v1.12.0. Interestingly, GPT-5 mini tops the leaderboard ahead of GPT-5 because on Omni-MATH, GPT-5 uses more reasoning tokens (and is hard to control) and hits our reasoning token budget of 14096. Doing fair evals is tricky!

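A hedged sketch of the failure mode described above: when a model's chain of thought hits a fixed reasoning-token budget, it gets cut off before emitting a final answer and scores zero regardless of capability. The function and scoring logic below are hypothetical illustrations, not HELM's actual implementation.

```python
# Hypothetical illustration of a fixed reasoning-token budget in an eval
# harness (names are made up; this is not HELM's actual code).

REASONING_TOKEN_BUDGET = 14096  # budget cited in the tweet

def score_response(reasoning_tokens: int, answer_correct: bool) -> int:
    """Return 1 for a correct answer produced within budget, else 0."""
    if reasoning_tokens >= REASONING_TOKEN_BUDGET:
        # The model was truncated mid-reasoning, so no final answer was
        # emitted -- it scores 0 even if it "knew" the answer.
        return 0
    return int(answer_correct)

# GPT-5 reasons longer and blows the budget; GPT-5 mini stays under it.
print(score_response(reasoning_tokens=15000, answer_correct=True))  # 0
print(score_response(reasoning_tokens=6000, answer_correct=True))   # 1
```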
ℏεsam (@hesamation)'s Twitter Profile Photo

learn how to build an LLM from scratch, honestly.
<a href="/rasbt/">Sebastian Raschka</a>'s repo is really a gem. it has notebooks with diagrams and explanations that will teach you 100% of:
&gt; attention mechanism
&gt; implementing a GPT model
&gt; pretraining and fine-tuning
my top recommendation for studying LLMs.
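As a taste of the first item on that list, here is a minimal single-head scaled dot-product self-attention sketch in PyTorch. It is a simplified illustration of the mechanism, not code taken from the repo.

```python
import torch

def self_attention(x: torch.Tensor, W_q, W_k, W_v) -> torch.Tensor:
    """Minimal single-head scaled dot-product self-attention.

    x: (seq_len, d_in) token embeddings; W_q/W_k/W_v: (d_in, d_out).
    """
    queries = x @ W_q
    keys = x @ W_k
    values = x @ W_v
    # Attention scores, scaled by sqrt(d_k) to keep the softmax well-behaved
    scores = queries @ keys.T / keys.shape[-1] ** 0.5
    weights = torch.softmax(scores, dim=-1)
    return weights @ values  # (seq_len, d_out) context vectors

torch.manual_seed(123)
x = torch.randn(6, 4)                          # 6 tokens, 4-dim embeddings
W_q, W_k, W_v = (torch.randn(4, 3) for _ in range(3))
print(self_attention(x, W_q, W_k, W_v).shape)  # torch.Size([6, 3])
```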
Sebastian Raschka (@rasbt)'s Twitter Profile Photo

It’s so interesting to see both communities swapping ideas (and maybe converging?):
-> NLP moving towards text diffusion models
-> CV moving towards autoregressive image generation
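A purely illustrative toy contrast of the two generation styles being swapped here, using random stand-ins for the actual model predictions: autoregressive decoding commits one token at a time, left to right, while diffusion-style decoding starts fully masked and refines all positions in parallel over a few steps.

```python
import random

vocab = ["the", "cat", "sat", "on", "a", "mat"]

def autoregressive_decode(n_tokens: int) -> list[str]:
    """Autoregressive style: one token per step, left to right."""
    seq = []
    for _ in range(n_tokens):
        seq.append(random.choice(vocab))  # stand-in for sampling p(x_t | x_<t)
    return seq

def diffusion_decode(n_tokens: int, n_steps: int = 3) -> list[str]:
    """Diffusion style: start fully masked, unmask positions in parallel."""
    seq = ["<mask>"] * n_tokens
    for step in range(n_steps):
        # stand-in for a denoising step: commit a fraction of positions at once
        for i, tok in enumerate(seq):
            if tok == "<mask>" and random.random() < (step + 1) / n_steps:
                seq[i] = random.choice(vocab)
    return seq

print(autoregressive_decode(5))
print(diffusion_decode(5))
```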

Sebastian Raschka (@rasbt)'s Twitter Profile Photo

I love this upgrade. Lightning AI is my go-to cloud compute platform due to its user-friendliness (great UI, persistent environments, multi-GPU and multi-node support, etc.), and now the prices are also really great.

An A100 for $1.55/hour through Lambda Labs, or an H100 for $2.70/hour.
Sebastian Raschka (@rasbt)'s Twitter Profile Photo

Pretty cool that they open sourced the actual full-sized production model. 
Here’s the Grok 2.5 architecture overview next to a roughly similarly sized Qwen3 model.
The MoE residual is quite interesting. Kind of like a shared expert. I don't think I've seen this setup before.
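For intuition on what an always-on branch next to routed experts looks like, here is a hedged PyTorch sketch of an MoE block where a shared expert runs on every token and its output is added like a residual. This is a guess at the general pattern being described, not Grok 2.5's actual implementation.

```python
import torch
import torch.nn as nn

class MoEWithSharedExpert(nn.Module):
    """Toy MoE layer: top-1 routed experts plus an always-active shared
    expert whose output is added like a residual branch.
    (Illustrative pattern only, not Grok 2.5's actual code.)"""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # Shared expert: applied to every token, regardless of routing
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate = torch.softmax(self.router(x), dim=-1)  # routing weights
        top_w, top_idx = gate.max(dim=-1)             # top-1 routing
        routed = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                routed[mask] = top_w[mask, None] * expert(x[mask])
        return routed + self.shared_expert(x)         # residual-style add

x = torch.randn(10, 32)
print(MoEWithSharedExpert(d_model=32, d_ff=64)(x).shape)  # torch.Size([10, 32])
```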