Sebastian Raschka (@rasbt)'s Twitter Profile
Sebastian Raschka

@rasbt

ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (amzn.to/4fqvn0D).

ID: 865622395

https://sebastianraschka.com · Joined: 07-10-2012 02:06:16

17K Tweets

326K Followers

1K Following

Louie_Artifex (@virosa)'s Twitter Profile Photo

I'm 3 weeks into a book study on Sebastian Raschka's "Build a Large Language Model From Scratch", and this book is incredible. The GitHub repo contains Jupyter notebooks, and there are (free!) videos that augment the book. I am super impressed with the quality all around.

Sebastian Raschka (@rasbt)'s Twitter Profile Photo

Wrote up a more detailed analysis of gpt-oss, how far we've come since GPT-2, and how it compares to Qwen3: magazine.sebastianraschka.com/p/from-gpt-2-t…

Percy Liang (@percyliang)'s Twitter Profile Photo

GPT-5 and GPT-5 mini added to HELM capabilities v1.12.0. Interestingly, GPT-5 mini tops the leaderboard ahead of GPT-5 because on Omni-MATH, GPT-5 uses more reasoning tokens (and is hard to control) and hits our reasoning token budget of 14096. Doing fair evals is tricky!

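A hedged sketch of the failure mode described above: when a model's chain of thought hits a fixed reasoning-token budget, it gets cut off before emitting a final answer and scores zero regardless of capability. The function and scoring logic below are hypothetical illustrations, not HELM's actual implementation.

```python
# Hypothetical illustration of a fixed reasoning-token budget in an eval
# harness (names are made up; this is not HELM's actual code).

REASONING_TOKEN_BUDGET = 14096  # budget cited in the tweet

def score_response(reasoning_tokens: int, answer_correct: bool) -> int:
    """Return 1 for a correct answer produced within budget, else 0."""
    if reasoning_tokens >= REASONING_TOKEN_BUDGET:
        # The model was truncated mid-reasoning, so no final answer was
        # emitted -- it scores 0 even if it "knew" the answer.
        return 0
    return int(answer_correct)

# GPT-5 reasons longer and blows the budget; GPT-5 mini stays under it.
print(score_response(reasoning_tokens=15000, answer_correct=True))  # 0
print(score_response(reasoning_tokens=6000, answer_correct=True))   # 1
```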
ℏεsam (@hesamation)'s Twitter Profile Photo

learn how to build an LLM from scratch, honestly.
<a href="/rasbt/">Sebastian Raschka</a>'s repo is really a gem. it has notebooks with diagrams and explanations that will teach you 100% of:
&gt; attention mechanism
&gt; implementing a GPT model
&gt; pretraining and fine-tuning
my top recommendation for studying LLMs.
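As a taste of the first item on that list, here is a minimal single-head scaled dot-product self-attention sketch in PyTorch. It is a simplified illustration of the mechanism, not code taken from the repo.

```python
import torch

def self_attention(x: torch.Tensor, W_q, W_k, W_v) -> torch.Tensor:
    """Minimal single-head scaled dot-product self-attention.

    x: (seq_len, d_in) token embeddings; W_q/W_k/W_v: (d_in, d_out).
    """
    queries = x @ W_q
    keys = x @ W_k
    values = x @ W_v
    # Attention scores, scaled by sqrt(d_k) to keep the softmax well-behaved
    scores = queries @ keys.T / keys.shape[-1] ** 0.5
    weights = torch.softmax(scores, dim=-1)
    return weights @ values  # (seq_len, d_out) context vectors

torch.manual_seed(123)
x = torch.randn(6, 4)                          # 6 tokens, 4-dim embeddings
W_q, W_k, W_v = (torch.randn(4, 3) for _ in range(3))
print(self_attention(x, W_q, W_k, W_v).shape)  # torch.Size([6, 3])
```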
Sebastian Raschka (@rasbt)'s Twitter Profile Photo

It’s so interesting to see both communities swapping ideas (and maybe converging?):
-> NLP moving towards text diffusion models
-> CV moving towards autoregressive image generation
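A purely illustrative toy contrast of the two generation styles being swapped here, using random stand-ins for the actual model predictions: autoregressive decoding commits one token at a time, left to right, while diffusion-style decoding starts fully masked and refines all positions in parallel over a few steps.

```python
import random

vocab = ["the", "cat", "sat", "on", "a", "mat"]

def autoregressive_decode(n_tokens: int) -> list[str]:
    """Autoregressive style: one token per step, left to right."""
    seq = []
    for _ in range(n_tokens):
        seq.append(random.choice(vocab))  # stand-in for sampling p(x_t | x_<t)
    return seq

def diffusion_decode(n_tokens: int, n_steps: int = 3) -> list[str]:
    """Diffusion style: start fully masked, unmask positions in parallel."""
    seq = ["<mask>"] * n_tokens
    for step in range(n_steps):
        # stand-in for a denoising step: commit a fraction of positions at once
        for i, tok in enumerate(seq):
            if tok == "<mask>" and random.random() < (step + 1) / n_steps:
                seq[i] = random.choice(vocab)
    return seq

print(autoregressive_decode(5))
print(diffusion_decode(5))
```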

Sebastian Raschka (@rasbt)'s Twitter Profile Photo

I love this upgrade. Lightning AI is my go-to cloud compute platform due to its user-friendliness (great UI, persistent environments, multi-GPU and multi-node support, etc.), and now the prices are also really great.

An A100 for $1.55/hour through Lambda Labs, or an H100 for $2.70/hour.
Sebastian Raschka (@rasbt)'s Twitter Profile Photo

Pretty cool that they open sourced the actual full-sized production model. 
Here’s the Grok 2.5 architecture overview next to a roughly similarly sized Qwen3 model.
The MoE residual is quite interesting. Kind of like a shared expert. I don't think I've seen this setup before.
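For intuition on what an always-on branch next to routed experts looks like, here is a hedged PyTorch sketch of an MoE block where a shared expert runs on every token and its output is added like a residual. This is a guess at the general pattern being described, not Grok 2.5's actual implementation.

```python
import torch
import torch.nn as nn

class MoEWithSharedExpert(nn.Module):
    """Toy MoE layer: top-1 routed experts plus an always-active shared
    expert whose output is added like a residual branch.
    (Illustrative pattern only, not Grok 2.5's actual code.)"""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # Shared expert: applied to every token, regardless of routing
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate = torch.softmax(self.router(x), dim=-1)  # routing weights
        top_w, top_idx = gate.max(dim=-1)             # top-1 routing
        routed = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                routed[mask] = top_w[mask, None] * expert(x[mask])
        return routed + self.shared_expert(x)         # residual-style add

x = torch.randn(10, 32)
print(MoEWithSharedExpert(d_model=32, d_ff=64)(x).shape)  # torch.Size([10, 32])
```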