lucas (@_lukaemon) 's Twitter Profile
lucas

@_lukaemon

ai researcher. maximizing rabbit holes ...

ID: 1284342578683318273

Link: https://lucasshen.org/ | Joined: 18-07-2020 04:21:56

5.5K Tweets

610 Followers

4.4K Following

Denny Zhou (@denny_zhou) 's Twitter Profile Photo

Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-… Key points: 1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial
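Zhou's definition is operational, not philosophical: reasoning just means the model emits intermediate tokens before the answer. A minimal sketch of the two prompting styles (the prompt wording here is illustrative, not taken from the slides):

```python
# Sketch of "reasoning = intermediate tokens before the final answer".
# Prompt wording is illustrative; the model call itself is out of scope.

def direct_prompt(question: str) -> str:
    # Ask for the answer immediately -- no intermediate tokens.
    return f"Q: {question}\nA:"

def reasoning_prompt(question: str) -> str:
    # Elicit a sequence of intermediate tokens first, which is all
    # "reasoning" means in this framing.
    return (f"Q: {question}\n"
            "Let's think step by step, then give the final answer.\nA:")

print(reasoning_prompt("What is 17 * 24?"))
```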

Guangxuan Xiao (@guangxuan_xiao) 's Twitter Profile Photo

I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models. For those interested in the details: hanlab.mit.edu/blog/streaming…

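The core mechanism behind the StreamingLLM work the post describes is a KV-cache eviction policy: keep a few initial "attention sink" tokens forever, plus a sliding window of recent tokens. A minimal sketch (parameter names and defaults are illustrative, not the paper's exact settings):

```python
# Sketch of a StreamingLLM-style KV-cache eviction policy: retain the first
# n_sinks "attention sink" positions plus the most recent `window` positions.
# n_sinks/window values here are illustrative.

def evict(positions: list, n_sinks: int = 4, window: int = 8) -> list:
    """Return the cached token positions retained after eviction."""
    if len(positions) <= n_sinks + window:
        return positions
    return positions[:n_sinks] + positions[-window:]

kept = evict(list(range(20)))
# sink positions 0..3 survive no matter how long generation runs,
# alongside the 8 most recent positions
```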
Andrej Karpathy (@karpathy) 's Twitter Profile Photo

barry farkus ⏩️ I'm starting to do some of this too. I have a script that packages all of the files of my project (which isn't a giant repo and fits just fine) using `files-to-prompt`, then I start a new conversation, copy paste, and ask a question at the end, and manually select the appropriate model.
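The packaging step amounts to concatenating every file with its path as a header. A standalone sketch of that idea (the real `files-to-prompt` CLI does roughly this; this is an illustration, not its actual implementation, and the suffix filter is an assumption):

```python
# Minimal sketch of the workflow described above: concatenate a small repo's
# files into one prompt, each preceded by its path. Illustrative only --
# the real `files-to-prompt` tool handles this more robustly.
from pathlib import Path

def package_repo(root: str, suffixes=(".py", ".md")) -> str:
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"{path}\n---\n{path.read_text()}\n---")
    return "\n".join(parts)

# prompt = package_repo("my_project") + "\n\nQuestion: where is the retry logic?"
```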

lucas (@_lukaemon) 's Twitter Profile Photo

can see the obviously different goals between politician and technologist in this exchange. imo, david's take is political, mostly driving stories based on selective partial facts. elon is red pilled.

lucas (@_lukaemon) 's Twitter Profile Photo

1. ai would be better at most tasks than 99% of humans 2. ai could be a decent, loving, truth-seeking partner. these are not mutually exclusive futures.

lucas (@_lukaemon) 's Twitter Profile Photo

only 24% of plus subs, 7% of free users using reasoning models ... til: 1. consumer ai has first-mover advantage and saturates at a fairly low intelligence level. most people won't need a team of phds in their pocket, but it sounds cool. 2. basic intelligence at scale is strategically

Igor Babuschkin (@ibab) 's Twitter Profile Photo

Today was my last day at xAI, the company that I helped start with Elon Musk in 2023. I still remember the day I first met Elon, we talked for hours about AI and what the future might hold. We both felt that a new AI company with a different kind of mission was needed. Building

Costa Huang (@vwxyzjn) 's Twitter Profile Photo

Check out Jason’s new work on RL tricks. Lots of ablation studies comparing popular techniques like Clip Higher 🤩
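"Clip Higher" refers to decoupling the two ends of the PPO clipping range so the upper bound is looser than the lower one, leaving more room to raise the probability of low-probability tokens. A sketch assuming that DAPO-style formulation (the epsilon values are illustrative, not Jason's numbers):

```python
# Sketch of the "Clip Higher" trick: PPO-style ratio clipping with an
# asymmetric range, eps_high > eps_low. Default values are illustrative.

def clipped_objective(ratio: float, advantage: float,
                      eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    # Standard PPO would use the same epsilon on both sides; clip-higher
    # widens only the upper bound.
    clipped = min(max(ratio, 1 - eps_low), 1 + eps_high)
    return min(ratio * advantage, clipped * advantage)

# With a positive advantage, updates are only cut off once the ratio
# exceeds 1 + eps_high, instead of the symmetric 1 + eps_low.
clipped_objective(1.25, 1.0)  # still unclipped under clip-higher
```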

Hattie Zhou (@oh_that_hat) 's Twitter Profile Photo

AI models “think” in two ways: - in the latent space over layers - in the token space over a sequence Latent space = natural talent, chain of thought = hard work. Just like for humans, hard work can get you far, but talent sets the ceiling. This is why pretraining can’t die.

Jacob Austin (@jacobaustin132) 's Twitter Profile Photo

Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
