lucas (@_lukaemon) 's Twitter Profile
lucas

@_lukaemon

ai researcher. maximizing rabbit holes ...

ID: 1284342578683318273

Link: https://lucasshen.org/ | Joined: 18-07-2020 04:21:56

5.5K Tweets

610 Followers

4.4K Following

Denny Zhou (@denny_zhou) 's Twitter Profile Photo

Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-… Key points: 1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial
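Zhou's definition is operational, not philosophical: reasoning just means the model emits intermediate tokens before the answer. A minimal sketch of the two prompting styles (the prompt wording here is illustrative, not taken from the slides):

```python
# Sketch of "reasoning = intermediate tokens before the final answer".
# Prompt wording is illustrative; the model call itself is out of scope.

def direct_prompt(question: str) -> str:
    # Ask for the answer immediately -- no intermediate tokens.
    return f"Q: {question}\nA:"

def reasoning_prompt(question: str) -> str:
    # Elicit a sequence of intermediate tokens first, which is all
    # "reasoning" means in this framing.
    return (f"Q: {question}\n"
            "Let's think step by step, then give the final answer.\nA:")

print(reasoning_prompt("What is 17 * 24?"))
```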

Guangxuan Xiao (@guangxuan_xiao) 's Twitter Profile Photo

I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models. For those interested in the details: hanlab.mit.edu/blog/streaming…

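The core mechanism behind the StreamingLLM work the post describes is a KV-cache eviction policy: keep a few initial "attention sink" tokens forever, plus a sliding window of recent tokens. A minimal sketch (parameter names and defaults are illustrative, not the paper's exact settings):

```python
# Sketch of a StreamingLLM-style KV-cache eviction policy: retain the first
# n_sinks "attention sink" positions plus the most recent `window` positions.
# n_sinks/window values here are illustrative.

def evict(positions: list, n_sinks: int = 4, window: int = 8) -> list:
    """Return the cached token positions retained after eviction."""
    if len(positions) <= n_sinks + window:
        return positions
    return positions[:n_sinks] + positions[-window:]

kept = evict(list(range(20)))
# sink positions 0..3 survive no matter how long generation runs,
# alongside the 8 most recent positions
```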
Andrej Karpathy (@karpathy) 's Twitter Profile Photo

barry farkus ⏩️ I'm starting to do some of this too. I have a script that packages all of the files of my project (which isn't a giant repo and fits just fine) using `files-to-prompt`, then I start a new conversation, copy paste, and ask a question at the end, and manually select the appropriate model.
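The packaging step amounts to concatenating every file with its path as a header. A standalone sketch of that idea (the real `files-to-prompt` CLI does roughly this; this is an illustration, not its actual implementation, and the suffix filter is an assumption):

```python
# Minimal sketch of the workflow described above: concatenate a small repo's
# files into one prompt, each preceded by its path. Illustrative only --
# the real `files-to-prompt` tool handles this more robustly.
from pathlib import Path

def package_repo(root: str, suffixes=(".py", ".md")) -> str:
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"{path}\n---\n{path.read_text()}\n---")
    return "\n".join(parts)

# prompt = package_repo("my_project") + "\n\nQuestion: where is the retry logic?"
```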

lucas (@_lukaemon) 's Twitter Profile Photo

can see the obviously different goals between politician and technologist in this exchange. imo, david's take is political, mostly driving stories based on selective partial facts. elon is red pilled.

lucas (@_lukaemon) 's Twitter Profile Photo

1. ai would be better at most tasks than 99% of humans 2. ai could be a decent, loving, truth-seeking partner. these are not mutually exclusive futures.

lucas (@_lukaemon) 's Twitter Profile Photo

only 24% of plus subs, 7% of free users using reasoning models ... til: 1. consumer ai has first-mover advantage and saturates at a fairly low intelligence level. most people won't need a team of phds in their pocket, but it sounds cool. 2. basic intelligence at scale is strategically

Igor Babuschkin (@ibab) 's Twitter Profile Photo

Today was my last day at xAI, the company that I helped start with Elon Musk in 2023. I still remember the day I first met Elon, we talked for hours about AI and what the future might hold. We both felt that a new AI company with a different kind of mission was needed. Building

Costa Huang (@vwxyzjn) 's Twitter Profile Photo

Check out Jason’s new work on RL tricks. Lots of ablation studies comparing popular techniques like Clip Higher 🤩
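"Clip Higher" refers to decoupling the two ends of the PPO clipping range so the upper bound is looser than the lower one, leaving more room to raise the probability of low-probability tokens. A sketch assuming that DAPO-style formulation (the epsilon values are illustrative, not Jason's numbers):

```python
# Sketch of the "Clip Higher" trick: PPO-style ratio clipping with an
# asymmetric range, eps_high > eps_low. Default values are illustrative.

def clipped_objective(ratio: float, advantage: float,
                      eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    # Standard PPO would use the same epsilon on both sides; clip-higher
    # widens only the upper bound.
    clipped = min(max(ratio, 1 - eps_low), 1 + eps_high)
    return min(ratio * advantage, clipped * advantage)

# With a positive advantage, updates are only cut off once the ratio
# exceeds 1 + eps_high, instead of the symmetric 1 + eps_low.
clipped_objective(1.25, 1.0)  # still unclipped under clip-higher
```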

Hattie Zhou (@oh_that_hat) 's Twitter Profile Photo

AI models “think” in two ways: - in the latent space over layers - in the token space over a sequence Latent space = natural talent, chain of thought = hard work. Just like for humans, hard work can get you far, but talent sets the ceiling. This is why pretraining can’t die.

Jacob Austin (@jacobaustin132) 's Twitter Profile Photo

Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
