Alek Dimitriev (@tensor_rotator) 's Twitter Profile
Alek Dimitriev

@tensor_rotator

Inference @Anthropic, prev Gemini @Google, prev prev PhD @UTAustin

ID: 727974502404083712

Link: http://alekdimi.github.io · Joined: 04-05-2016 21:33:41

377 Tweets

309 Followers

1.1K Following

Jiacheng Liu (@liujc1998) 's Twitter Profile Photo

Ever wondered what CAN'T be transformed by Transformers? 🪨

I wrote a fun blog post on finding "fixed points" of your LLMs. If you prompt it with a fixed point token, the LLM is gonna decode it repeatedly forever, guaranteed.

There's some connection with LLMs' repetition issue.
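A minimal way to see what a fixed-point token means in practice: greedy-decode starting from a single token and check whether the model keeps emitting that same token forever. The sketch below is an illustrative assumption, not the blog post's code; the model name, scan range, and step count are placeholders.

```python
# Sketch (assumption): check whether a token is a greedy-decoding fixed point,
# i.e. prompting with it makes the model emit the same token again and again.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, not the one from the blog post
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def is_fixed_point(token_id: int, steps: int = 5) -> bool:
    """Greedy-decode from a single token and check it keeps repeating itself."""
    ids = torch.tensor([[token_id]])
    with torch.no_grad():
        for _ in range(steps):
            logits = model(ids).logits[0, -1]       # next-token logits
            nxt = int(torch.argmax(logits))          # greedy choice
            if nxt != token_id:
                return False
            ids = torch.cat([ids, torch.tensor([[nxt]])], dim=1)
    return True

# Scan a small slice of the vocabulary for fixed points (illustrative only).
candidates = [i for i in range(200) if is_fixed_point(i)]
print([(i, tok.decode([i])) for i in candidates[:10]])
```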
Stripe Press (@stripepress) 's Twitter Profile Photo

What is intelligence? What will it take to create AGI? What happens once we succeed? The Scaling Era: An Oral History of AI, 2019–2025 by Dwarkesh Patel and Gavin Leech explores the questions animating those at the frontier of AI research. It’s out today: press.stripe.com/scaling

Dwarkesh Patel (@dwarkesh_sp) 's Twitter Profile Photo

The Scaling Era is out today. I'm actually surprised by how well this format works, even better than my expectations. It's so interesting to read side-by-side how hyperscaler CEOs, AI researchers, and economists answer the same question. Thank you to the Stripe Press

Dylan Patel ✈️ ICLR (@dylan522p) 's Twitter Profile Photo

Today we are launching InferenceMAX! We have support from Nvidia, AMD, OpenAI, Microsoft, PyTorch, SGLang, vLLM, Oracle, CoreWeave, TogetherAI, Nebius, Crusoe, HPE, SuperMicro, and Dell. It runs every day on the latest software (vLLM, SGLang, etc.) across hundreds of GPUs, $10Ms of

Sham Kakade (@shamkakade6) 's Twitter Profile Photo

1/8 Second Order Optimizers like SOAP and Muon have shown impressive performance on LLM optimization. But are we fully utilizing the potential of second order information? New work: we show that a full second order optimizer is much better than existing optimizers in terms of

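For intuition on what full second-order information buys, here is a toy comparison on a quadratic: one Newton step using the exact Hessian lands on the minimizer, while plain gradient descent converges at a rate tied to the Hessian's conditioning. This is a hedged illustration of the general idea only, not the optimizer from the thread; SOAP and Muon use their own approximate preconditioning schemes.

```python
# Toy sketch (assumption): full second-order (Newton) step vs. gradient descent
# on f(x) = 0.5 x^T H x - b^T x, whose exact minimizer is x* = H^{-1} b.
import torch

torch.manual_seed(0)
A = torch.randn(5, 5)
H = A @ A.T + 5 * torch.eye(5)   # symmetric positive definite Hessian
b = torch.randn(5)

def grad(x):
    return H @ x - b

x_star = torch.linalg.solve(H, b)

# One Newton step from the origin solves the quadratic exactly.
x_newton = torch.zeros(5)
x_newton = x_newton - torch.linalg.solve(H, grad(x_newton))

# Gradient descent needs many steps, with a step size set by the largest eigenvalue.
x_gd = torch.zeros(5)
lr = 1.0 / torch.linalg.eigvalsh(H).max()
for _ in range(100):
    x_gd = x_gd - lr * grad(x_gd)

print("Newton error after 1 step:", torch.norm(x_newton - x_star).item())
print("GD error after 100 steps:  ", torch.norm(x_gd - x_star).item())
```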