Kazz (@kazzorr_)'s Twitter Profile
Kazz

@kazzorr_

math, physics, AI/ML

ID: 1970150615881048064

Joined: 22-09-2025 15:38:39

1.1K Tweets

49 Followers

103 Following

Ian Osband (@ianosband)

Something is rotten with policy gradient.

PG has become *the* RL loss for LLMs. But it’s not even good at basic RL.

Even on MNIST with bandit feedback, vanilla PG performs far worse than cross-entropy because it wastes gradient budget.

Delightful Policy Gradient:
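
A minimal sketch of the contrast described above, assuming the usual "MNIST as a 10-armed bandit" setup: the model samples a label and only observes whether it was right. Names and shapes here are illustrative, not from the paper:

```python
import torch
import torch.nn.functional as F

# Toy contrast between vanilla policy gradient and cross-entropy on
# "MNIST with bandit feedback": sample a label, observe reward 1 only
# if it was correct.

def pg_loss(logits, actions, rewards):
    # REINFORCE: only the sampled action's log-prob gets a gradient,
    # and zero-reward samples contribute nothing at all.
    logp = F.log_softmax(logits, dim=-1)
    logp_a = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(rewards * logp_a).mean()

def ce_loss(logits, labels):
    # Full supervision: every logit gets a gradient on every example.
    return F.cross_entropy(logits, labels)

model = torch.nn.Linear(784, 10)         # stand-in for an MNIST classifier
x = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))

logits = model(x)
actions = torch.distributions.Categorical(logits=logits).sample()
rewards = (actions == labels).float()    # bandit feedback only

pg_loss(logits, actions, rewards).backward()
```
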
Thariq (@trq212)

I put a lot of heart into my technical writing; I hope it's useful to you all. 📌 Here's a pinned thread of everything I've written. (much of this will be posted on the Claude blog soon as well)

Simplifying Complexity (@simplifyinai)

🚨 BREAKING: Tencent has killed the “next-token” paradigm.

Tencent and Tsinghua have released CALM (Continuous Autoregressive Language Models), and it completely disrupts the next-token paradigm.

LLMs currently waste massive amounts of compute predicting discrete, single tokens
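
A rough sketch of the idea as described: compress chunks of K tokens into continuous vectors with an autoencoder, then autoregress over vectors so one forward step covers K tokens. The modules and MSE objectives below are placeholders, not CALM's actual design:

```python
import torch
import torch.nn as nn

# Illustrative only: compress K tokens into one continuous vector and
# autoregress over vectors, so one forward step covers K tokens.

K, d_tok, d_vec = 4, 64, 128
embed = nn.Embedding(1000, d_tok)
encoder = nn.Linear(K * d_tok, d_vec)              # K tokens -> 1 vector
decoder = nn.Linear(d_vec, K * d_tok)              # 1 vector -> K tokens
predictor = nn.GRU(d_vec, d_vec, batch_first=True) # next-vector model

tokens = torch.randint(0, 1000, (8, 8 * K))        # 8 chunks of K tokens
chunks = embed(tokens).reshape(8, 8, K * d_tok)
vecs = encoder(chunks)                             # (8, 8, d_vec)

ae_loss = ((decoder(vecs) - chunks) ** 2).mean()   # autoencoder fidelity
pred, _ = predictor(vecs[:, :-1])                  # predict the next vector
ar_loss = ((pred - vecs[:, 1:]) ** 2).mean()       # placeholder objective
(ae_loss + ar_loss).backward()
```
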
alphaXiv (@askalphaxiv)

"Foundations of Schrödinger Bridges for Generative Modeling" This paper shows that diffusion models, score-based models, and flow matching are really just different views of the same core idea: a Schrödinger bridge that moves noise into data along the most efficient stochastic

"Foundations of Schrödinger Bridges for Generative Modeling"

This paper shows that diffusion models, score-based models, and flow matching are really just different views of the same core idea: a Schrödinger bridge that moves noise into data along the most efficient stochastic
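
For reference, the standard static Schrödinger bridge problem behind this framing: find the path measure closest in KL to a reference Brownian motion, pinned to the noise and data marginals:

```latex
% Static Schrödinger bridge: the path measure closest (in KL) to a
% reference Brownian motion W, pinned to the noise and data marginals.
\min_{P}\; \mathrm{KL}\!\left(P \,\middle\|\, W\right)
\quad \text{subject to} \quad
P_0 = p_{\mathrm{noise}}, \qquad P_1 = p_{\mathrm{data}}
```
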
Emiel Hoogeboom (@emiel_hoogeboom)

You may think discrete distillation is fundamentally flawed; you are (surprisingly) wrong. 🤯

Meet Discrete Moment Distillation (D-MMD). It is a new method that brings fast, few-step sampling to discrete diffusion models! 🧵👇
Unsloth AI (@unslothai)

You can now train Qwen3.5 with RL in our free notebook!

You just need 8GB VRAM to RL Qwen3.5-2B locally!

Qwen3.5 will learn to solve math problems autonomously via vision GRPO.

RL Guide: unsloth.ai/docs/get-start…
GitHub: github.com/unslothai/unsl…

Qwen3-4B: colab.research.google.com/github/unsloth…
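
The GRPO part is easy to sketch: rewards are standardized within a group of rollouts for the same prompt, so no learned value network is needed. A minimal version of that advantage computation (independent of Unsloth's actual notebook code):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    # rewards: (num_prompts, G) for G rollouts per prompt. Each rollout's
    # advantage is its reward standardized within its own group.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two prompts, four rollouts each, binary "solved the problem" rewards.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
```
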
Lucas Maes (@lucasmaes_)

JEPAs are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning in <1 second. 📑: le-wm.github.io
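
For context, the generic JEPA training step this announcement presupposes: predict the target view's embedding in latent space, with the target encoder held out of the gradient (here via an EMA copy). A generic sketch, not LeWorldModel's actual architecture:

```python
import copy
import torch
import torch.nn as nn

# Generic JEPA step: predict the target view's embedding in latent
# space; the target encoder is an EMA copy that gets no gradient,
# which is the usual guard against representation collapse.

encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 15 * 15, 256),
)
target_encoder = copy.deepcopy(encoder)
predictor = nn.Linear(256, 256)

ctx = torch.randn(16, 3, 64, 64)   # context view (e.g., frame t)
tgt = torch.randn(16, 3, 64, 64)   # target view (e.g., frame t+1)

pred = predictor(encoder(ctx))
with torch.no_grad():              # stop-gradient on the target branch
    target = target_encoder(tgt)
((pred - target) ** 2).mean().backward()

# EMA update of the target encoder
for p, tp in zip(encoder.parameters(), target_encoder.parameters()):
    tp.data.mul_(0.996).add_(p.data, alpha=0.004)
```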

Antonio Orvieto (@orvieto_antonio)

Optimization theory for adaptive methods actually predicts most of what we know about hyperparameter scaling in LLM pretraining, and suggests new strategies as well. We did a deep dive here.
alphaXiv (@askalphaxiv)

Yann LeCun and his team can't stop cooking

"LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels"

One of the biggest bottlenecks of JEPAs is that they are hard to train, and this new research changes that.

They propose LeWorldModel, which shows that a
Nav Singh (@heynavsingh)

🚨 Electrical engineers are going to hate this. Someone just turned React into a circuit board factory. Write code. Get a real PCB manufactured and delivered to your door. It's called tscircuit. React for Electronics. No Altium. No $10,000/year licenses. No 6-month learning

Snyk (@snyksec)

Andrej Karpathy The LiteLLM dependency incident didn't "just happen" though. This is part of a larger campaign; the LiteLLM fallout already extends to supply-chain security consequences for other projects: snyk.io/articles/poiso…

Wildminder (@wildmindai)

NVIDIA says: no more "brute force every pixel" video understanding. AutoGaze identifies and removes redundant video patches before they enter a Vision Transformer. Now we can process 4K long video in real time. Works with SigLIP2 and NVILA. autogaze.github.io
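
The general recipe behind this kind of method is easy to sketch: score patch tokens cheaply, keep the top-k, and only the survivors pay the quadratic attention cost. A generic illustration, not AutoGaze's actual scoring rule:

```python
import torch

def prune_patches(tokens: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    # tokens: (batch, n_patches, dim). Token norm is a cheap saliency
    # stand-in here; only the kept tokens enter the ViT.
    scores = tokens.norm(dim=-1)                        # (B, N)
    k = max(1, int(tokens.shape[1] * keep_ratio))
    idx = scores.topk(k, dim=1).indices                 # (B, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
    return tokens.gather(1, idx)                        # (B, k, dim)

frames = torch.randn(2, 4096, 768)  # e.g., patch tokens from large frames
print(prune_patches(frames).shape)  # torch.Size([2, 1024, 768])
```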

Brian Roemmele (@brianroemmele)

LeWorldModel: Yann LeCun's Radical Simplification of World Models Just Made Physics-Aware AI Practical

In the race for artificial general intelligence, two paths have emerged. One is the familiar scale everything route: bigger LLMs trained on ever-larger text corpora. The other,
Sawyer Hood (@sawyerhood)

Introducing the new dev-browser cli. The fastest way for an agent to use a browser is to let it write code. Just `npm i -g dev-browser` and tell your agent to "use dev-browser"

Om Patel (@om_patel5)

THIS GUY MADE A CLAUDE CODE SKILL THAT CLONES ANY WEBSITE IN ONE PROMPT everyone tries to clone websites by taking screenshots and hoping for the best. that gets you maybe halfway there. there's a better way. Claude Code has a built-in Chrome MCP that goes straight to the

himanshu dubey (@himanshustwts)

nanoGPT by Andrej Karpathy is still the most relevant reference to hack and learn if someone is starting out in ai research.

i tried to look (been a long time!) at what work has been done to beat the baseline:

> Architectural modernization (RoPE, QK-norm, ReLU, RMSnorm etc)
>
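
Of the modernizations listed, RMSNorm is the simplest to show: it rescales by the root-mean-square of the features and drops LayerNorm's mean-centering and bias. A minimal drop-in version:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Rescale by the root-mean-square of the features: no mean
    # subtraction and no bias, unlike LayerNorm.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

x = torch.randn(4, 16, 64)
print(RMSNorm(64)(x).shape)  # torch.Size([4, 16, 64])
```
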
Yesterday Work (@yesterday_work_)

🚨 BREAKING: HuggingFace just dropped their complete AI engineering playbook to the public.

They released 12 courses that were internal-only until this week.

This covers LLMs, Robotics, and MCP, which is the exact tech stack behind Llama, Mistral, and every major open model.
Martin (@mjbukow)

Andy It's more complex than that. Because the residual stream is purely additive, low-level gradient noise and intralayer communication signals accumulate across layers. The norm of the hidden states steadily increases with depth. In the last few layers, the model turns up the volume
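
The norm-growth claim is easy to check on a toy additive stack; a sketch under simple assumptions (random linear blocks, no normalization):

```python
import torch
import torch.nn as nn

# Because the residual stream is purely additive, each block's output
# is added to the hidden state, so its norm tends to grow with depth.
torch.manual_seed(0)
d, depth = 256, 24
blocks = [nn.Linear(d, d) for _ in range(depth)]

x = torch.randn(8, d)
for i, block in enumerate(blocks):
    x = x + block(torch.tanh(x))   # additive residual update
    print(f"layer {i:2d}  mean ||h|| = {x.norm(dim=-1).mean():.1f}")
```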

Boris Cherny (@bcherny)

I wanted to share a bunch of my favorite hidden and under-utilized features in Claude Code. I'll focus on the ones I use the most. Here goes.