Harry (@categorified)'s Twitter Profile
Harry

@categorified

Beauty is truth, truth beauty,—that is all ye know on earth, and all ye need to know.

ID: 1720097875341000705

Joined: 02-11-2023 15:17:55

17 Tweets

5 Followers

103 Following

xjdr (@_xjdr)'s Twitter Profile Photo

stochasm nvfp4 . i am actually writing a blog about it (probably) to go along with the next set of nmoe releases. the expert LR should be different from the dense and embedding LR in _most_ settings. in bf16 it should be lower, but muon and your actual global batch can impact this. its

Harry (@categorified)'s Twitter Profile Photo

I wonder how much of DeepSeek's engram perf gains would disappear with better multi-word tokenisation arxiv.org/abs/2503.13423

Baseten (@basetenco)'s Twitter Profile Photo

🚀 We're thrilled to introduce the fastest, most accurate, and cost-efficient Whisper-powered transcription and diarization on the market:  

• 2400× RTF with Whisper Large V3 Turbo
• Streaming transcription with consistent low latency
• The most accurate real-time diarization
Baseten (@basetenco)'s Twitter Profile Photo

Tired of waiting for video generation? Say less.

We've optimized the Wan 2.2 runtime to hit: 3x faster inference on NVIDIA Blackwell, 2.5x faster on Hopper, 67% cost reduction.

Read the full breakdown of our kernel optimizations and benchmarks here: baseten.co/blog/wan-2-2-v…
Tuhin Srivastava (@tuhinone)'s Twitter Profile Photo

Baseten’s day 0 bet was that inference was the technology that would enable the best user experiences AI could deliver–fast, smart, reliable, secure. And that those experiences would rely not only on a handful of giant general intelligence models, but millions of specialized

NVIDIA AI Developer (@nvidiaaidev)'s Twitter Profile Photo

Most “efficient attention” tricks collapse at high KV compression ratios—DMS shows you can get ~8× KV compression with ~1K training steps and still improve reasoning Pareto frontiers vs dense Qwen-R1 models.   

The key: a learned, delayed token-eviction policy trained via logit
Harry (@categorified)'s Twitter Profile Photo

I really wonder how far we can push this: if we instead let the model choose the length of time to retain a token, and eventually evict all tokens, this could be a great way to get infinite context
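A minimal sketch of that retention idea, assuming each token gets a predicted lifetime (TTL) from some learned head and is evicted once it expires. All names and mechanics here are illustrative, not DMS's actual eviction policy:

```python
# Toy sketch: each cached token carries a predicted retention time (TTL).
# Tokens are evicted once their TTL expires, so the cache stays bounded
# no matter how long the sequence grows.

class RetentionKVCache:
    def __init__(self):
        self.entries = []  # list of [token, ttl_remaining]

    def add(self, token, predicted_ttl):
        """predicted_ttl would come from a learned head in a real model."""
        self.entries.append([token, predicted_ttl])

    def step(self):
        """Advance one decode step: age every entry, evict expired ones."""
        for e in self.entries:
            e[1] -= 1
        self.entries = [e for e in self.entries if e[1] > 0]

    def visible_tokens(self):
        return [t for t, _ in self.entries]


cache = RetentionKVCache()
cache.add("the", predicted_ttl=1)    # low-information token: short life
cache.add("Paris", predicted_ttl=3)  # salient token: retained longer
cache.step()
print(cache.visible_tokens())  # ['Paris']
```

Because every token eventually expires, memory use is bounded by the sum of live TTLs rather than by context length, which is what makes the "infinite context" framing plausible.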

Baseten (@basetenco)'s Twitter Profile Photo

We boosted acceptance rate by up to 40% with the Baseten Speculation Engine.

How? By combining Multi-Token Prediction (MTP) with Suffix Automaton (SA) decoding.

This hybrid approach crushes production coding workloads, delivering 30%+ longer acceptance lengths on code editing
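A toy stand-in for the suffix-automaton side of this (not Baseten's implementation): propose draft tokens by matching the longest recent suffix of the output against earlier text and replaying what followed. A real suffix automaton does this matching in amortized constant time per token; this brute-force sketch is quadratic but shows the idea:

```python
# Suffix-matching draft proposal: find the longest suffix of the generated
# tokens that occurred earlier, and propose the tokens that followed that
# earlier occurrence as the speculative draft. Repetitive text (e.g. code
# edits) yields long matches, hence long accepted drafts.

def propose_draft(tokens, max_draft=4):
    n = len(tokens)
    for suffix_len in range(min(8, n - 1), 0, -1):
        suffix = tokens[n - suffix_len:]
        # scan backwards for an earlier occurrence of this suffix
        for i in range(n - suffix_len - 1, -1, -1):
            if tokens[i:i + suffix_len] == suffix:
                follow = tokens[i + suffix_len:i + suffix_len + max_draft]
                if follow:
                    return follow
    return []

history = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b",
           "def", "mul", "(", "a", ",", "b", ")"]
print(propose_draft(history))  # [':', 'return', 'a', '+']
```

The draft continues `def mul(a, b)` the same way `def add(a, b)` continued earlier, which is exactly the kind of structural repetition that makes code-editing workloads a good fit for this decoding style.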
Baseten (@basetenco)'s Twitter Profile Photo

The best OpenClaw🦞 setup is fully open-source. 

Kimi K2.5 on Baseten outperforms Opus 4.5 on agentic benchmarks at 8x lower cost.

Faster inference, same or better quality. 

Set up in 2 minutes here: baseten.co/blog/openclaw-…
Paras Stefanopoulos (@stefanopopoulos)'s Twitter Profile Photo

OpenClaw w/ Kimi K2.5 is so good... The inference speeds on Baseten are nuts! To really knock your socks off... this "X" was written by yours truly, OpenClaw + Kimi K2.5 😎

Baseten (@basetenco)'s Twitter Profile Photo

LLMs are amnesiacs. Once context fills up, they forget everything. To fight this means grappling with a core question: how do you update a neural network without breaking what it already knows?

In this piece, Charlie O'Neill and Harry Partridge argue that continual learning is
John Carmack (@id_aa_carmack)'s Twitter Profile Photo

256 Tb/s data rates over 200 km distance have been demonstrated on single mode fiber optic, which works out to 32 GB of data in flight, “stored” in the fiber, with 32 TB/s bandwidth. Neural network inference and training can have deterministic weight reference patterns, so it is
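The arithmetic checks out, assuming light propagates at roughly c/1.5 in glass: 200 km of fiber holds about 1 ms of signal in transit, and 256 Tb/s × 1 ms ≈ 32 GB in flight:

```python
# Verifying the in-flight "storage" claim for a 200 km fiber at 256 Tb/s.
# Assumes a refractive index of ~1.5, i.e. light travels at about c/1.5.

c = 299_792_458       # speed of light in vacuum, m/s
v = c / 1.5           # ~2e8 m/s propagation speed in fiber
length_m = 200_000    # 200 km
rate_bps = 256e12     # 256 Tb/s

transit_s = length_m / v                 # one-way transit time, ~1 ms
bits_in_flight = rate_bps * transit_s
print(f"{transit_s * 1e3:.2f} ms transit")
print(f"{bits_in_flight / 8 / 1e9:.1f} GB in flight")
print(f"{rate_bps / 8 / 1e12:.0f} TB/s bandwidth")
```

So the fiber behaves like a 32 GB delay-line memory read out at 32 TB/s, with a fixed ~1 ms access latency, which is why deterministic weight-streaming access patterns are the natural fit.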

Baseten (@basetenco)'s Twitter Profile Photo

Introducing Kimi K2.5 on Baseten’s Model APIs with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis.

Even among a landscape of incredible open source models, Kimi K2.5 stands out with its multi-modal capabilities and its ability to accommodate an
Ali Taha (@aliestaha)'s Twitter Profile Photo

we quantized the best open-source diffusion model on the market

4 bits
huge speedup
(almost) no quality loss

this is a full explanation of the trillion dollar industry's oldest trick
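The trick being alluded to, sketched in plain Python under simple assumptions (symmetric absmax quantization with a per-group scale); real 4-bit kernels additionally pack two codes per byte and fuse dequantization into the matmul:

```python
# Symmetric absmax int4 quantization: each group of weights shares one
# float scale, and each weight is stored as an integer code in [-7, 7].
# Memory drops ~4x vs fp16; error per weight is at most half a scale step.

def quantize_int4(weights, group_size=4):
    """Quantize a flat list of floats to int4 codes plus per-group scales."""
    codes, scales = [], []
    for g in range(0, len(weights), group_size):
        group = weights[g:g + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid zero scale
        scales.append(scale)
        codes.extend(max(-7, min(7, round(w / scale))) for w in group)
    return codes, scales

def dequantize_int4(codes, scales, group_size=4):
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

w = [0.12, -0.5, 0.31, 0.02, 1.4, -0.9, 0.0, 0.7]
codes, scales = quantize_int4(w)
w_hat = dequantize_int4(codes, scales)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(codes)
print(f"max reconstruction error: {max_err:.3f}")
```

The "(almost) no quality loss" part comes from the per-group scales: outliers only inflate the error of their own small group, not the whole tensor.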