Stuart Sul
@stuart_sul
cs @ stanford
ID: 1811402960288751616
11-07-2024 14:12:07
0 Tweet
8 Followers
52 Following
GPU kernel launches are expensive, so we fused the entire Llama-1B into a single kernel. Very excited to kick off our megakernel framework series with ThunderKittens hazyresearch. More coming soon!
Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models. (Joint work with Ayush Chakravarthy, Ryan Ehrlich, Sabri Eyuboglu, Bradley Brown, Joseph Shetaye,