Junru Shao (@junrushao)'s Twitter Profile
Junru Shao

@junrushao

opinions are my own

ID: 881152441

https://linktr.ee/junrushao

Joined 14-10-2012 23:02:15

556 Tweets

1.1K Followers

415 Following

BosonAI (@boson_ai)

Excited to share Higgs-V2, with improved general and roleplaying abilities. The performance boost comes from our in-house reward model. More at boson.ai/higgs-v2/

Hieu Pham (@hyhieu226)

📚🧑‍🎓New tutorial on WGMMA (WarpGroup Matrix Multiplication and Accumulation) research.colfax-intl.com/cutlass-tutori…

If you have run PyTorch, Jax, or FlashAttention-3 on an H100 GPU, you have used WGMMA.

Arguably the most important primitive in the H100's Hopper architecture, WGMMA is the
Junru Shao (@junrushao)

Novelty considered harmful in this case. PyTorch/NumPy syntax is a proven de facto standard for general users, so there's literally no reason to reinvent the wheel.

Yixin Dong (@yi_xin_dong)

🚀✨Introducing XGrammar: a fast, flexible, and portable engine for structured generation!

🤖Accurate JSON/grammar generation
⚡️3-10x speedup in latency
🤝Easy LLM engine integration
✅ Now in MLC-LLM, SGLang, WebLLM; vLLM & HuggingFace coming soon!

blog.mlc.ai/2024/11/22/ach…
Tianqi Chen (@tqchenml)

🚀Future LLM agents speak JSON, Python, and other structures. Excited to announce XGrammar, a structured generation library that enables zero-overhead structure constraining. It brings a 2x-10x speedup in grammar-guided LLM serving. Check out the GitHub repo and blog to learn more 👉

Zihao Ye (@ye_combinator)

We are excited to announce FlashInfer v0.2!

Core contributions of this release include:
- Block/Vector Sparse (Paged) Attention on FlashAttention-3
- JIT compilation for customized attention variants
- Fused Multi-head Latent Attention (MLA) decoding kernel
- Lots of bug fixes and
Hongyi Jin (@hongyijin258)

🚀Making cross-engine LLM serving programmable.
Introducing LLM Microserving: a new RISC-style approach to designing LLM serving APIs at the sub-request level. Scale LLM serving with programmable cross-engine serving patterns, all in a few lines of Python.
blog.mlc.ai/2025/01/07/mic…
Charlie Ruan (@charlie_ruan)

DeepSeek R1 Distilled models now on #WebLLM — locally accelerated by WebGPU and counting "r"s in 🍓 Reasoning models join the edge regime; small models are increasingly capable—excited to see what value edge can bring in 2025. Try it w/ no setup at chat.webllm.ai

Lei Wang (@lei_wang_1999)

Excited to release tilelang v0.1.0, another Pythonic DSL for writing AI kernels, with optional layout/pipeline annotations and an optional thread-level programming interface. If these features sound useful, please check it out and give it a try :) github.com/tile-ai/tilela…

Lei Wang (@lei_wang_1999)

Building on top of TVM is powerful! 🙌 I was able to adapt WGSL (WebGPU codegen) from TVM to Tile language in just a few hours, and I believe adapting Hexagon, Metal, and other backends should be just as straightforward. Contributions are welcome! 🥰

Shiyi Cao (@shiyi_c98)

Thanks AK for sharing our new work (a great effort led by Dacheng Li) in the coding domain NovaSky! S* extends parallel scaling with sequential refinement and enhances selection with adaptive input synthesis, achieving superior performance and great

Zihao Ye (@ye_combinator)

LLMs are not all about tensor cores. Categorical sampling under filters (top-p/top-k/min-p) is a critical operator in LLMs as vocabulary sizes grow, and FlashInfer uses a sorting-free rejection sampling algorithm for efficient sampling. Check out this great blog post written by @0xsling0
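To make the sorting-free idea concrete, here is a minimal pure-Python sketch of rejection sampling under a top-k or top-p filter. It illustrates the general technique under our own assumptions; it is not FlashInfer's GPU kernel or API, and the names `rejection_sample`, `in_top_k`, and `in_top_p` are hypothetical. Each round draws a token with an inverse-CDF scan in vocabulary order (no sort), checks the filter with one linear pass, and on rejection raises a probability pivot so that only strictly more likely tokens remain candidates.

```python
import random
from typing import Callable, List

# Hypothetical filter signature: (probs, token_index) -> True if the token survives the filter.
Filter = Callable[[List[float], int], bool]


def _sample_above(probs: List[float], pivot: float, rng: random.Random) -> int:
    """Inverse-CDF draw restricted to tokens with prob > pivot, scanned in vocab order (no sort)."""
    total = sum(p for p in probs if p > pivot)
    u = rng.random() * total
    acc, last = 0.0, -1
    for i, p in enumerate(probs):
        if p > pivot:
            acc += p
            last = i
            if u < acc:
                return i
    return last  # floating-point round-off guard: fall back to the last candidate


def rejection_sample(probs: List[float], accept: Filter, rng: random.Random) -> int:
    """Sample from `probs` renormalized over tokens passing a monotone filter like those below."""
    pivot = 0.0  # zero-probability tokens are never candidates
    while True:
        j = _sample_above(probs, pivot, rng)
        if accept(probs, j):
            return j
        # Rejected: every token that passes the filter is strictly more likely than j, so
        # drop j and everything at or below its probability. With k >= 1 or top_p > 0 the
        # most likely token always passes, so the candidate set never becomes empty.
        pivot = probs[j]


def in_top_k(k: int) -> Filter:
    # j survives if fewer than k tokens are strictly more likely than it.
    return lambda probs, j: sum(1 for p in probs if p > probs[j]) < k


def in_top_p(top_p: float) -> Filter:
    # j survives if the mass of strictly more likely tokens is still below top_p (nucleus).
    return lambda probs, j: sum(p for p in probs if p > probs[j]) < top_p
```

For example, with probabilities `[0.1, 0.5, 0.15, 0.25]`, both `in_top_k(2)` and `in_top_p(0.6)` restrict sampling to tokens 1 and 3, drawn in a 2:1 ratio. Each round costs one linear scan, and each rejection strictly shrinks the candidate set, which is what lets this style of sampler avoid a full sort of the vocabulary.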

Lei Wang (@lei_wang_1999)

Happy to announce tilelang v0.1.3 🚀 Love to see it, and huge thanks to the contributors for bringing enhancements, optimizations, and bug fixes, including Cute upgrades ✨, new kernels and tutorials like DeepGEMM⚡, autotuning and kernel caches💾, and many more :) github.com/tile-ai/tilela…

Tianqi Chen (@tqchenml)

Happy to share our latest work at ASPLOS 2025! LLMs are dynamic in both sequence length and batch size. Relax brings an ML compiler IR that globally tracks symbolic shapes across functions on multiple levels, enabling efficient and flexible AOT compilation of LLMs. arxiv.org/abs/2311.02103.

Lei Wang (@lei_wang_1999)

The DeepSeek team was so audacious as to try writing tilelang kernels🥰, and luckily it's fast. Huge thanks for giving tilelang a try

Infini-AI-Lab (@infiniailab)

🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n