Aurick Qiao (@aurickq)'s Twitter Profile
Aurick Qiao

@aurickq

@SnowflakeDB AI Research | @LLM360 | Previously @PetuumInc | PhD @SCSatCMU | CS @UWaterloo

ID: 798664593056862208

Joined: 15-11-2016 23:10:52

112 Tweets

317 Followers

246 Following

Jeff Rasley (@jeffra45)

🧵1/ New release from Snowflake AI Research: Shift Parallelism is a new LLM inference technique built on top of vLLM, released through ArcticInference. It dramatically improves latency while preserving high throughput. Here’s what it looks like in action 👇
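
For intuition, here is a minimal Python sketch of the switching idea described in the thread and paper: small, latency-bound batches run with tensor parallelism (TP), and the same replica shifts to sequence parallelism (SP) once the batch grows and the workload turns throughput-bound. The names (`choose_parallelism`, `shift_threshold`) are hypothetical, not the ArcticInference API.

```python
# Illustrative sketch only: the real mechanism lives inside
# ArcticInference / vLLM and switches modes at runtime.
from dataclasses import dataclass

@dataclass
class ParallelConfig:
    strategy: str  # "TP" (tensor parallel) or "SP" (sequence parallel)
    degree: int    # number of GPUs the work is sharded across

def choose_parallelism(batch_size: int, gpus: int,
                       shift_threshold: int = 32) -> ParallelConfig:
    """Pick a parallelism mode for the current scheduling step.

    Small batches are latency-bound: TP splits every layer across GPUs so
    each token finishes sooner. Large batches are throughput-bound: SP
    shards the sequence/batch dimension instead, trading a little latency
    for much higher aggregate token throughput.
    """
    if batch_size < shift_threshold:
        return ParallelConfig("TP", gpus)
    return ParallelConfig("SP", gpus)

# A dynamic workload drifting from interactive to batchy:
for bs in (1, 8, 64, 256):
    cfg = choose_parallelism(bs, gpus=4)
    print(f"batch={bs:>3} -> {cfg.strategy} across {cfg.degree} GPUs")
```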

Hao AI Lab (@haoailab)

🚀 Dynasor is now production-ready in open-source stacks: NVIDIA TensorRT-LLM and Snowflake ArcticInference. Try it today ↓
TensorRT-LLM ➡️ github.com/NVIDIA/TensorR…
Snowflake ➡️ github.com/snowflakedb/Ar…
🎮 Original Dynasor repo: github.com/hao-ai-lab/Dyn…
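
For a sense of the mechanism, below is a rough Python sketch of Dynasor-style certainty-based early exit: periodically probe the model for its current answer mid-reasoning, and stop once consecutive probes agree. `generate_chunk` and `probe_answer` are hypothetical stand-ins; the production integrations live inside the TensorRT-LLM and ArcticInference schedulers.

```python
# Rough conceptual sketch; `generate_chunk` and `probe_answer` are
# hypothetical stand-ins for calls into a serving engine.
def generate_with_dynasor(prompt, generate_chunk, probe_answer,
                          chunk_tokens=64, agree_needed=3, max_chunks=50):
    """Stop a long reasoning trace once interim answers stabilize.

    After every `chunk_tokens` of chain-of-thought, a cheap probe asks the
    model for its current best answer. If the same answer comes back
    `agree_needed` times in a row, more reasoning tokens are unlikely to
    change it, so we exit early instead of decoding to the full budget.
    """
    trace, history = "", []
    for _ in range(max_chunks):
        trace += generate_chunk(prompt + trace, chunk_tokens)
        history.append(probe_answer(prompt + trace))
        if len(history) >= agree_needed and len(set(history[-agree_needed:])) == 1:
            return history[-1]  # certain enough: early exit saves tokens
    return history[-1] if history else ""

# Toy demo with a stub model whose interim answer stabilizes at "42":
answers = iter(["7", "42", "42", "42"])
print(generate_with_dynasor(
    "Q: ...",
    generate_chunk=lambda ctx, n: " ...more reasoning... ",
    probe_answer=lambda ctx: next(answers, "42"),
))  # -> 42
```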

Hao AI Lab (@haoailab)

[Lmgame Bench] o3-pro: A Milestone in LLM Gaming! 🕹️ The leap from o3 to o3-pro is bigger than you might think. We tested o3-pro on Tetris and Sokoban: it achieved SOTA on both and outperformed its previous self by a big margin. 🔍
🧱 Tetris update, o3-pro: ✅ 8+ lines …

vLLM (@vllm_project)

vLLM has just reached 50K GitHub stars! Huge thanks to the community! 🚀 Together let's bring easy, fast, and cheap LLM serving for everyone ✌🏻

Zhihao Jia (@jiazhihao)

One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard.

🚀 Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized megakernels …
Stas Bekman (@stasbekman)

My first project at Snowflake AI Research is complete!

I present to you Arctic Long Sequence Training (ALST) 

Paper: arxiv.org/abs/2506.13996
Blog: snowflake.com/en/engineering…

ALST is a set of modular, open-source techniques that enable training on sequences up to 15 million tokens …
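
To make one of those techniques concrete, here is a simplified PyTorch reading of tiled compute (the shard count, shapes, and helper name are my own, not ALST's API): applying the position-wise MLP shard-by-shard along the sequence, with each shard checkpointed, keeps only one shard's intermediate activations live at a time; those intermediates are what blow up first at multi-million-token lengths.

```python
# Simplified sketch of sequence-tiled MLP compute (illustrative, not ALST's API).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

def tiled_mlp(mlp: nn.Module, hidden: torch.Tensor, num_shards: int) -> torch.Tensor:
    """Apply a position-wise MLP to [batch, seq, dim] input one sequence shard at a time.

    Because the MLP has no cross-token interaction, splitting along the
    sequence dimension is exact. Checkpointing each shard means only one
    shard's intermediate activations are live at a time; the rest are
    recomputed during backward.
    """
    shards = hidden.chunk(num_shards, dim=1)
    outs = [checkpoint(mlp, shard, use_reentrant=False) for shard in shards]
    return torch.cat(outs, dim=1)

mlp = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
x = torch.randn(1, 8192, 1024, requires_grad=True)  # one long sequence
y = tiled_mlp(mlp, x, num_shards=8)
y.sum().backward()  # intermediates recomputed per shard, capping peak memory
```
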
Zhewei Yao (@yao_zhewei)

Michael Bendersky, Jonathan Frankle: Congrats Michael Bendersky! But it is better to quote the right number for Arctic-Text2SQL-R1; it has actually been 73.84 since May 7th. You can check it on the leaderboard: bird-bench.github.io

Stas Bekman (@stasbekman)

Yay, our team has just published a new paper, “Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads”

arxiv.org/abs/2509.16495

Shift Parallelism is a new inference parallelism strategy that can dynamically switch between Tensor Parallelism and Sequence Parallelism …
Mert Hidayetoğlu (@merthidayetoglu)

🚀 Excited to share that our new paper, “Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads,” is now live on arXiv! Read the full paper 👉 arxiv.org/abs/2509.16495