Aurick Qiao (@aurickq)'s Twitter Profile
Aurick Qiao

@aurickq

@SnowflakeDB AI Research | @LLM360 | Previously @PetuumInc | PhD @SCSatCMU | CS @UWaterloo

ID: 798664593056862208

Joined: 15-11-2016 23:10:52

112 Tweets

317 Followers

246 Following

Jeff Rasley (@jeffra45)

🧵1/ New release from Snowflake AI Research: Shift Parallelism is a new LLM inference technique built on top of vLLM, released through ArcticInference. It dramatically improves latency while preserving high throughput. Here’s what it looks like in action 👇
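
For intuition, here is a minimal Python sketch of the switching idea described in the thread and paper: small, latency-bound batches run with tensor parallelism (TP), and the same replica shifts to sequence parallelism (SP) once the batch grows and the workload turns throughput-bound. The names (`choose_parallelism`, `shift_threshold`) are hypothetical, not the ArcticInference API.

```python
# Illustrative sketch only: the real mechanism lives inside
# ArcticInference / vLLM and switches modes at runtime.
from dataclasses import dataclass

@dataclass
class ParallelConfig:
    strategy: str  # "TP" (tensor parallel) or "SP" (sequence parallel)
    degree: int    # number of GPUs the work is sharded across

def choose_parallelism(batch_size: int, gpus: int,
                       shift_threshold: int = 32) -> ParallelConfig:
    """Pick a parallelism mode for the current scheduling step.

    Small batches are latency-bound: TP splits every layer across GPUs so
    each token finishes sooner. Large batches are throughput-bound: SP
    shards the sequence/batch dimension instead, trading a little latency
    for much higher aggregate token throughput.
    """
    if batch_size < shift_threshold:
        return ParallelConfig("TP", gpus)
    return ParallelConfig("SP", gpus)

# A dynamic workload drifting from interactive to batchy:
for bs in (1, 8, 64, 256):
    cfg = choose_parallelism(bs, gpus=4)
    print(f"batch={bs:>3} -> {cfg.strategy} across {cfg.degree} GPUs")
```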

Hao AI Lab (@haoailab)

🚀 Dynasor is now production-ready in open-source stacks: NVIDIA TensorRT-LLM and Snowflake ArcticInference. Try it today ↓
TensorRT-LLM ➡️ github.com/NVIDIA/TensorR…
Snowflake ➡️ github.com/snowflakedb/Ar…
🎮 Original Dynasor repo: github.com/hao-ai-lab/Dyn…
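
For a sense of the mechanism, below is a rough Python sketch of Dynasor-style certainty-based early exit: periodically probe the model for its current answer mid-reasoning, and stop once consecutive probes agree. `generate_chunk` and `probe_answer` are hypothetical stand-ins; the production integrations live inside the TensorRT-LLM and ArcticInference schedulers.

```python
# Rough conceptual sketch; `generate_chunk` and `probe_answer` are
# hypothetical stand-ins for calls into a serving engine.
def generate_with_dynasor(prompt, generate_chunk, probe_answer,
                          chunk_tokens=64, agree_needed=3, max_chunks=50):
    """Stop a long reasoning trace once interim answers stabilize.

    After every `chunk_tokens` of chain-of-thought, a cheap probe asks the
    model for its current best answer. If the same answer comes back
    `agree_needed` times in a row, more reasoning tokens are unlikely to
    change it, so we exit early instead of decoding to the full budget.
    """
    trace, history = "", []
    for _ in range(max_chunks):
        trace += generate_chunk(prompt + trace, chunk_tokens)
        history.append(probe_answer(prompt + trace))
        if len(history) >= agree_needed and len(set(history[-agree_needed:])) == 1:
            return history[-1]  # certain enough: early exit saves tokens
    return history[-1] if history else ""

# Toy demo with a stub model whose interim answer stabilizes at "42":
answers = iter(["7", "42", "42", "42"])
print(generate_with_dynasor(
    "Q: ...",
    generate_chunk=lambda ctx, n: " ...more reasoning... ",
    probe_answer=lambda ctx: next(answers, "42"),
))  # -> 42
```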

Hao AI Lab (@haoailab)

[Lmgame Bench] o3-pro: A Milestone in LLM Gaming! 🕹️ The leap from o3 to o3-pro is bigger than you might think. We tested o3-pro on Tetris and Sokoban: it achieved SOTA on both and outperformed its previous self by a big margin. 🔍
🧱 Tetris update, o3-pro: ✅ 8+ lines …

vLLM (@vllm_project)

vLLM has just reached 50K GitHub stars! Huge thanks to the community! 🚀 Together let's bring easy, fast, and cheap LLM serving for everyone ✌🏻

Zhihao Jia (@jiazhihao)

One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard.

🚀 Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized megakernels …
Stas Bekman (@stasbekman)

My first project at Snowflake AI Research is complete!

I present to you Arctic Long Sequence Training (ALST) 

Paper: arxiv.org/abs/2506.13996
Blog: snowflake.com/en/engineering…

ALST is a set of modular, open-source techniques that enable training on sequences up to 15 million tokens …
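
To make one of those techniques concrete, here is a simplified PyTorch reading of tiled compute (the shard count, shapes, and helper name are my own, not ALST's API): applying the position-wise MLP shard-by-shard along the sequence, with each shard checkpointed, keeps only one shard's intermediate activations live at a time; those intermediates are what blow up first at multi-million-token lengths.

```python
# Simplified sketch of sequence-tiled MLP compute (illustrative, not ALST's API).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

def tiled_mlp(mlp: nn.Module, hidden: torch.Tensor, num_shards: int) -> torch.Tensor:
    """Apply a position-wise MLP to [batch, seq, dim] input one sequence shard at a time.

    Because the MLP has no cross-token interaction, splitting along the
    sequence dimension is exact. Checkpointing each shard means only one
    shard's intermediate activations are live at a time; the rest are
    recomputed during backward.
    """
    shards = hidden.chunk(num_shards, dim=1)
    outs = [checkpoint(mlp, shard, use_reentrant=False) for shard in shards]
    return torch.cat(outs, dim=1)

mlp = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
x = torch.randn(1, 8192, 1024, requires_grad=True)  # one long sequence
y = tiled_mlp(mlp, x, num_shards=8)
y.sum().backward()  # intermediates recomputed per shard, capping peak memory
```
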
Zhewei Yao (@yao_zhewei)

Michael Bendersky, Jonathan Frankle: Congrats Michael Bendersky! But it is better to quote the right number for Arctic-Text2SQL-R1; it has actually been 73.84 since May 7th. You can check it on the leaderboard: bird-bench.github.io

Stas Bekman (@stasbekman)

Yay, our team has just published a new paper, “Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads”

arxiv.org/abs/2509.16495

Shift Parallelism is a new inference parallelism strategy that can dynamically switch between Tensor Parallelism and Sequence Parallelism …
Mert Hidayetoğlu (@merthidayetoglu)

🚀 Excited to share that our new paper, “Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads,” is now live on arXiv! Read the full paper 👉 arxiv.org/abs/2509.16495