Alexey Tumanov (@alsched) 's Twitter Profile
Alexey Tumanov

@alsched

Assistant Professor of Computer Science @gatech_scs @gtcomputing | postdoc @Berkeley_EECS @ucbrise | ML Systems

ID: 1003479212

https://faculty.cc.gatech.edu/~atumanov | Joined: 11-12-2012 06:49:33

243 Tweets

538 Followers

274 Following

Georgia Tech School of Computer Science (@gatech_scs) 's Twitter Profile Photo

Three SCS faculty members were recognized by their students for outstanding teaching and educational impact. Congratulations to Ashutosh Dhekne, Alexey Tumanov, and Umakishore Ramachandran 👏👏👏 blog.ctl.gatech.edu/2024/05/21/spr…

Alexey Tumanov (@alsched) 's Twitter Profile Photo

Really proud of my PhD student's work developing a new mechanism and policy that significantly improves tail latency in Large Language Model (LLM) inference without sacrificing throughput. Already received 10+ citations; the source is OSS and adopted in industry.

Alexey Tumanov (@alsched) 's Twitter Profile Photo

Let's set the standard for the interactive performance of LLMs capturing nuances of user experience. While latency/throughput tension is well known to the Systems community, latency jitter is less explored. Fluidity index & fluid token generation rate more aptly capture LLM perf.
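The fluidity idea can be sketched as a per-token deadline check. This is a hedged illustration, not the paper's actual fluidity-index definition; the 30 ms deadline is an assumed TBT SLO.

```python
# Illustrative sketch: score how "fluid" a token stream feels by
# checking each inter-token gap against a time-between-tokens (TBT)
# deadline. Timestamps are in milliseconds to keep arithmetic exact.

def fluidity_fraction(token_times_ms, tbt_deadline_ms=30):
    """Fraction of inter-token gaps that meet the TBT deadline."""
    gaps = [b - a for a, b in zip(token_times_ms, token_times_ms[1:])]
    if not gaps:
        return 1.0
    return sum(1 for g in gaps if g <= tbt_deadline_ms) / len(gaps)

# A stream with one 100 ms stall among otherwise steady 30 ms gaps:
print(fluidity_fraction([0, 30, 60, 160, 190]))  # 0.75
```

A metric like this captures jitter that average throughput hides: the stream above averages well under 50 ms/token yet still stutters visibly once.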

Amey Agrawal (@agrawalamey12) 's Twitter Profile Photo

⚡ Speed Meets Accuracy: Unlike approximation-based methods, Mnemosyne achieves exact inference, ensuring that the generated output remains precise even when processing 10 million tokens, by effectively combining these parallelization techniques to scale up to hundreds of GPUs.

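One reason exact inference survives sharding is that softmax attention can be split across KV shards and recombined without approximation. A minimal 1-D sketch with toy numbers (not Mnemosyne's implementation, just the standard log-sum-exp merge):

```python
import math

def shard_attention(q, keys, vals):
    """Partial attention over one KV shard: (max_logit, sumexp, output)."""
    logits = [q * k for k in keys]                # toy scalar "dot products"
    m = max(logits)
    w = [math.exp(l - m) for l in logits]         # shifted for stability
    s = sum(w)
    out = sum(wi * v for wi, v in zip(w, vals)) / s
    return m, s, out

def merge_shards(parts):
    """Combine per-shard partial attentions into the exact global result."""
    m = max(p[0] for p in parts)
    num = sum(s * math.exp(mi - m) * o for mi, s, o in parts)
    den = sum(s * math.exp(mi - m) for mi, s, _ in parts)
    return num / den

keys, vals, q = [0.1, 0.5, -0.3, 0.8], [1.0, 2.0, 3.0, 4.0], 0.7
exact = merge_shards([shard_attention(q, keys, vals)])
split = merge_shards([shard_attention(q, keys[:2], vals[:2]),
                      shard_attention(q, keys[2:], vals[2:])])
assert abs(exact - split) < 1e-9   # sharded result matches unsharded exactly
```

Because the merge rescales each shard's partial sum-of-exponentials, the sharded result is bit-for-bit the same attention, not an approximation.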
Amey Agrawal (@agrawalamey12) 's Twitter Profile Photo

🔗 Curious to learn more? Dive into our paper to explore the technical details behind Mnemosyne: arxiv.org/abs/2409.17264…. Joint work between Georgia Tech Computing, Microsoft, and UCSD Engineering with the amazing Esha Choukse, Alexey Tumanov, Ram Ramjee, Junda Chen, Íñigo Goiri & Chaojie Zhang!

Alexey Tumanov (@alsched) 's Twitter Profile Photo

First publicly known support for LLM context of up to 10M tokens with high throughput and interactive, production-grade TBT SLOs (30 ms) with Mnemosyne. What would it take to pair program with GenAI on millions of LoC? Or analyze 10/110 hrs of video/audio content? All precisely!

Amey Agrawal (@agrawalamey12) 's Twitter Profile Photo

Google has silently but surely developed an edge over OpenAI. Long context processing seems to be the key to Google's AI strategy. NotebookLM is a prime example of what long context processing can unlock. In our latest paper, we talk about how systems can be built to support

Alexey Tumanov (@alsched) 's Twitter Profile Photo

Super-charged technical program this year at ACM SoCC: acmsocc.org/2024/schedule.… Looking forward! Hope to see you there! #socc24

ACM SoCC (@acmsocc) 's Twitter Profile Photo

At SoCC’24, Anastasia Ailamaki from EPFL will give a keynote on how disaggregated memory resources are becoming the norm and how this “new memory wall” affects database system design. This talk will be amazing, make sure to be there!! acmsocc.org/2024/keynotes.…

Amey Agrawal (@agrawalamey12) 's Twitter Profile Photo

Sequence pipeline parallelism is being rapidly adopted for extreme long-context inference in industry! Check out our paper on system design for long-context inference for more details: arxiv.org/abs/2409.17264

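The latency win from pipelining prompt chunks follows the classic fill/drain formula. A back-of-the-envelope sketch; the chunk count, stage count, and per-chunk time below are illustrative assumptions, not figures from the paper.

```python
# Toy model: a long prompt is cut into chunks that flow through
# pipeline stages, so different stages work on different chunks at once.

def serial_time(num_chunks, num_stages, chunk_time):
    """No overlap: every chunk runs through every stage sequentially."""
    return num_chunks * num_stages * chunk_time

def pipelined_time(num_chunks, num_stages, chunk_time):
    """Fill/drain formula: after filling, one chunk finishes per tick."""
    return (num_chunks + num_stages - 1) * chunk_time

# 64 chunks over 8 stages: 71 chunk-times pipelined vs 512 serially,
# approaching an 8x (stage-count) speedup once chunks >> stages.
print(serial_time(64, 8, 1), pipelined_time(64, 8, 1))  # 512 71
```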
Amey Agrawal (@agrawalamey12) 's Twitter Profile Photo

Super long-context models with context windows spanning millions of tokens are becoming commonplace (Google DeepMind Gemini, xAI Grok 3, Qwen Qwen2.5). But efficiently serving these models is tough, especially alongside short requests. Head-of-Line (HOL) blocking becomes

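The HOL problem is easy to see with a toy FCFS queue; all durations here are made-up illustrations, not measurements.

```python
# Toy FCFS schedule: a short chat request queued behind a single
# long-context prefill inherits the prefill's entire service time
# as queueing delay -- the essence of head-of-line blocking.

def fcfs_start_times(service_times):
    """Start (i.e., queueing wait) time of each request under FCFS."""
    starts, clock = [], 0.0
    for s in service_times:
        starts.append(clock)
        clock += s
    return starts

long_prefill_s, short_chat_s = 120.0, 0.5   # hypothetical durations
waits = fcfs_start_times([long_prefill_s, short_chat_s])
print(waits)  # [0.0, 120.0] -- the half-second request stalls for two minutes
```

Schedulers that chunk or preempt the long prefill let the short request start almost immediately instead.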
Amey Agrawal (@agrawalamey12) 's Twitter Profile Photo

Super excited to share another incredible system that we have built over the past two years! Training giant foundation models (like Llama-3 405B) costs a FORTUNE 💰 (millions of dollars)! Optimizing the training "recipe" (parallelism, memory tricks, etc.) is critical but

Amey Agrawal (@agrawalamey12) 's Twitter Profile Photo

Maya offers a transparent, accurate, and efficient way to model and optimize large-scale DL training without needing expensive hardware clusters for exploration. A crucial step towards sustainable AI! Read the paper: arxiv.org/abs/2503.20191 Work done with Srihas Yarlagadda, Elton Pinto,
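To give the flavor of estimating training cost analytically rather than on real hardware: a minimal sketch, not Maya's actual model. Every constant below (the ~6·N·D FLOPs-per-token rule of thumb, the peak TFLOPS, the MFU) is an assumption for illustration.

```python
# Rough analytical estimate of one training step's duration from
# model size, batch size, and achievable hardware throughput.

def train_step_time_s(params_b, tokens_per_batch, gpu_tflops, mfu, num_gpus):
    """Estimated step time via the ~6*N*D FLOPs-per-token rule of thumb."""
    flops = 6 * (params_b * 1e9) * tokens_per_batch      # fwd+bwd FLOPs
    cluster_flops = num_gpus * gpu_tflops * 1e12 * mfu   # achieved FLOP/s
    return flops / cluster_flops

# e.g. a 405B-parameter model, a 16M-token batch, 1024 H100-class GPUs
# (assumed 989 peak TFLOPS each) at an assumed 40% MFU:
step = train_step_time_s(405, 16e6, 989, 0.40, 1024)  # roughly a minute and a half
```

A simulator's job is then to predict how knobs like parallelism layout and memory tricks move the MFU term, without renting the cluster to find out.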

Sachit Kuhar (@sachitkuhar) 's Twitter Profile Photo

Full code 🔓 github.com/sachitkuhar/PL… Collaboration with Yash Jain and Alexey Tumanov. (6/6) #EfficientAI #EdgeAI #Quantization #TMLR #AI #GaTech #GeorgiaTech

Amey Agrawal (@agrawalamey12) 's Twitter Profile Photo

Interesting work on long-context inference from NVIDIA, where they scale KV parallelism on GB200 NVL72 systems! To learn more about accelerating long-context inference and the trade-offs between different parallelism dimensions, check out our paper, Medha: arxiv.org/abs/2409.17264

Georgia Tech School of Computer Science (@gatech_scs) 's Twitter Profile Photo

Congratulations 👏 to our faculty who were recognized on the Spring 2025 CIOS Honor Roll for their outstanding teaching and educational impact: Assoc. Prof. Alexey Tumanov and Asst. Prof. Jan Van Den Brand!

Amey Agrawal (@agrawalamey12) 's Twitter Profile Photo

After hitting evaluation puzzles like this in our own work, we analyzed patterns across LLM inference papers and identified 8 systematic evaluation issues that can make performance comparisons misleading. We have compiled a practical evaluation checklist to help avoid these