Yixin Dong (@yi_xin_dong)'s Twitter Profile
Yixin Dong

@yi_xin_dong

Ph.D. student @SCSatCMU, prev @deepseek_ai, @uwcse, @sjtu1896. @ApacheTVM contributor. Working on ML and systems. All views are my own

ID: 1028438483365490688

Link: https://github.com/Ubospica | Joined: 12-08-2018 00:30:16

68 Tweets

400 Followers

544 Following

Xinyu Yang (@xinyu2ml)'s Twitter Profile Photo

We will be presenting "APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding", a novel encoding method that enables:
- Pre-caching Contexts for Fast Inference
- Re-using Positions for Long Context

Our poster session is located in Hall 3 and Hall 2B, …
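
For intuition, here is a minimal sketch of the parallel-encoding idea, in my own words rather than the authors' code: every retrieved context chunk is encoded independently from the same starting position, so each chunk's KV cache can be computed once, stored, and reused across queries. The `encode_chunk` stand-in below is hypothetical; a real implementation would be a transformer forward pass.

```python
import numpy as np

def encode_chunk(chunk_tokens, start_pos=0):
    # Hypothetical stand-in for a transformer forward pass that returns a
    # KV cache. The only property that matters for this sketch: the cache
    # depends on the chunk's tokens and its starting position.
    rng = np.random.default_rng(sum(chunk_tokens))
    positions = np.arange(start_pos, start_pos + len(chunk_tokens))
    keys = rng.standard_normal((len(chunk_tokens), 8)) + 0.01 * positions[:, None]
    values = rng.standard_normal((len(chunk_tokens), 8))
    return keys, values

chunks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# Sequential encoding: each chunk's positions depend on what precedes it,
# so its KV cache changes with chunk order -> no pre-caching possible.
offset, sequential_caches = 0, []
for c in chunks:
    sequential_caches.append(encode_chunk(c, start_pos=offset))
    offset += len(c)

# Parallel encoding (the APE-style setup): every chunk restarts at position
# 0, so caches are order-independent and can be precomputed and reused.
parallel_caches = [encode_chunk(c, start_pos=0) for c in chunks]
print("per-chunk cache shapes:", [k.shape for k, _ in parallel_caches])
```
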
Muyang Li (@lmxyy1999)'s Twitter Profile Photo

How to run 12B FLUX.1 on your local laptop with 2-3× speedup? Come check out our #SVDQuant (#ICLR2025 Spotlight) poster session!
When: Friday, Apr 25, 10–12:30 (Singapore time)
Where: Hall 3 + Hall 2B, Poster 169
Poster: tinyurl.com/poster-svdquant
Demo: …
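
The core trick in SVDQuant, as I understand it, is to pull the large-magnitude structure of a weight matrix into a small high-precision low-rank branch and quantize only the residual to 4 bits. A toy numpy sketch of that split (my simplification; the released method also migrates activation outliers into the weights before the SVD and runs both branches in fused kernels):

```python
import numpy as np

def svdquant_sketch(W, rank=8, bits=4):
    # Split W into a high-precision low-rank branch (which soaks up the
    # large-magnitude structure) plus a low-bit residual.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    low_rank = (U[:, :rank] * S[:rank]) @ Vt[:rank]   # kept in fp16/fp32
    residual = W - low_rank
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(residual).max() / qmax
    q = np.clip(np.round(residual / scale), -qmax - 1, qmax)
    return low_rank + q * scale                       # dequantized view

rng = np.random.default_rng(0)
# Synthetic weight with a strong rank-1 component, mimicking outliers.
W = rng.standard_normal((64, 64)) + 5.0 * np.outer(rng.standard_normal(64),
                                                   rng.standard_normal(64))
err = np.linalg.norm(W - svdquant_sketch(W)) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.4f}")
```
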
Cognition (@cognition_labs)'s Twitter Profile Photo

Project DeepWiki: up-to-date documentation you can talk to, for every repo in the world. Think Deep Research for GitHub – powered by Devin. It's free for open-source, no sign-up! Visit deepwiki.com or just swap github → deepwiki on any repo URL:
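
The URL swap the announcement describes is literal; a one-line helper (the example repo is an arbitrary choice of mine):

```python
def deepwiki_url(repo_url: str) -> str:
    # Swap the github.com host for deepwiki.com, per the announcement.
    return repo_url.replace("github.com", "deepwiki.com", 1)

print(deepwiki_url("https://github.com/mlc-ai/xgrammar"))
# https://deepwiki.com/mlc-ai/xgrammar
```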

Saining Xie (@sainingxie)'s Twitter Profile Photo

Wow, Deeply Supervised Nets received the Test of Time award at AISTATS 2025! It was the very first paper I submitted during my PhD. Fun fact: the paper was originally rejected by NeurIPS with scores of 8/8/7 (yes, that pain stuck with me... maybe now I can finally let it …

zhyncs (@zhyncs42)'s Twitter Profile Photo

MLSys 2025 is coming up! Want to meet the developers behind FlashInfer, XGrammar, and SGLang (LMSYS Org) in person? Join us for the Happy Hour on May 12; we'd love to see you there! lu.ma/dl99yjoe

Si-ze Zheng (@deeplyignorant)'s Twitter Profile Photo

We released Triton-distributed! Build compute-communication overlapping kernels for GPUs, with performance that rivals optimized libraries. github.com/ByteDance-Seed… Shoutout to AMD for testing our work! Check their blog: …rocm-blogs--981.com.readthedocs.build/projects/inter…
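
The payoff of compute-communication overlap is easy to see even off-GPU. A schematic sketch, with plain Python threads standing in for on-chip pipelining (`compute` and `communicate` are hypothetical stand-ins, not Triton-distributed APIs):

```python
import concurrent.futures as cf
import time

def compute(tile):
    time.sleep(0.05)   # stand-in for a GEMM on one tile
    return tile * 2

def communicate(result):
    time.sleep(0.05)   # stand-in for shipping the tile to peer GPUs
    return result

tiles = list(range(8))

# Serial: all compute, then all communication.
t0 = time.perf_counter()
results = [compute(t) for t in tiles]
for r in results:
    communicate(r)
serial = time.perf_counter() - t0

# Overlapped: ship tile i-1 while computing tile i, the same pipelining a
# fused compute/communication kernel performs on-chip.
t0 = time.perf_counter()
with cf.ThreadPoolExecutor(max_workers=1) as pool:
    pending = None
    for t in tiles:
        r = compute(t)
        if pending is not None:
            pending.result()
        pending = pool.submit(communicate, r)
    pending.result()
overlapped = time.perf_counter() - t0
print(f"serial {serial:.2f}s vs overlapped {overlapped:.2f}s")
```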

Yixin Dong (@yi_xin_dong)'s Twitter Profile Photo

We are hosting a happy hour with LMSYS Org at #mlsys2025! Join us for engaging talks on SGLang, the structured generation library XGrammar, and the high-performance kernel library FlashInfer. Enjoy great food, lively discussions, and connect with the community! Click to join …

NVIDIA AI Developer (@nvidiaaidev)'s Twitter Profile Photo

Congratulations to the FlashInfer team – their technical paper, "FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving," just won best paper at #MLSys2025. We are excited to share that we are now backing FlashInfer – a supporter and …

Zihao Ye (@ye_combinator)'s Twitter Profile Photo

We're thrilled that FlashInfer won a Best Paper Award at MLSys 2025! This wouldn't have been possible without the community: huge thanks to LMSYS Org's sglang for deep co-design (which is critical for inference kernel evolution) and stress-testing over the years, and to …

Xinyu Yang (@xinyu2ml)'s Twitter Profile Photo

Don't miss out! The paper submission deadline for the R2-FM workshop is May 30th (AoE). We welcome your related work contributions!

๐š๐”ช๐Ÿพ๐šก๐šก๐Ÿพ (@gm8xx8) 's Twitter Profile Photo

Hardware-Efficient Attention for Fast Decoding

Princeton optimizes decoding by maximizing arithmetic intensity (FLOPs/byte) for better memory–compute efficiency:

- GTA (Grouped-Tied Attention)
  Ties key/value states + partial RoPE → 2× arithmetic intensity vs. GQA, ½ KV cache, …
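
The KV-cache halving claimed above is simple accounting: tying keys and values means storing one tensor per position instead of two. A back-of-envelope sketch (the model shape is an arbitrary example of mine, not from the paper):

```python
def kv_cache_bytes(layers, seq_len, n_kv_heads, head_dim,
                   tied_kv=False, dtype_bytes=2):
    # Rough per-sequence KV-cache size. tied_kv=True models GTA-style
    # tying, where one stored tensor serves as both key and value, so the
    # cache holds 1 tensor per position instead of 2.
    tensors_per_pos = 1 if tied_kv else 2
    return layers * seq_len * n_kv_heads * head_dim * tensors_per_pos * dtype_bytes

# Arbitrary example shape: 32 layers, 8 KV heads, head_dim 128, fp16, 8K context.
cfg = dict(layers=32, seq_len=8192, n_kv_heads=8, head_dim=128, dtype_bytes=2)
print(f"GQA cache: {kv_cache_bytes(**cfg) / 2**20:.0f} MiB")
print(f"GTA cache: {kv_cache_bytes(tied_kv=True, **cfg) / 2**20:.0f} MiB")
```
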
Intology (@intologyai)'s Twitter Profile Photo

The 1st fully AI-generated scientific discovery to pass the highest level of peer review – the main track of an A* conference (ACL 2025). Zochi, the 1st PhD-level agent. Beta open.

Enze Xie (@xieenze_jr)'s Twitter Profile Photo

Fast-dLLM: 27.6× Faster Diffusion LLMs with KV Cache & Parallel Decoding

Key Features:
- Block-Wise KV Cache
  Reuses 90%+ attention activations via bidirectional caching (prefix/suffix), enabling 8.1×–27.6× throughput gains with <2% accuracy loss
- …
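
The parallel-decoding half of the recipe is the easier one to sketch: instead of unmasking one token per denoising step, commit every masked position whose confidence clears a threshold. A toy version of my reading of the idea (the random logits stand in for a diffusion-LLM forward pass conditioned on the tokens committed so far):

```python
import numpy as np

rng = np.random.default_rng(0)

def parallel_unmask_step(logits, masked, threshold=0.9):
    # Commit every still-masked position whose max probability clears the
    # threshold; always commit at least the single most confident one so
    # each step makes progress.
    z = np.exp(logits - logits.max(-1, keepdims=True))
    probs = z / z.sum(-1, keepdims=True)
    conf, choice = probs.max(-1), probs.argmax(-1)
    commit = masked & (conf >= threshold)
    if not commit.any():
        commit[np.where(masked, conf, -1.0).argmax()] = True
    return choice, commit

block_len, vocab = 8, 16
tokens = np.full(block_len, -1)
masked = np.ones(block_len, dtype=bool)
steps = 0
while masked.any():
    # Stand-in for one model call; a real diffusion LLM would condition on
    # committed tokens and on cached prefix/suffix KV.
    logits = 3.0 * rng.standard_normal((block_len, vocab))
    choice, commit = parallel_unmask_step(logits, masked)
    tokens[commit] = choice[commit]
    masked &= ~commit
    steps += 1
print(f"decoded {block_len} tokens in {steps} steps:", tokens)
```
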
Hao Kang (@gt_haokang)'s Twitter Profile Photo

A new kind of efficiency challenge: "Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs". We explore a new frontier: what if the reward doesn't come from being right, but from being fast and right? arxiv.org/abs/2505.19481
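
One way to make "fast and right" concrete is a correctness reward discounted by latency. The shaping below is my own illustrative assumption, not necessarily the paper's definition:

```python
def latency_sensitive_reward(correct: bool, latency_s: float,
                             half_life_s: float = 5.0) -> float:
    # Correctness gated by an exponential latency discount: a right answer
    # delivered after one half-life earns half the reward. Both the shaping
    # and the 5s half-life are illustrative assumptions.
    return float(correct) * 0.5 ** (latency_s / half_life_s)

print(latency_sensitive_reward(True, 1.0))    # fast and right: ~0.87
print(latency_sensitive_reward(True, 10.0))   # right but slow: 0.25
print(latency_sensitive_reward(False, 0.5))   # fast but wrong: 0.0
```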

Databricks (@databricks)'s Twitter Profile Photo

Announcing Agent Bricks: auto-optimize agents for your domain tasks. Provide a high-level description of the agent's task and connect your enterprise data; Agent Bricks handles the rest. It builds out an agent system that automatically optimizes against your goals …

Infini-AI-Lab (@infiniailab)'s Twitter Profile Photo

We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%. Website: multiverse4fm.github.io (1/n)

Xinyu Yang (@xinyu2ml)'s Twitter Profile Photo

Super excited to share Multiverse! It's been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level …

Zhihao Jia (@jiazhihao)'s Twitter Profile Photo

One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard.

Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized …
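
Conceptually, a megakernel replaces many per-operator launches with one persistent kernel that drains a dependency-ordered task queue. A CPU-side toy of that scheduling loop (my illustration of the general idea, not MPK's actual compiler output; the task names are made up):

```python
from collections import deque

# Hypothetical task graph for one decoding step; names are invented.
tasks = {
    "embed":     {"deps": [],            "run": lambda: "embed"},
    "attn":      {"deps": ["embed"],     "run": lambda: "attn"},
    "mlp":       {"deps": ["attn"],      "run": lambda: "mlp"},
    "allreduce": {"deps": ["mlp"],       "run": lambda: "allreduce"},
    "logits":    {"deps": ["allreduce"], "run": lambda: "logits"},
}

done = set()
ready = deque(name for name, t in tasks.items() if not t["deps"])
# The persistent loop: one "launch" that drains the whole graph, running
# each task as soon as its dependencies finish, instead of paying a
# kernel-launch round-trip per operator.
while ready:
    name = ready.popleft()
    tasks[name]["run"]()
    done.add(name)
    for cand, t in tasks.items():
        if cand not in done and cand not in ready \
                and all(d in done for d in t["deps"]):
            ready.append(cand)
print("executed in one dispatch:", len(done), "tasks")
```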