Yixin Dong (@yi_xin_dong)'s Twitter Profile
Yixin Dong

@yi_xin_dong

Ph.D. student @SCSatCMU, prev @deepseek_ai, @uwcse, @sjtu1896. @ApacheTVM contributor. Working on ML and systems. All views are my own

ID: 1028438483365490688

Link: https://github.com/Ubospica
Joined: 12-08-2018 00:30:16

68 Tweets

400 Followers

544 Following

Xinyu Yang (@xinyu2ml)'s Twitter Profile Photo

We will be presenting "APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding", a novel encoding method that enables:
🚀Pre-caching Contexts for Fast Inference
🐍Re-using Positions for Long Context

Our poster session is located in Hall 3 and Hall 2B,
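
For context on the parallel-encoding idea, here is a toy numpy sketch of my reading of the abstract: each context is encoded independently from the same starting position, so its KV cache can be precomputed once and reused, and positions do not grow with the number of contexts. The `scale` knob is a made-up stand-in for APE's adaptive adjustments, not anything from the paper's code.

```python
import numpy as np

d = 16
rng = np.random.default_rng(0)

def encode_context(tokens, start_pos):
    """Stand-in encoder: returns (K, V) for one context, with position ids
    assigned starting at start_pos (a toy additive positional signal)."""
    n = len(tokens)
    pos = np.arange(start_pos, start_pos + n)
    K = rng.standard_normal((n, d)) + 0.01 * pos[:, None]
    V = rng.standard_normal((n, d))
    return K, V

contexts = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# Parallel encoding: every context is encoded independently from position 0,
# so each (K, V) pair can be pre-cached once and reused across queries, and
# positions are shared instead of growing with each appended context.
kv_parallel = [encode_context(c, start_pos=0) for c in contexts]

K = np.concatenate([k for k, _ in kv_parallel])
V = np.concatenate([v for _, v in kv_parallel])

q = rng.standard_normal(d)
scale = 0.9  # hypothetical stand-in for APE's adaptive attention adjustment
logits = scale * (K @ q) / np.sqrt(d)
attn = np.exp(logits - logits.max())
attn /= attn.sum()
out = attn @ V
print(out.shape)  # (16,)
```
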
Muyang Li (@lmxyy1999)'s Twitter Profile Photo

🚀 How to run 12B FLUX.1 on your local laptop with 2-3× speedup? Come check out our #SVDQuant (#ICLR2025 Spotlight) poster session! 🎉 
🗓️ When: Friday, Apr 25, 10–12:30 (Singapore time)
📍 Where: Hall 3 + Hall 2B, Poster 169
📌 Poster: tinyurl.com/poster-svdquant
🎮 Demo:
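
A minimal numpy sketch of the core SVDQuant decomposition as I understand it: keep a small low-rank branch (from an SVD) in high precision and quantize the residual to 4 bits. The rank, the symmetric int4 scheme, and the omission of the paper's outlier-smoothing step are simplifications of mine, not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))  # a stand-in weight matrix

# Low-rank branch kept in high precision (rank r is an assumed knob).
r = 16
U, S, Vt = np.linalg.svd(W, full_matrices=False)
L1 = U[:, :r] * S[:r]
L2 = Vt[:r]

# Quantize the residual to 4 bits (symmetric int4, values in [-7, 7]).
R = W - L1 @ L2
scale = np.abs(R).max() / 7.0
Rq = np.clip(np.round(R / scale), -7, 7).astype(np.int8)

# Inference-time reconstruction: low-rank branch + dequantized residual.
W_hat = L1 @ L2 + Rq.astype(np.float32) * scale
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.4f}")
```
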
Cognition (@cognition_labs)'s Twitter Profile Photo

Project DeepWiki: up-to-date documentation you can talk to, for every repo in the world. Think Deep Research for GitHub – powered by Devin. It’s free for open-source, no sign-up! Visit deepwiki.com or just swap github → deepwiki on any repo URL:
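
The swap the tweet describes, as a one-liner (the example repo is an arbitrary pick of mine):

```python
url = "https://github.com/mlc-ai/xgrammar"
wiki_url = url.replace("github.com", "deepwiki.com", 1)
print(wiki_url)  # https://deepwiki.com/mlc-ai/xgrammar
```
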

Saining Xie (@sainingxie)'s Twitter Profile Photo

Wow, Deeply Supervised Nets received the Test of Time Award at AISTATS 2025! It was the very first paper I submitted during my PhD. Fun fact: the paper was originally rejected by NeurIPS with scores of 8/8/7 (yes, that pain stuck with me... maybe now I can finally let it

zhyncs (@zhyncs42)'s Twitter Profile Photo

MLSys 2025 is coming up! Want to meet the developers behind FlashInfer, XGrammar, and SGLang from LMSYS Org in person? Join us for the Happy Hour on May 12—we’d love to see you there! lu.ma/dl99yjoe

Si-ze Zheng (@deeplyignorant)'s Twitter Profile Photo

🚀 We released Triton-distributed!
🌟 Build compute–communication overlapping kernels for GPUs—performance rivals optimized libraries
🔗 github.com/ByteDance-Seed…
👏 Shoutout to AMD for testing our work! Check their blog:
🔗 …rocm-blogs--981.com.readthedocs.build/projects/inter…

Yixin Dong (@yi_xin_dong)'s Twitter Profile Photo

We are hosting a happy hour with LMSYS Org at #mlsys2025! Join us for engaging talks on SGLang, the structured generation library XGrammar, and the high-performance kernel library FlashInfer. Enjoy great food, lively discussions, and connect with the community! Click to join 👉

NVIDIA AI Developer (@nvidiaaidev)'s Twitter Profile Photo

🎉 Congratulations to the FlashInfer team – their technical paper, "FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving," just won best paper at #MLSys2025. 🏆
🙌 We are excited to share that we are now backing FlashInfer – a supporter and

Zihao Ye (@ye_combinator)'s Twitter Profile Photo

We’re thrilled that FlashInfer won a Best Paper Award at MLSys 2025! 🎉 This wouldn’t have been possible without the community — huge thanks to LMSYS Org’s sglang for deep co-design (which is critical for inference kernel evolution) and stress-testing over the years, and to

Xinyu Yang (@xinyu2ml)'s Twitter Profile Photo

🌟 Don't miss out! The paper submission deadline for the R2-FM workshop is May 30th (AoE). We welcome your related work contributions! ❤️‍🔥

𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)'s Twitter Profile Photo

Hardware-Efficient Attention for Fast Decoding

Princeton optimizes decoding by maximizing arithmetic intensity (FLOPs/byte) for better memory–compute efficiency:

- GTA (Grouped-Tied Attention)
Ties key/value states + partial RoPE → 2× arithmetic intensity vs. GQA, ½ KV cache,
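
A back-of-envelope check of the tied-KV claims in the list above; the GQA configuration (8 KV heads, head dim 128, fp16) is an assumption of mine, not the paper's setup:

```python
# Assumed config: 8 KV heads, head dim 128, fp16 (2 bytes per element).
n_kv_heads, head_dim, bytes_per_elem = 8, 128, 2

def kv_bytes_per_token(tied: bool) -> int:
    # Untied (GQA): separate K and V states. Tied (GTA): one shared state
    # serves as both key and value, halving per-token cache traffic.
    n_states = 1 if tied else 2
    return n_states * n_kv_heads * head_dim * bytes_per_elem

gqa = kv_bytes_per_token(tied=False)
gta = kv_bytes_per_token(tied=True)
# Same attention FLOPs over half the bytes => ~2x arithmetic intensity.
print(gqa, gta, gqa / gta)  # 4096 2048 2.0
```
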
Intology (@intologyai)'s Twitter Profile Photo

The 1st fully AI-generated scientific discovery to pass the highest level of peer review – the main track of an A* conference (ACL 2025). Zochi, the 1st PhD-level agent. Beta open.

Enze Xie (@xieenze_jr)'s Twitter Profile Photo

🚀 Fast-dLLM: 27.6× Faster Diffusion LLMs with KV Cache & Parallel Decoding 💥

Key Features🌟  
- Block-Wise KV Cache  
  Reuses 90%+ attention activations via bidirectional caching (prefix/suffix), enabling 8.1×–27.6× throughput gains with <2% accuracy loss 🔄
-
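
A conceptual numpy sketch of the block-wise reuse described above (my simplification, not the Fast-dLLM code): KV for blocks outside the one being denoised is computed once and reused across denoising steps; only the active block is refreshed. The projection and update rules are placeholders.

```python
import numpy as np

d, block = 16, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((12, d))  # 3 blocks of 4 tokens

def kv(h):
    # Stand-in KV projection (real models use learned linear layers).
    return h * 0.5, h * 0.25

blocks = [x[i:i + block] for i in range(0, len(x), block)]
cache = [kv(b) for b in blocks]  # prefix/suffix KV computed once and reused

active = 1  # the block currently being denoised
for step in range(3):  # denoising steps touch only the active block
    blocks[active] = blocks[active] + 0.1 * rng.standard_normal((block, d))
    cache[active] = kv(blocks[active])  # refresh only the active block's KV

K = np.concatenate([k for k, _ in cache])
V = np.concatenate([v for _, v in cache])
print(K.shape, V.shape)  # (12, 16) (12, 16)
```
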
Hao Kang (@gt_haokang)'s Twitter Profile Photo

🚀📉 A new kind of efficiency challenge: "Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs"
We explore a new frontier: what if the reward doesn’t come from being right—but from being fast and right?
🔗 arxiv.org/abs/2505.19481 🛜

Databricks (@databricks)'s Twitter Profile Photo

Announcing Agent Bricks: auto-optimize agents for your domain tasks. Provide a high-level description of the agent’s task, and connect your enterprise data — Agent Bricks handles the rest. Agent Bricks builds out an agent system that automatically optimizes against your goals

Infini-AI-Lab (@infiniailab)'s Twitter Profile Photo

🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation.
🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%
🌐 Website: multiverse4fm.github.io
🧵 1/n

Xinyu Yang (@xinyu2ml)'s Twitter Profile Photo

🚀 Super excited to share Multiverse! 🏃 It’s been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level

Zhihao Jia (@jiazhihao)'s Twitter Profile Photo

One of the best ways to reduce LLM latency is by fusing all computation and communication into a single GPU megakernel. But writing megakernels by hand is extremely hard.

🚀Introducing Mirage Persistent Kernel (MPK), a compiler that automatically transforms LLMs into optimized
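
A conceptual sketch only, not MPK's API: the megakernel idea replaces many per-op kernel launches with one persistent loop that drains a task queue, so there is no launch gap between ops. Python threads stand in for persistent GPU workers, the task names are invented, and the inter-task dependencies MPK actually tracks are omitted here.

```python
import queue
import threading

# Invented task names standing in for an LLM's fused task graph.
tasks = queue.Queue()
for name in ["embed", "attn_layer0", "mlp_layer0", "attn_layer1", "mlp_layer1"]:
    tasks.put(name)

def persistent_worker(worker_id: int) -> None:
    # One long-lived "kernel" per worker: tasks are pulled from the queue
    # with no per-task launch, mimicking a megakernel's internal dispatch.
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return  # all work drained; the single "launch" ends
        print(f"worker {worker_id}: {task}")

workers = [threading.Thread(target=persistent_worker, args=(i,)) for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```
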