LMCache Lab (@lmcache)'s Twitter Profile
LMCache Lab

@lmcache

🧪 Open-Source Team that maintains LMCache and Production Stack
🤖 Democratizing AI by providing efficient LLM serving for ALL

ID: 1834864489365450752

Link: https://lmcache.ai/ · Joined: 14-09-2024 07:59:32

19 Tweets

339 Followers

681 Following

LMCache Lab (@lmcache):

CacheGen (arxiv.org/abs/2310.07240) lets you store KV caches on disk or AWS S3 and load them way faster than recomputing!

Modern LLMs use long contexts, but reprocessing these every time is slow and resource-intensive.

While engines like vLLM (and LMCache) can cache contexts in
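A minimal sketch of the idea described above, assuming plain Hugging Face Transformers rather than CacheGen's compressed format or LMCache's actual API: prefill the long context once, persist the resulting KV cache to disk (object storage such as S3 would work the same way), then reload it so later requests skip the prefill. The model name, file path, and prompts below are illustrative placeholders.

# Sketch only: persist-and-reuse of a KV cache, not CacheGen's or LMCache's API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                  # stand-in model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

context_ids = tok("A long shared document. " * 50, return_tensors="pt").input_ids

# First request: pay the prefill cost once, then persist the KV cache.
with torch.no_grad():
    prefill = model(context_ids, use_cache=True)
torch.save(prefill.past_key_values, "kv_cache.pt")           # local disk here; S3 in CacheGen's case

# Later request: reload the cache and feed only the new tokens,
# instead of recomputing the whole context.
past = torch.load("kv_cache.pt", weights_only=False)
new_ids = tok(" Summarize it.", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(new_ids, past_key_values=past, use_cache=True)
print(tok.decode(out.logits[:, -1].argmax(-1)))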
Red Hat AI (@redhat_ai):

We're thrilled to share an integration between KServe and llm-d, bringing powerful, scalable LLM serving to Kubernetes. Our Red Hat + AI & 🌶 team is integrating llm-d, a Kubernetes-native distributed inference framework, into KServe. This is all about combining the best of both

LMCache Lab (@lmcache):

Looking forward to seeing LMCache on stage at Cloud Native K8s AI Day! 🚀 Don’t miss the KServe Next session for cutting-edge LLM serving insights.

LMCache Lab (@lmcache):

8 KV-Cache Systems You Can’t Afford to Miss in 2025

By 2025, KV-cache has evolved from a “nice-to-have” optimization into a critical layer for high-performance large language model (LLM) serving.
From GPU-resident paging tricks to persistent, cross-node cache sharing, the
Sumanth (@sumanth_077):

Fastest inference engine for LLMs!

LMCache is an LLM serving engine that reduces Time to First Token (TTFT) and increases throughput, especially in long-context scenarios.

100% Open Source
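A rough sketch of how the TTFT claim above can be checked, assuming an OpenAI-compatible endpoint such as the one vLLM exposes (with or without LMCache enabled); the base URL, API key, model name, and prompt are placeholders, not a required setup. TTFT is simply the delay until the first streamed token arrives.

# Sketch only: measuring Time to First Token against an OpenAI-compatible server.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local endpoint

start = time.perf_counter()
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",               # whatever model the server hosts
    messages=[{"role": "user", "content": "<long shared context> ... now answer a question"}],
    stream=True,
)
for _first_chunk in stream:
    # TTFT = delay between sending the request and receiving the first streamed chunk.
    print(f"TTFT: {time.perf_counter() - start:.3f}s")
    break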
LMCache Lab (@lmcache):

Mark your calendars! Excited for the first FastAGI meetup featuring incredible speakers on AI infra & agents 🚀 Looking forward to the discussions and energy at LMCache Lab!

LMCache Lab (@lmcache):

Thanks to Alex @ New Port AI for inviting us to introduce LMCache at the Bay Area Generative AI Builders Meetup! More details here: lu.ma/5sdoeg1y

LMCache Lab (@lmcache):

🚀 Exciting to see LMCache x Mooncake being discussed at the vLLM Shanghai Meetup! The ecosystem around vLLM is evolving fast — from distributed inference to hardware optimizations — and cache innovations like this will be key to unlocking the next level of efficiency &