LMCache Lab (@lmcache)'s Twitter Profile
LMCache Lab

@lmcache

🧪 Open-Source Team that maintains LMCache and Production Stack
🤖 Democratizing AI by providing efficient LLM serving for ALL

ID: 1834864489365450752

Link: https://lmcache.ai/ · Joined: 14-09-2024 07:59:32

19 Tweets

339 Followers

681 Following

LMCache Lab (@lmcache):

CacheGen (arxiv.org/abs/2310.07240) lets you store KV caches on disk or AWS S3 and load them way faster than recomputing!

Modern LLMs use long contexts, but reprocessing these every time is slow and resource-intensive.

While engines like vLLM (and LMCache) can cache contexts in
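A minimal sketch of the idea described above, assuming plain Hugging Face Transformers rather than CacheGen's compressed format or LMCache's actual API: prefill the long context once, persist the resulting KV cache to disk (object storage such as S3 would work the same way), then reload it so later requests skip the prefill. The model name, file path, and prompts below are illustrative placeholders.

# Sketch only: persist-and-reuse of a KV cache, not CacheGen's or LMCache's API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                  # stand-in model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

context_ids = tok("A long shared document. " * 50, return_tensors="pt").input_ids

# First request: pay the prefill cost once, then persist the KV cache.
with torch.no_grad():
    prefill = model(context_ids, use_cache=True)
torch.save(prefill.past_key_values, "kv_cache.pt")           # local disk here; S3 in CacheGen's case

# Later request: reload the cache and feed only the new tokens,
# instead of recomputing the whole context.
past = torch.load("kv_cache.pt", weights_only=False)
new_ids = tok(" Summarize it.", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(new_ids, past_key_values=past, use_cache=True)
print(tok.decode(out.logits[:, -1].argmax(-1)))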
Red Hat AI (@redhat_ai):

We're thrilled to share an integration between KServe and llm-d, bringing powerful, scalable LLM serving to Kubernetes. Our Red Hat + AI & 🌶 team is integrating llm-d, a Kubernetes-native distributed inference framework, into KServe. This is all about combining the best of both

LMCache Lab (@lmcache):

Looking forward to seeing LMCache on stage at Cloud Native K8s AI Day! 🚀 Don’t miss the KServe Next session for cutting-edge LLM serving insights.

LMCache Lab (@lmcache):

8 KV-Cache Systems You Can’t Afford to Miss in 2025

By 2025, KV-cache has evolved from a “nice-to-have” optimization into a critical layer for high-performance large language model (LLM) serving.
From GPU-resident paging tricks to persistent, cross-node cache sharing, the
Sumanth (@sumanth_077):

Fastest inference engine for LLMs!

LMCache is an LLM serving engine that reduces Time to First Token (TTFT) and increases throughput, especially in long-context scenarios.

100% Open Source
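A rough sketch of how the TTFT claim above can be checked, assuming an OpenAI-compatible endpoint such as the one vLLM exposes (with or without LMCache enabled); the base URL, API key, model name, and prompt are placeholders, not a required setup. TTFT is simply the delay until the first streamed token arrives.

# Sketch only: measuring Time to First Token against an OpenAI-compatible server.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local endpoint

start = time.perf_counter()
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",               # whatever model the server hosts
    messages=[{"role": "user", "content": "<long shared context> ... now answer a question"}],
    stream=True,
)
for _first_chunk in stream:
    # TTFT = delay between sending the request and receiving the first streamed chunk.
    print(f"TTFT: {time.perf_counter() - start:.3f}s")
    break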
LMCache Lab (@lmcache):

Mark your calendars! Excited for the first FastAGI meetup featuring incredible speakers on AI infra & agents 🚀 Looking forward to the discussions and energy at LMCache Lab!

LMCache Lab (@lmcache):

Thanks to Alex @ New Port AI for inviting us to introduce LMCache at the Bay Area Generative AI Builders Meetup! More details here: lu.ma/5sdoeg1y

LMCache Lab (@lmcache):

🚀 Exciting to see LMCache x Mooncake being discussed at the vLLM Shanghai Meetup! The ecosystem around vLLM is evolving fast — from distributed inference to hardware optimizations — and cache innovations like this will be key to unlocking the next level of efficiency &