Joey Gonzalez

@profjoeyg

Professor @UCBerkeley, co-director of @LMSysorg, and co-founder @RunLLM

Joined 25-06-2011 00:24:02

450 Tweets

2.6K Followers

275 Following

Serving LLMs? My students found a way to accelerate serving by over an order of magnitude just by changing the way memory is managed (spoiler alert: GPU memory fragmentation = slow). Introducing vLLM with PagedAttention:
