Reyna Abhyankar (@reyna_abhyankar)'s Twitter Profile
Reyna Abhyankar

@reyna_abhyankar

I like computers

ID: 1164413313800785920

Joined: 22-08-2019 05:45:50

12 Tweets

19 Followers

16 Following

Zhihao Jia (@jiazhihao)'s Twitter Profile Photo

Generative LLMs are slow and expensive to serve. Their much smaller, distilled versions are faster and cheaper but achieve suboptimal generative performance. We show it is possible to achieve the best of both worlds. Code: github.com/flexflow/FlexF… Paper: cs.cmu.edu/~zhihaoj2/pape…
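
The "best of both worlds" the tweet points at (SpecInfer) pairs small distilled draft models with the large LLM acting as a verifier. Below is a minimal, hypothetical Python sketch of that draft-and-verify loop; `draft_next` and `target_verify` are toy stand-ins, not FlexFlow's actual API.

```python
# Toy sketch of speculative decoding: a small draft model proposes
# tokens cheaply, and the large target model verifies them in one pass.
# Both model functions below are invented stubs for illustration.

def draft_next(prefix, k=4):
    """Small draft model: cheaply propose the next k tokens (toy stub)."""
    return [f"tok{len(prefix) + i}" for i in range(k)]

def target_verify(prefix, proposed):
    """Large target model: check all proposals in one forward pass
    (toy stub). Returns the accepted run plus one corrected token."""
    accepted = proposed[:2]                 # pretend it agrees on 2 of 4
    correction = f"tok{len(prefix) + len(accepted)}"
    return accepted, correction

def speculative_decode(prompt, rounds=3):
    out = list(prompt)
    for _ in range(rounds):
        proposed = draft_next(out)                    # cheap drafting
        accepted, fix = target_verify(out, proposed)  # one big-model pass
        out += accepted + [fix]                # keep only verified tokens
    return out

print(speculative_decode(["<bos>"]))
```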

Yiying Zhang (@yiying__zhang)'s Twitter Profile Photo

Today, LLMs are constantly being augmented with tools, agents, models, RAG, etc. We built InferCept [ICML'24], the first serving framework designed for augmented LLMs. InferCept sustains a 1.6x-2x higher serving load than SOTA LLM serving systems. #AugLLM mlsys.wuklab.io/posts/infercep…
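
The core question InferCept tackles is what to do with a request's GPU state while decoding pauses for an external tool or agent call. A toy sketch of that keep/swap/recompute decision, with made-up costs and a made-up policy (the paper's actual cost model differs):

```python
# Hypothetical decision for a request "intercepted" mid-decode by a
# tool call: hold its KV cache in GPU memory, swap it to host memory,
# or discard and recompute it on resume. All numbers are illustrative.

def handle_interception(kv_bytes, pause_s, swap_bw=8e9, recompute_s=0.5):
    """Pick what to do with a paused request's KV cache (toy policy)."""
    keep_waste = pause_s                   # GPU memory sits idle for the pause
    swap_waste = 2 * kv_bytes / swap_bw    # copy out now, copy back later
    recompute_waste = recompute_s          # rebuild the KV cache on resume
    options = {"keep": keep_waste, "swap": swap_waste,
               "recompute": recompute_waste}
    return min(options, key=options.get)

# A 2 GB cache paused for 3 s: swapping beats holding GPU memory idle.
print(handle_interception(kv_bytes=2e9, pause_s=3.0))
```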

Yiying Zhang (@yiying__zhang)'s Twitter Profile Photo

LLM prompts are getting longer and increasingly shared with agents, tools, documents, etc. We introduce Preble, the first distributed LLM serving system targeting long and shared prompts. Preble reduces latency by 1.5-14.5x over SOTA serving systems. #LLM mlsys.wuklab.io/posts/preble/
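
One intuition behind serving long shared prompts in a distributed setting is prefix-aware routing: requests that share a prompt prefix should land on the GPU that already cached its KV state. A toy sketch of that idea, with an invented routing policy (Preble's actual scheduler balances more factors):

```python
# Toy prefix-aware router: send requests sharing a prompt prefix to the
# GPU that already cached it, else to the least-loaded GPU.

prefix_owner = {}            # prefix hash -> gpu id that cached it
load = [0, 0, 0, 0]          # outstanding requests per gpu

def route(prompt, prefix_len=64):
    key = hash(prompt[:prefix_len])
    gpu = prefix_owner.get(key)
    if gpu is None:                       # no cached prefix: balance load
        gpu = min(range(len(load)), key=load.__getitem__)
        prefix_owner[key] = gpu           # future requests reuse this KV
    load[gpu] += 1
    return gpu

shared = "You are a helpful agent. " * 10
print(route(shared + "Task A"))
print(route(shared + "Task B"))  # same GPU, shared KV cache reused
```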

Yiying Zhang (@yiying__zhang)'s Twitter Profile Photo

Join us at ICML in Vienna next Thursday, 11:30am-1pm local time (poster session 5), for our poster on InferCept (a serving system for augmented, or compound, AI) at Hall C 4-9 #709. Learn more about InferCept in our newly posted video: youtube.com/watch?v=iOs1b0…

Yiying Zhang (@yiying__zhang)'s Twitter Profile Photo

WukLab's new study reveals that CPU scheduling overhead can dominate LLM inference time, reaching up to 50% in systems like vLLM! Scheduling overhead can no longer be ignored as model forwarding speeds increase and more scheduling tasks get added. #LLM #vLLM #SGLang Read: tinyurl.com/yk4jeaz8
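
A back-of-envelope way to see the effect the study describes is to time the scheduler and the forward pass separately per iteration. The two functions below are stand-ins with invented latencies, not vLLM's real components:

```python
# Measure what fraction of each serving iteration goes to CPU
# scheduling vs. the model forward pass. Latencies are made up to
# mimic a fast model where scheduling is no longer negligible.
import time

def schedule(batch):          # stand-in for CPU-side batch bookkeeping
    time.sleep(0.004)         # pretend scheduling takes 4 ms

def forward(batch):           # stand-in for the GPU forward pass
    time.sleep(0.005)         # fast model: 5 ms per decode step

sched_t = fwd_t = 0.0
for step in range(100):
    t0 = time.perf_counter(); schedule(None); t1 = time.perf_counter()
    forward(None);            t2 = time.perf_counter()
    sched_t += t1 - t0; fwd_t += t2 - t1

print(f"scheduling share: {sched_t / (sched_t + fwd_t):.0%}")  # ~44%
```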

Yiying Zhang (@yiying__zhang)'s Twitter Profile Photo

Struggling with developing high-quality gen-AI apps? Meet Cognify: an open-source tool for automatically optimizing gen-AI workflows. 48% higher generation quality, 9x lower cost, fully compatible with LangChain, DSPy, Python. Read & try Cognify: tinyurl.com/a8b9cdnj #GenseeAI

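Cognify's details are in the linked post; as a rough illustration of what a workflow autotuner of this kind searches over, here is a toy grid search over model choice and prompt style under a quality-cost trade-off. Everything in it (the config space, the evaluate() scores, the scoring weight) is invented for the example.

```python
# Toy autotuner: enumerate workflow configurations and keep the best
# quality-vs-cost point. A real tuner would evaluate each config by
# running the workflow on a validation set.
import itertools

def evaluate(model, style):
    """Stub scoring a (model, prompt style) config; numbers are made up."""
    quality = ({"small": 0.6, "large": 0.8}[model]
               + {"plain": 0.0, "few-shot": 0.1}[style])
    cost = {"small": 1.0, "large": 9.0}[model]
    return quality, cost

best, best_score = None, float("-inf")
for model, style in itertools.product(["small", "large"],
                                      ["plain", "few-shot"]):
    q, c = evaluate(model, style)
    score = q - 0.02 * c                 # trade quality against cost
    if score > best_score:
        best, best_score = (model, style), score

print("best config:", best)
```
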
Yiying Zhang (@yiying__zhang)'s Twitter Profile Photo

Boost your gen-AI workflow's quality by 2.8x with just $5 in 24 minutes! Check out how Cognify autotunes a gen-AI workflow's quality and execution efficiency on a tiny budget in our latest blog post: tinyurl.com/4tyvvdks. Paper: tinyurl.com/3kx2xjn9. Code: tinyurl.com/2tp9bndr.

Yiying Zhang (@yiying__zhang)'s Twitter Profile Photo

Computer-use AI agents (CUAs) are powerful, but way too slow. A 2-minute human task can take a CUA over 20 minutes! At WukLab, we're building faster CUAs. Recently, we created OSWorld-Human, a new benchmark to close the speed gap between humans and machines. Read our full blog