Yiying Zhang (@yiying__zhang) 's Twitter Profile
Yiying Zhang

@yiying__zhang

Founder and CEO of GenseeAI, Associate Professor of Computer Science at UCSD. LLM serving, AI Workflows, Agents

ID: 936289743511441408

Joined: 30-11-2017 17:44:05

31 Tweets

1.1K Followers

139 Following

Yizhou Shan (@yizhou_shan) 's Twitter Profile Photo

Clio is a hardware-based (FPGA) memory disaggregation solution with a new virtual memory system, a customized transport, and a framework for computation offloading. [2/2]

Yiying Zhang (@yiying__zhang) 's Twitter Profile Photo

The Third Workshop on Resource Disaggregation and Serverless Computing (WORDS'22) will take place on 11/17/2022. Consider submitting your new or published work! The paper submission deadline is 9/29/2022. More info can be found at wordsworkshop.org

The WORDS'22 deadline has been extended to 10/6. Consider submitting your new work (<= 5 pages) or published work (<= 2-page abstract).

WORDS’22 (Workshop on Resource Disaggregation and Serverless) will happen on Nov 17th, both in person in San Diego, CA and virtually. Registration is free for both options! Check out the program and register here: wordsworkshop.org

4th Workshop on Resource Disaggregation and Serverless Computing (co-located with SOSP’23). wordsworkshop.org Submissions open, deadline 7/16! Soliciting 5-page workshop papers and 2-page abstracts of recently published works on resource disaggregation/serverless computing.

Today, LLMs are constantly being augmented with tools, agents, models, RAG, etc. We built InferCept [ICML'24], the first serving framework designed for augmented LLMs. InferCept sustains a 1.6x-2x higher serving load than SOTA LLM serving systems. #AugLLM mlsys.wuklab.io/posts/infercep…

LLM prompts are getting longer and increasingly shared with agents, tools, documents, etc. We introduce Preble, the first distributed LLM serving system targeting long and shared prompts. Preble reduces latency by 1.5-14.5x over SOTA serving systems. #LLM mlsys.wuklab.io/posts/preble/

Join us at ICML in Vienna next Thursday, 11:30am-1pm local time (poster session 5), for our poster on InferCept (an augmented, or compound, AI serving system) at Hall C 4-9 #709. Learn more about InferCept in our newly posted video: youtube.com/watch?v=iOs1b0…

WukLab's new study reveals that CPU scheduling overhead can dominate LLM inference time, reaching up to 50% in systems like vLLM! Scheduling overhead can no longer be ignored as model forwarding speeds increase and more scheduling tasks get added. #LLM #vLLM #SGLang Read tinyurl.com/yk4jeaz8

Struggling with developing high-quality gen-AI apps? Meet Cognify: an open-source tool for automatically optimizing gen-AI workflows. 48% higher generation quality, 9x lower cost, fully compatible with LangChain, DSPy, Python. Read & try Cognify: tinyurl.com/a8b9cdnj #GenseeAI


Boost your gen-AI workflow's quality by 2.8x with just $5 in 24 minutes! Check how Cognify autotunes gen-AI workflow’s quality and execution efficiency with a tiny budget in our latest blog post tinyurl.com/4tyvvdks. Paper tinyurl.com/3kx2xjn9. Code tinyurl.com/2tp9bndr.

See how Cognify uses only $5 and 24 minutes to cover a search space of $168K and weeks when autotuning gen-AI workflows in part 2 of our tech blog: tinyurl.com/yutx334k. Code tinyurl.com/2tp9bndr. Paper tinyurl.com/3kx2xjn9


We're collecting insights on the current & potential use of AI agents to help build better future infrastructure. Please take our quick 1-2 minute survey: lnkd.in/gcWU9mmQ. Your responses are valuable for our R&D (anonymous option available), and you will receive a $25-$50

We are excited to launch the free beta of our AI agent/workflow serving platform, designed for intelligent execution optimization: tester.gensee.ai. Send me a direct message for an invitation code if you want to try it out. #AI #AIAgent #GenseeAI #LLMs #Infrastructure