Stas Bekman (@stasbekman) 's Twitter Profile
Stas Bekman

@stasbekman

Toolmaker. Software creator, optimizer and harmonizer.

Makes things work and fly at @ContextualAI

Training LLM/RAG/Generative AI/Machine Learning/Scalability

ID: 1068360975898660864

Link: https://stasosphere.com/machine-learning/ · Joined: 30-11-2018 04:28:00

1.1K Tweets · 7.7K Followers · 273 Following


Inference: 20 tokens per second per user is all you need.

The interesting thing about online inference is that, unlike normal web serving, it doesn't have to be as fast as possible, since it doesn't have to return the full generated response at once.

Moreover depending on the
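The streaming idea in the tweet can be sketched in a few lines: instead of waiting for the whole response, the server yields tokens as they are produced, paced so the user always has text to read. This is a hypothetical illustration (the generator name, token list, and rate constant are made up, not from the tweet); a real server would stream model output over SSE or websockets rather than pacing a pre-generated list.

```python
import time

READ_RATE_TOKENS_PER_S = 20  # the per-user rate the tweet suggests is "all you need"

def stream_response(tokens, rate=READ_RATE_TOKENS_PER_S):
    """Yield tokens one at a time, paced to the target rate.

    Sketch only: pacing a pre-generated list stands in for a model
    emitting tokens incrementally. The point is that 20 tok/s keeps
    up with human reading speed, so the user perceives no waiting
    even though total generation time is far from minimal.
    """
    interval = 1.0 / rate
    for tok in tokens:
        yield tok           # the client can render this token immediately
        time.sleep(interval)

if __name__ == "__main__":
    tokens = "Streaming keeps the reader busy while the model computes".split()
    # Use a high rate so the demo finishes quickly.
    for tok in stream_response(tokens, rate=1000):
        print(tok, end=" ", flush=True)
    print()
```

At 20 tokens/s a 200-token reply streams over ~10 seconds, but the first token arrives almost immediately, which is what matters for perceived latency.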