Stas Bekman (@stasbekman) 's Twitter Profile
Stas Bekman

@stasbekman

Toolmaker. Software creator, optimizer and harmonizer.

Makes things work and fly at @ContextualAI

Training LLM/RAG/Generative AI/Machine Learning/Scalability

ID: 1068360975898660864

Link: https://stasosphere.com/machine-learning/ · Joined: 30-11-2018 04:28:00

1.1K Tweets · 7.7K Followers · 273 Following


Inference: 20 tokens per second per user is all you need.

The interesting thing about online inference is that, unlike normal web serving, it doesn't have to be as fast as possible, since it doesn't have to return the full generated response at once.

Moreover depending on the
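The streaming idea in the tweet can be sketched in a few lines: instead of waiting for the whole response, the server yields tokens as they are produced, paced so the user always has text to read. This is a hypothetical illustration (the generator name, token list, and rate constant are made up, not from the tweet); a real server would stream model output over SSE or websockets rather than pacing a pre-generated list.

```python
import time

READ_RATE_TOKENS_PER_S = 20  # the per-user rate the tweet suggests is "all you need"

def stream_response(tokens, rate=READ_RATE_TOKENS_PER_S):
    """Yield tokens one at a time, paced to the target rate.

    Sketch only: pacing a pre-generated list stands in for a model
    emitting tokens incrementally. The point is that 20 tok/s keeps
    up with human reading speed, so the user perceives no waiting
    even though total generation time is far from minimal.
    """
    interval = 1.0 / rate
    for tok in tokens:
        yield tok           # the client can render this token immediately
        time.sleep(interval)

if __name__ == "__main__":
    tokens = "Streaming keeps the reader busy while the model computes".split()
    # Use a high rate so the demo finishes quickly.
    for tok in stream_response(tokens, rate=1000):
        print(tok, end=" ", flush=True)
    print()
```

At 20 tokens/s a 200-token reply streams over ~10 seconds, but the first token arrives almost immediately, which is what matters for perceived latency.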