BentoML - Infrastructure for Building AI Systems (@bentomlai)

🍱 Build scalable AI systems with unparalleled speed, on-prem or any cloud.

Join the Bento community 👉 l.bentoml.com/join-slack

ID: 867790559938662400

Website: https://bentoml.com/ | Joined: 25-05-2017 17:12:47

601 Tweets | 2.2K Followers | 196 Following


#BentoFriday 🍱 — Lifecycle Hooks in #BentoML

When deploying real-world #AI services, the work doesn’t end at inference. You also need to manage everything that happens before, during, and after a request hits your model.

In practice, that often means:
🔧 Setting up global
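The pattern behind the tweet can be sketched without any framework: hooks registered on a service run once before the first request (startup) and once at teardown (shutdown). This is an illustrative, framework-agnostic sketch — the class and method names below are stand-ins, not BentoML's actual decorator API.

```python
# Framework-agnostic sketch of the lifecycle-hook pattern: startup
# hooks run lazily before the first request; shutdown hooks run at
# teardown. Names here are illustrative, not BentoML's real API.

class Service:
    def __init__(self):
        self._startup_hooks = []
        self._shutdown_hooks = []
        self.events = []  # recorded so the flow is observable

    def on_startup(self, fn):
        self._startup_hooks.append(fn)
        return fn

    def on_shutdown(self, fn):
        self._shutdown_hooks.append(fn)
        return fn

    def serve(self, request):
        # Run startup hooks once, before the first request is handled.
        for hook in self._startup_hooks:
            hook()
        self._startup_hooks = []
        self.events.append(f"handled {request}")

    def close(self):
        for hook in self._shutdown_hooks:
            hook()


svc = Service()

@svc.on_startup
def load_model():
    svc.events.append("model loaded")  # e.g. load weights, warm caches

@svc.on_shutdown
def release():
    svc.events.append("resources released")  # e.g. close connections

svc.serve("req-1")
svc.close()
```

The key design point is that handler code never calls setup or teardown directly; the serving runtime owns the ordering.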

Enterprises can’t scale #AI inference without compromises.

The GPU CAP Theorem says you can’t have all three at once:
🔒 Control over your models, data, and compliance
⚡ Availability to scale on demand when traffic spikes
💰 Price that keeps

🚀 Self-host models with #Triton + #BentoML!

NVIDIA Triton Inference Server is a powerful open-source tool for serving models from major ML frameworks like ONNX, PyTorch, and TensorFlow.

This project wraps Triton with BentoML, making it easy to:
🎯 Package custom models as

#BentoFriday 🍱 — 20x Faster Iteration with BentoML Codespaces

Modern #AI apps like #RAG or voice agents often require multiple powerful GPUs and complex dependencies. This often leads to:
❌ Painstaking delays with each code change
❌ Challenging environment setups
❌


🚀 Build CI/CD pipelines for #AI services with #BentoML + #GitHubActions

Automate everything with pipelines that:
✅ Deploy services to #BentoCloud
✅ Trigger on code or deployment config changes
✅ Wait until the service is ready
✅ Run test inference

📘 Step-by-step guide:
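A workflow following the steps above might look roughly like this. This is a hedged sketch: the repository layout, secret name, and CLI flags are assumptions, not the guide's exact configuration.

```yaml
# Hypothetical GitHub Actions workflow; paths, secret names, and
# bentoml CLI flags are assumptions — consult the linked guide.
name: deploy-bento
on:
  push:
    paths: ["service/**", "bentofile.yaml"]  # trigger on code/config changes
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install bentoml
      - name: Deploy to BentoCloud
        env:
          BENTO_CLOUD_API_KEY: ${{ secrets.BENTO_CLOUD_API_KEY }}
        run: bentoml deploy .  # waits for the deployment to be ready
      - name: Run test inference
        run: python tests/smoke_test.py  # hypothetical smoke test
```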

Want to self-host model inference in production? Start with the right model.

We’ve put together a series exploring popular open-source models. Ready to deploy with #BentoML 🍱
🗣️ Text-to-Speech bentoml.com/blog/exploring…
🖼️ Image Generation bentoml.com/blog/a-guide-t…
🧠 Embedding

#BentoFriday 🍱 — Inference Context with bentoml.Context

Building #AI/ML APIs isn’t just about calling a model. You need a clean, reliable way to customize your inference service.

bentoml.Context is one of those abstractions in #BentoML that gives

🚀 DeepSeek-R1-0528 just landed!

🔍 Still no official word — no model card, no benchmarks. #DeepSeek being DeepSeek, as always 😅

✅ Good news: #BentoML already supports it.
👉 Deploy it now with our updated example: github.com/bentoml/BentoV…
👀 Follow for more updates!

👀 Update on DeepSeek-R1-0528 bentoml.com/blog/the-compl…

🧠 Built on V3 Base
📈 Major reasoning improvements
🛡️ Reduced hallucination
⚙️ Function calling + JSON output
📦 Distilled Qwen3-8B beats much larger models
📄 Still MIT licensed

See our updated blog ⬇️ #AI #LLM #BentoML #OpenSource


Choosing the right #AI deployment platform? Check out our detailed comparison of #BentoML vs #VertexAI to help you make informed decisions. bentoml.com/blog/compariso…

🔍 Here’s what we cover:
✅ Cloud infrastructure flexibility
✅ Scaling and performance
✅ Developer experience and


#BentoFriday 🍱 — Runtime Specs in Pure Python

Deploying #AI services isn’t just about your model code; it’s also about getting the right runtime and making sure it is reproducible across environments. That might include:

🐍 Python version
🖥️ OS & system packages
📦 Python
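The core idea — declaring the runtime as chained Python calls instead of a hand-written Dockerfile — can be sketched with a tiny builder. This is an illustrative stand-in only; BentoML's real image API differs, and the class and method names below are invented for the example.

```python
# Illustrative stand-in: a runtime spec declared in pure Python and
# rendered to Dockerfile instructions. Not BentoML's actual API.

class RuntimeSpec:
    def __init__(self, python_version: str):
        self.python_version = python_version
        self.system_packages: list[str] = []
        self.python_packages: list[str] = []

    def with_system_packages(self, *pkgs: str) -> "RuntimeSpec":
        self.system_packages.extend(pkgs)
        return self  # chainable, so specs read declaratively

    def with_python_packages(self, *pkgs: str) -> "RuntimeSpec":
        self.python_packages.extend(pkgs)
        return self

    def to_dockerfile(self) -> str:
        lines = [f"FROM python:{self.python_version}-slim"]
        if self.system_packages:
            lines.append(
                "RUN apt-get update && apt-get install -y "
                + " ".join(self.system_packages)
            )
        if self.python_packages:
            lines.append("RUN pip install " + " ".join(self.python_packages))
        return "\n".join(lines)


spec = (
    RuntimeSpec("3.11")
    .with_system_packages("ffmpeg")
    .with_python_packages("torch", "transformers")
)
dockerfile = spec.to_dockerfile()
```

Because the spec is ordinary Python, it can be versioned, diffed, and parameterized like any other code — which is the reproducibility argument the tweet makes.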

🚀 #Magistral, Mistral AI’s first reasoning model, is here and now deployable with #BentoML!

This release features two variants:
- Magistral Small: 24B parameter open-source version
- Magistral Medium: Enterprise-grade, high-performance version

Highlights of Magistral Small:
🔧


We’re entering a new era for #LLMInference: moving beyond single-node optimizations to distributed serving strategies that unlock better performance, smarter resource utilization, and real cost savings.

In this blog post, we break down the latest techniques for distributed LLM

#BentoFriday 🍱 — Add a Web UI with Gradio

Real-world #AI apps don’t just need a model. They need interfaces users can interact with.

But building a custom frontend is time-consuming, and managing it separately from your backend adds unnecessary complexity. 😵‍💫

With #BentoML,

#BentoFriday 🍱 — WebSocket Endpoints

Real-time #AI apps like voice assistants and live chatbots need more than just REST APIs. They need persistent, low-latency connections. ⚡

But spinning up a separate #WebSocket server just for that?
❌ Duplicated infra
❌ Complex routing

#BentoFriday 🍱 — Lightning-Fast Model Loading

When deploying #LLM services, slow model loading can cripple your cold starts. 🚨 This leads to delayed autoscaling, missed requests during traffic spikes, and a poor user experience.

#BentoML supercharges model loading with speed
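One common cold-start optimization in this space is fetching model weights in parallel byte ranges instead of one sequential download. The sketch below shows the pattern with an in-memory stand-in; in production, `fetch_range` would be an HTTP Range request to object storage, and the function names are illustrative.

```python
# Sketch of parallel chunked weight loading: split the file into byte
# ranges, fetch ranges concurrently, reassemble in order. The "remote
# store" here is just an in-memory bytes object.

from concurrent.futures import ThreadPoolExecutor

WEIGHTS = bytes(range(256)) * 64  # stand-in for a 16 KiB weight file
CHUNK = 4096


def fetch_range(start: int, end: int) -> bytes:
    # In production: an HTTP Range request against blob storage.
    return WEIGHTS[start:end]


def parallel_load(size: int, chunk: int = CHUNK) -> bytes:
    ranges = [(i, min(i + chunk, size)) for i in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map yields results in submission order, so the chunks
        # reassemble correctly even if they finish out of order.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
    return b"".join(parts)


loaded = parallel_load(len(WEIGHTS))
```

The win comes from saturating network bandwidth with concurrent requests rather than waiting on a single stream.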

Your LLM demo wowed the execs. It’s time to ship it.

Now comes the hard part: scaling inference efficiently, optimizing inference performance, and managing heterogeneous LLM workflows and compute environments.

But this is no easy feat. It requires a set of best practices

🚀 Try Canary Deployments on our #BentoInferencePlatform! Don’t risk regressions or broken user experiences when rolling out new model versions.

✅ Deploy multiple Bento versions at once
🎯 Smart traffic routing strategies
📊 Real-time monitoring for different versions
⏱️

Modern #AI agents don’t just generate text. They write and run code. But what happens when:

⚠️ A prompt injection triggers harmful logic?
⚠️ An agent runs unknown scripts from a Git repo?
⚠️ It connects to untrusted APIs?

Letting agents run arbitrary code without guardrails is
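A minimal version of the guardrail the post argues for: never `exec()` agent-generated code in-process. Run it in a subprocess with a hard timeout and a stripped environment so an infinite loop or environment-variable leak stays contained. This is a sketch of one layer only — real sandboxing adds filesystem, network, and memory isolation on top.

```python
# Run untrusted, agent-generated code in an isolated subprocess with a
# hard timeout. A hang is killed; secrets in os.environ are not
# inherited because env={} is passed explicitly.

import subprocess
import sys


def run_untrusted(code: str, timeout: float = 2.0) -> str:
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout,
            env={},  # do not leak API keys or tokens to the child
        )
        return result.stdout.strip()
    except subprocess.TimeoutExpired:
        return "killed: timeout"


ok = run_untrusted("print(2 + 2)")
hang = run_untrusted("while True: pass", timeout=1.0)
```

Process isolation plus a timeout addresses the "harmful logic" and hang cases; restricting network and filesystem access for the child process would be the next layers.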