BentoML - Infrastructure for Building AI Systems (@bentomlai)

🍱 Build scalable AI systems with unparalleled speed, on-prem or any cloud.

Join the Bento community 👉 l.bentoml.com/join-slack

ID: 867790559938662400

Website: https://bentoml.com/ | Joined: 25-05-2017 17:12:47

601 Tweets | 2.2K Followers | 196 Following


#BentoFriday 🍱 — Lifecycle Hooks in #BentoML

When deploying real-world #AI services, the work doesn’t end at inference. You also need to manage everything that happens before, during, and after a request hits your model.

In practice, that often means:
🔧 Setting up global
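The pattern behind the tweet can be sketched without any framework: hooks registered on a service run once before the first request (startup) and once at teardown (shutdown). This is an illustrative, framework-agnostic sketch — the class and method names below are stand-ins, not BentoML's actual decorator API.

```python
# Framework-agnostic sketch of the lifecycle-hook pattern: startup
# hooks run lazily before the first request; shutdown hooks run at
# teardown. Names here are illustrative, not BentoML's real API.

class Service:
    def __init__(self):
        self._startup_hooks = []
        self._shutdown_hooks = []
        self.events = []  # recorded so the flow is observable

    def on_startup(self, fn):
        self._startup_hooks.append(fn)
        return fn

    def on_shutdown(self, fn):
        self._shutdown_hooks.append(fn)
        return fn

    def serve(self, request):
        # Run startup hooks once, before the first request is handled.
        for hook in self._startup_hooks:
            hook()
        self._startup_hooks = []
        self.events.append(f"handled {request}")

    def close(self):
        for hook in self._shutdown_hooks:
            hook()


svc = Service()

@svc.on_startup
def load_model():
    svc.events.append("model loaded")  # e.g. load weights, warm caches

@svc.on_shutdown
def release():
    svc.events.append("resources released")  # e.g. close connections

svc.serve("req-1")
svc.close()
```

The key design point is that handler code never calls setup or teardown directly; the serving runtime owns the ordering.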

Enterprises can’t scale #AI inference without compromises.

The GPU CAP Theorem says you can’t have all three at once:
🔒 Control over your models, data, and compliance
⚡ Availability to scale on demand when traffic spikes
💰 Price that keeps

🚀 Self-host models with #Triton + #BentoML!

NVIDIA Triton Inference Server is a powerful open-source tool for serving models from major ML frameworks like ONNX, PyTorch, and TensorFlow.

This project wraps Triton with BentoML, making it easy to:
🎯 Package custom models as

#BentoFriday 🍱 — 20x Faster Iteration with BentoML Codespaces

Modern #AI apps like #RAG or voice agents often require multiple powerful GPUs and complex dependencies. This often leads to:
❌ Painstaking delays with each code change
❌ Challenging environment setups
❌


🚀 Build CI/CD pipelines for #AI services with #BentoML + #GitHubActions

Automate everything with pipelines that:
✅ Deploy services to #BentoCloud
✅ Trigger on code or deployment config changes
✅ Wait until the service is ready
✅ Run test inference

📘 Step-by-step guide:
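A workflow following the steps above might look roughly like this. This is a hedged sketch: the repository layout, secret name, and CLI flags are assumptions, not the guide's exact configuration.

```yaml
# Hypothetical GitHub Actions workflow; paths, secret names, and
# bentoml CLI flags are assumptions — consult the linked guide.
name: deploy-bento
on:
  push:
    paths: ["service/**", "bentofile.yaml"]  # trigger on code/config changes
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install bentoml
      - name: Deploy to BentoCloud
        env:
          BENTO_CLOUD_API_KEY: ${{ secrets.BENTO_CLOUD_API_KEY }}
        run: bentoml deploy .  # waits for the deployment to be ready
      - name: Run test inference
        run: python tests/smoke_test.py  # hypothetical smoke test
```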

Want to self-host model inference in production? Start with the right model.

We’ve put together a series exploring popular open-source models. Ready to deploy with #BentoML 🍱
🗣️ Text-to-Speech bentoml.com/blog/exploring…
🖼️ Image Generation bentoml.com/blog/a-guide-t…
🧠 Embedding

#BentoFriday 🍱 — Inference Context with bentoml.Context

Building #AI/ML APIs isn’t just about calling a model. You need a clean, reliable way to customize your inference service.

bentoml.Context is one of those abstractions in #BentoML that gives

🚀 DeepSeek-R1-0528 just landed!

🔍 Still no official word — no model card, no benchmarks. #DeepSeek being DeepSeek, as always 😅

✅ Good news: #BentoML already supports it.
👉 Deploy it now with our updated example: github.com/bentoml/BentoV…
👀 Follow for more updates!

👀 Update on DeepSeek-R1-0528 bentoml.com/blog/the-compl…

🧠 Built on V3 Base
📈 Major reasoning improvements
🛡️ Reduced hallucination
⚙️ Function calling + JSON output
📦 Distilled Qwen3-8B beats much larger models
📄 Still MIT licensed

See our updated blog ⬇️ #AI #LLM #BentoML #OpenSource


Choosing the right #AI deployment platform? Check out our detailed comparison of #BentoML vs #VertexAI to help you make informed decisions. bentoml.com/blog/compariso…

🔍 Here’s what we cover:
✅ Cloud infrastructure flexibility
✅ Scaling and performance
✅ Developer experience and


#BentoFriday 🍱 — Runtime Specs in Pure Python

Deploying #AI services isn’t just about your model code; it’s also about getting the right runtime and making sure it is reproducible across environments. That might include:

🐍 Python version
🖥️ OS & system packages
📦 Python
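The core idea — declaring the runtime as chained Python calls instead of a hand-written Dockerfile — can be sketched with a tiny builder. This is an illustrative stand-in only; BentoML's real image API differs, and the class and method names below are invented for the example.

```python
# Illustrative stand-in: a runtime spec declared in pure Python and
# rendered to Dockerfile instructions. Not BentoML's actual API.

class RuntimeSpec:
    def __init__(self, python_version: str):
        self.python_version = python_version
        self.system_packages: list[str] = []
        self.python_packages: list[str] = []

    def with_system_packages(self, *pkgs: str) -> "RuntimeSpec":
        self.system_packages.extend(pkgs)
        return self  # chainable, so specs read declaratively

    def with_python_packages(self, *pkgs: str) -> "RuntimeSpec":
        self.python_packages.extend(pkgs)
        return self

    def to_dockerfile(self) -> str:
        lines = [f"FROM python:{self.python_version}-slim"]
        if self.system_packages:
            lines.append(
                "RUN apt-get update && apt-get install -y "
                + " ".join(self.system_packages)
            )
        if self.python_packages:
            lines.append("RUN pip install " + " ".join(self.python_packages))
        return "\n".join(lines)


spec = (
    RuntimeSpec("3.11")
    .with_system_packages("ffmpeg")
    .with_python_packages("torch", "transformers")
)
dockerfile = spec.to_dockerfile()
```

Because the spec is ordinary Python, it can be versioned, diffed, and parameterized like any other code — which is the reproducibility argument the tweet makes.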

🚀 #Magistral, Mistral AI’s first reasoning model, is here and now deployable with #BentoML!

This release features two variants:
- Magistral Small: 24B parameter open-source version
- Magistral Medium: Enterprise-grade, high-performance version

Highlights of Magistral Small:
🔧


We’re entering a new era for #LLMInference: moving beyond single-node optimizations to distributed serving strategies that unlock better performance, smarter resource utilization, and real cost savings.

In this blog post, we break down the latest techniques for distributed LLM

#BentoFriday 🍱 — Add a Web UI with Gradio

Real-world #AI apps don’t just need a model. They need interfaces users can interact with.

But building a custom frontend is time-consuming, and managing it separately from your backend adds unnecessary complexity. 😵‍💫

With #BentoML,

#BentoFriday 🍱 — WebSocket Endpoints

Real-time #AI apps like voice assistants and live chatbots need more than just REST APIs. They need persistent, low-latency connections. ⚡

But spinning up a separate #WebSocket server just for that?
❌ Duplicated infra
❌ Complex routing

#BentoFriday 🍱 — Lightning-Fast Model Loading

When deploying #LLM services, slow model loading can cripple your cold starts. 🚨 This leads to delayed autoscaling, missed requests during traffic spikes, and a poor user experience.

#BentoML supercharges model loading with speed
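One common cold-start optimization in this space is fetching model weights in parallel byte ranges instead of one sequential download. The sketch below shows the pattern with an in-memory stand-in; in production, `fetch_range` would be an HTTP Range request to object storage, and the function names are illustrative.

```python
# Sketch of parallel chunked weight loading: split the file into byte
# ranges, fetch ranges concurrently, reassemble in order. The "remote
# store" here is just an in-memory bytes object.

from concurrent.futures import ThreadPoolExecutor

WEIGHTS = bytes(range(256)) * 64  # stand-in for a 16 KiB weight file
CHUNK = 4096


def fetch_range(start: int, end: int) -> bytes:
    # In production: an HTTP Range request against blob storage.
    return WEIGHTS[start:end]


def parallel_load(size: int, chunk: int = CHUNK) -> bytes:
    ranges = [(i, min(i + chunk, size)) for i in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map yields results in submission order, so the chunks
        # reassemble correctly even if they finish out of order.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
    return b"".join(parts)


loaded = parallel_load(len(WEIGHTS))
```

The win comes from saturating network bandwidth with concurrent requests rather than waiting on a single stream.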

Your LLM demo wowed the execs. It’s time to ship it.

Now comes the hard part: scaling inference efficiently, optimizing inference performance, and managing heterogeneous LLM workflows and compute environments.

But this is no easy feat. It requires a set of best practices

🚀 Try Canary Deployments on our #BentoInferencePlatform! Don’t risk regressions or broken user experiences when rolling out new model versions.

✅ Deploy multiple Bento versions at once
🎯 Smart traffic routing strategies
📊 Real-time monitoring for different versions
⏱️

Modern #AI agents don’t just generate text. They write and run code. But what happens when:

⚠️ A prompt injection triggers harmful logic?
⚠️ An agent runs unknown scripts from a Git repo?
⚠️ It connects to untrusted APIs?

Letting agents run arbitrary code without guardrails is
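A minimal version of the guardrail the post argues for: never `exec()` agent-generated code in-process. Run it in a subprocess with a hard timeout and a stripped environment so an infinite loop or environment-variable leak stays contained. This is a sketch of one layer only — real sandboxing adds filesystem, network, and memory isolation on top.

```python
# Run untrusted, agent-generated code in an isolated subprocess with a
# hard timeout. A hang is killed; secrets in os.environ are not
# inherited because env={} is passed explicitly.

import subprocess
import sys


def run_untrusted(code: str, timeout: float = 2.0) -> str:
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout,
            env={},  # do not leak API keys or tokens to the child
        )
        return result.stdout.strip()
    except subprocess.TimeoutExpired:
        return "killed: timeout"


ok = run_untrusted("print(2 + 2)")
hang = run_untrusted("while True: pass", timeout=1.0)
```

Process isolation plus a timeout addresses the "harmful logic" and hang cases; restricting network and filesystem access for the child process would be the next layers.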