Pankaj Gupta (@defpan)'s Twitter Profile
Pankaj Gupta

@defpan

Co-founder @basetenco working on ML model performance

ID: 372020096

Link: https://www.baseten.co/author/pankaj-gupta/ · Joined: 11-09-2011 23:55:54

525 Tweets

259 Followers

906 Following

Baseten (@basetenco)'s Twitter Profile Photo

We're thrilled to be included in the #ForbesAI50! 🎉 Congratulations to everyone who made it, it's great to see so many of our customers and partners here too!

Baseten (@basetenco)'s Twitter Profile Photo

We have day 0 support for #Qwen3 by Alibaba Qwen on Baseten using SGLang.

Qwen 3 235B's architecture benefits from both Tensor Parallelism and Expert Parallelism to run Attention and Sparse MoE efficiently across 4 or 8 H100 GPUs depending on quantization.  

More in 🧵
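The "4 or 8 H100 GPUs depending on quantization" figure follows from simple weight-memory arithmetic. A minimal back-of-envelope sketch, assuming roughly 1 byte/param at FP8 vs. 2 bytes/param at BF16 and 80 GB per H100, and ignoring KV cache and activation overhead (which add real headroom requirements on top of weights):

```python
# Why Qwen 3 235B fits on 4 H100s at FP8 but needs 8 at BF16.
# Weights only; KV cache and activations are not counted here.

H100_MEM_GB = 80

def min_gpus(params_b: float, bytes_per_param: float) -> int:
    """Smallest power-of-two GPU count whose combined memory holds the weights."""
    weights_gb = params_b * bytes_per_param  # 1e9 params * bytes/param = GB
    gpus = 1
    while gpus * H100_MEM_GB < weights_gb:
        gpus *= 2
    return gpus

print(min_gpus(235, 1))  # FP8:  ~235 GB of weights -> 4 GPUs (320 GB total)
print(min_gpus(235, 2))  # BF16: ~470 GB of weights -> 8 GPUs (640 GB total)
```

Tensor and expert parallelism then shard those weights across the chosen GPU count: attention layers split via tensor parallelism, and the sparse MoE experts split via expert parallelism.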
Baseten (@basetenco)'s Twitter Profile Photo

“This is the thing about AI — you gotta burn the boats.”

Our CEO Tuhin Srivastava sat down with Emma Cosgrove and the Business Insider team to discuss keeping pace with the constant hardware, software, and AI model drops.
Elias (@eliasfiz)'s Twitter Profile Photo

People told us they want Orpheus TTS in production.

So we partnered with Baseten as our preferred inference provider!

Baseten runs Orpheus with:

• Low latency (<200 ms TTFB)
• High throughput (up to 48 real-time streams per H100)
• Secure, worldwide infra
Philip Kiely (@philip_kiely)'s Twitter Profile Photo

Deploying and vibe checking Orpheus TTS, an open-source model for generating speech. Our implementation supports up to 48 concurrent real-time users per H100 GPU!

Baseten (@basetenco)'s Twitter Profile Photo

Congrats to our friends at Patronus AI on the new AI agent launch, Percival! Percival can fix other agents across 20+ common failure modes, a very necessary tool in the growing agent landscape. Check it out.

Baseten (@basetenco)'s Twitter Profile Photo

🚀 We've been heads down for months, and now it's finally launch week. Today, we’re releasing our new brand. We believe inference is the foundation of all AI going forward. That's what our new look is all about: 𝗕𝗮𝘀𝗲𝘁𝗲𝗻 𝗶𝘀 𝘁𝗵𝗲 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗯𝗹𝗼𝗰𝗸𝘀 𝗼𝗳

Baseten (@basetenco)'s Twitter Profile Photo

🚀 Our "technical" marketer might not be looped in, but today is our biggest launch day yet. We're introducing two new products to serve the inference lifecycle: Model APIs and Training. Model APIs are frontier models running on the Baseten Inference Stack, purpose-built for

Baseten (@basetenco)'s Twitter Profile Photo

Our secret sauce? The Baseten Inference Stack. 

It consists of two core layers: the Inference Runtime and Inference-optimized Infrastructure. Our engineers break down all the levers we pull to optimize each layer in our new white paper.
Baseten (@basetenco)'s Twitter Profile Photo

Congrats to our friends at Retool! Agents are a game-changer for automating repetitive tasks (and Retool has automated over 100M hours of labor already). We're thrilled to power Retool Agents with our Model APIs, which support tool usage out of the box!

rime (@rimelabs)'s Twitter Profile Photo

Big news: Rime has raised a $5.5M seed round! 💸💸

We're building the most expressive, lifelike AI voices for real-time conversations, voices that sound truly human.

Led by Unusual Ventures with support from Founders You Should Know, Cadenza, and incredible angels like Michael
Baseten (@basetenco)'s Twitter Profile Photo

New DeepSeek just dropped.

Proud to serve the fastest DeepSeek R1 0528 inference on OpenRouter (#1 on TTFT and TPS) with our Model APIs.
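The two leaderboard metrics named here are time to first token (TTFT) and tokens per second (TPS). Both fall out of the timestamps of a streamed response; a minimal sketch with illustrative timings, not measurements from any real endpoint:

```python
# TTFT = delay from sending the request to receiving the first token.
# TPS = tokens generated per second of streaming (first token to last).

def ttft_and_tps(request_sent: float, token_times: list[float]) -> tuple[float, float]:
    """Compute TTFT and TPS from a request timestamp and per-token arrival times."""
    ttft = token_times[0] - request_sent
    duration = token_times[-1] - token_times[0]
    tps = (len(token_times) - 1) / duration if duration > 0 else float("inf")
    return ttft, tps

# Example: request at t=0.0 s, first token at 0.25 s, then 100 more
# tokens arriving every 10 ms.
times = [0.25 + 0.01 * i for i in range(101)]
ttft, tps = ttft_and_tps(0.0, times)
print(round(ttft, 2), round(tps))  # 0.25 100
```

TTFT dominates perceived responsiveness for chat-style products, while TPS governs how fast long generations complete, which is why leaderboards track both.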
Wispr Flow (@wisprflow)'s Twitter Profile Photo

It's official — Wispr Flow is now live on the iPhone App Store! We built the first immersive voice keyboard that lets you dictate with incredible accuracy anywhere — 5x faster than typing. Delightful. Effortless. Intelligent. Our mission is to change how people interact with

Baseten (@basetenco)'s Twitter Profile Photo

We’re excited to partner with oxen.ai on their fine-tuning launch. It’s almost too easy — zero-code fine-tuning, from dataset to custom model in a few clicks.

Ian Cairns (@cairns)'s Twitter Profile Photo

🎙️ New Deployed episode with Zed founder Nathan Sobo is live! Nathan's been building better code editors for 10+ years. Now Zed has some of the most impressive agent AI editing features (including real-time streaming edits). His number one piece of advice: "Automate your

Google Cloud (@googlecloud)'s Twitter Profile Photo

AI inference matters. Baseten's revolutionary AI infrastructure platform, built on Google Cloud, optimizes processing even for massive models, gets your AI products to market 50% faster, and slashes costs with 90% savings compared to endpoint vendors ↓

Baseten (@basetenco)'s Twitter Profile Photo

Our customers run AI products where every millisecond and request matter. Over the years, we found fundamental limitations in traditional deployment approaches — single points of failure, regional and cloud-specific capacity constraints, and the operational headache of managing

Baseten (@basetenco)'s Twitter Profile Photo

Forward deployed engineers (FDEs) are core to our company. They work directly with customers, contribute to product development, and shape our roadmap.

Vlad, our Head of FDE, wrote a blog to break down what makes FDE special, when to use FDEs, and how to scale a successful team.
Baseten (@basetenco)'s Twitter Profile Photo

We're excited to introduce the Baseten Performance Client, a new open-source Python library for up to 12x higher throughput for high-volume embedding tasks!

Stand up a new vector database, preprocess text, and run massive workloads in <2 minutes (vs. 15+ with AsyncOpenAI).
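Throughput gains like this typically come from keeping many embedding requests in flight at once instead of awaiting them one at a time. A minimal sketch of that bounded-concurrency pattern with asyncio; `embed_batch` is a stubbed stand-in for a network call, not the Performance Client's actual API:

```python
# Bulk-embedding pattern: split texts into batches, keep up to
# `max_concurrency` requests in flight, preserve input order.
# `embed_batch` simulates a remote embedding endpoint.

import asyncio

async def embed_batch(texts: list[str]) -> list[list[float]]:
    await asyncio.sleep(0.01)  # simulate network latency
    return [[float(len(t))] for t in texts]  # dummy 1-d "embeddings"

async def embed_all(texts: list[str], batch_size: int = 8,
                    max_concurrency: int = 32) -> list[list[float]]:
    sem = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def worker(batch: list[str]) -> list[list[float]]:
        async with sem:
            return await embed_batch(batch)

    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    results = await asyncio.gather(*(worker(b) for b in batches))
    return [vec for batch in results for vec in batch]  # flatten, in order

vectors = asyncio.run(embed_all([f"doc {i}" for i in range(100)]))
print(len(vectors))  # 100
```

With serial awaits, total time scales with the number of batches times latency; with this pattern it scales with batches divided by the concurrency cap, which is where order-of-magnitude speedups on high-volume workloads come from.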