TheStage AI (@thestageai) Twitter Tweets • TwiCopy

TheStage AI

6 months ago

Wrong model = slow app. We help you pick the right one for your GPU. You can now explore a new Models section on our platform — with performance-optimized versions of open-source models like Qwen, Mistral, Llama, DeepSeek, and Flux. These models are tuned for real tasks: →

43 likes · 2 replies · 4 retweets

TheStage AI

@thestageai

5 months ago

🥐 Bon appétit, developers. New Mistral AI models for self-hosting accelerated by TheStage AI:
- New LLM: Mistral Small 24B
- New VLM: Mistral Small 3.1 24B
- Achieves speeds up to 90 tok/s on a single H100!
- Available in our standard 4 tiers: S, M, L, XL
Models follow

7 likes · 0 replies · 2 retweets

TheStage AI

@thestageai

5 months ago

Bonjour, Paris 🇫🇷 Just wrapped 2 amazing days at @NVIDIA #GTCParis at Viva Technology — AI infra, agentic systems, and robots walking around. Great convos with ElevenLabs, @mistralai, Nebius, Recraft & more. Still in town — DM us if you wanna talk AI (IRL in Paris ☕🥐)

10 likes · 0 replies · 3 retweets

Kirill Solodskikh

@garchfather

5 months ago

▚▞▚▞ DATA LOG: AI EUROPE ▚▞▚▞
For years, AI talk was all Silicon Valley. After @NVIDIA #GTCParis, one thing became clear: Europe’s AI ecosystem has already kicked into high gear.
🇫🇷 Mistral AI’s dropping open weights that actually run.
🇩🇪 Aleph Alpha building native

10 likes · 0 replies · 4 retweets

Kirill Solodskikh

@garchfather

5 months ago

⌁ EUROPE SIGNAL: ACTIVE ⌁
↳ Want to accelerate your model’s inference?
↳ These guys sure do.
✦ Berlin: mapped next steps with our investors Christophe Maire and Lukas Erbguth of Atlantic Labs.
✦ Paris: NVIDIA GTC showed us what’s possible.
✦ Germany: more investor talks

9 likes · 0 replies · 2 retweets

Kirill Solodskikh

@garchfather

5 months ago

Meet Elastic MusicGen Large, our optimized fork of AI at Meta's MusicGen, powered by ANNA (TheStage AI’s Automated Neural Network Accelerator): huggingface.co/TheStageAI/Ela… Ye used AI for vocals on "Bully," calling it the "next Auto-Tune." He switched up later, but tracks

185 likes · 6 replies · 17 retweets

TheStage AI

@thestageai

4 months ago

🥗 What if you could generate 10,000+ AI images for $1 — each in just 1.2 seconds? We made it happen — 2.4× faster than most RTX 4090 pipelines, at a fraction of the cost. Check it out here: app.thestage.ai/models/FLUX.1-… ⟿ How? We tuned Black Forest Labs's FLUX.1 [schnell] model with our ANNA

51 likes · 1 reply · 1 retweet

TheStage AI

@thestageai

3 months ago

AI engineers and researchers can now use our Quantization API to run accelerated LLMs, VLMs, and diffusion models on NVIDIA GPUs and edge devices. Faster, cheaper, same quality. Built by our research lab. Docs & API live, early testers welcome.

50 likes · 2 replies · 7 retweets

Kirill Solodskikh

@garchfather

3 months ago

Our TheStage AI team was happy to gain early access to the NVIDIA B200 from Nebius and establish benchmarking for our optimized diffusion models. We now fully support inference of optimized models on B200 across various AI applications - LLMs, VLMs, Text-to-Image,

9 likes · 0 replies · 1 retweet

Kirill Solodskikh

@garchfather

3 months ago

Our research team took AI at Meta's Llama 8B, quantized it to int8 post-training with QLIP, applied SmoothQuant, and used predefined compiler-compatible NVIDIA configs. Why do this? Up to 2× smaller weights and 3.6× faster on one GPU. Try it with our simple Jupyter Notebook.
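
For readers unfamiliar with the recipe named above, here is a minimal NumPy sketch of the general SmoothQuant idea followed by symmetric per-tensor int8 quantization. It is not QLIP's API or TheStage AI's implementation; the function names (`smooth_scales`, `quantize_int8`, `int8_matmul`), the `alpha=0.5` default, and the toy shapes are all assumptions made for illustration.

```python
import numpy as np

def smooth_scales(x, w, alpha=0.5, eps=1e-8):
    """Per-input-channel smoothing factors (hypothetical helper).

    x: activations, shape (tokens, in_features)
    w: weights, shape (in_features, out_features)
    """
    act_max = np.maximum(np.abs(x).max(axis=0), eps)  # shape (in_features,)
    w_max = np.maximum(np.abs(w).max(axis=1), eps)    # shape (in_features,)
    return act_max ** alpha / w_max ** (1.0 - alpha)

def quantize_int8(t):
    """Symmetric per-tensor int8 quantization: returns (int8 values, scale)."""
    scale = max(float(np.abs(t).max()), 1e-8) / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(x, w):
    """Quantize both operands, multiply in int32, then rescale to float."""
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    acc = qx.astype(np.int32) @ qw.astype(np.int32)
    return acc.astype(np.float32) * np.float32(sx * sw)

# Toy data: one activation channel carries large outliers.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 64)).astype(np.float32)
x[:, 3] *= 50.0
w = rng.normal(size=(64, 32)).astype(np.float32)

# Smoothing moves the outlier magnitude from activations into weights.
# In exact arithmetic (x / s) @ (diag(s) @ w) equals x @ w.
s = smooth_scales(x, w)
x_s = x / s
w_s = w * s[:, None]

reference = x @ w
y_plain = int8_matmul(x, w)       # int8 without smoothing
y_smooth = int8_matmul(x_s, w_s)  # int8 after smoothing
```

Comparing `np.abs(y_plain - reference).mean()` with `np.abs(y_smooth - reference).mean()` shows how much the smoothing step changes the quantization error on this toy input. Real pipelines typically calibrate the scales on representative data and quantize per channel or per token.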

203 likes · 6 replies · 14 retweets

Kirill Solodskikh

@garchfather

3 months ago

Can LLMs recognize ASCII art? Our tests show accelerated Elastic Models analyze line-by-line features and combine them using statistical patterns. Try it yourself with DeepSeek-Qwen-14B: 120 tok/s on H100, 40 tok/s on L40S, up to 3× faster. Free API token!

215 likes · 9 replies · 15 retweets

Kirill Solodskikh

@garchfather

3 months ago

Self-hosted text-to-image on H100 with TheStage AI Elastic Models, accelerated from Black Forest Labs' FLUX.1-schnell. Our fastest model, S, generates a high-quality image in 0.5 s. Precompiled and ready to deploy, with minimal cold start. Tutorial + access token inside if you want to try.

132 likes · 1 reply · 11 retweets

TheStage AI

@thestageai

3 months ago

Imagine paying $30 for 10k images when Salad Cloud + ANNA does it for $1 💀
- FLUX.1-schnell: ~1.2 s/image, high-quality output
- ANNA auto-tunes models to balance speed and quality
- OpenAI-compatible API, fully self-hosted
Quick guide shows how to run your own endpoint
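
As a back-of-the-envelope check on the figures quoted above, the sketch below works out the GPU hourly rate implied by 10,000 images for $1 at about 1.2 s per image. It assumes images are generated one after another on a single GPU and that GPU time is the only cost; both are assumptions, not statements from the post, and the helper name `implied_gpu_hourly_rate` is hypothetical.

```python
def implied_gpu_hourly_rate(images, seconds_per_image, total_cost_usd):
    """Hourly GPU price implied by a batch cost (illustrative helper).

    Assumes sequential generation on one GPU and no other costs.
    """
    gpu_hours = images * seconds_per_image / 3600.0
    return total_cost_usd / gpu_hours

# Figures quoted in the post: 10,000 images, ~1.2 s each, $1 total.
gpu_hours = 10_000 * 1.2 / 3600.0                 # about 3.33 GPU-hours
rate = implied_gpu_hourly_rate(10_000, 1.2, 1.0)  # about $0.30 per GPU-hour
cost_per_image = 1.0 / 10_000                     # $0.0001 per image
```

Under the same assumptions, the $30 comparison point works out to $0.003 per image.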

25 likes · 1 reply · 4 retweets

Kirill Solodskikh

@garchfather

3 months ago

Quantization delivers speedup but can reduce quality. Our researchers prepared a tutorial showing how ANNA automatically quantizes Flux and accelerates it 2× while keeping quality high. Orig. model latency: 6.4 s. Check the link. DM or comment for early access.

63 likes · 1 reply · 6 retweets

TheStage AI

@thestageai

3 months ago

For AI builders and researchers: get early access to QLIP + ANNA for DNN optimization and acceleration – cloud, self-host, edge. Get a free commercial license. Collaborate with us on research, integrate your algorithms, or simplify deployment. Limited spots – apply today ↓

41 likes · 2 replies · 5 retweets

Kirill Solodskikh

@garchfather

3 months ago

🚀 Early access to ANNA (Automated Neural Network Accelerator) is now available!
✨ Get your access here: app.thestage.ai/contact
Questions? DM or comment below! 💬
With ANNA, you can:
🔄 Simply upload your model, data, and desired metrics
🎛️ Fine-tune model size, latency, and quality with

9 likes · 0 replies · 1 retweet

Kirill Solodskikh

@garchfather

3 months ago

How do you measure the quality of text-to-image models? Our research team at TheStage AI put together a comprehensive guide to checking perceptual quality, sharpness, color, prompt alignment, and more. All the tricky image quality questions researchers usually ask are covered here ↓

60 likes · 0 replies · 6 retweets

Azim K

@quaz1m

2 months ago

Validation is a key step when compressing or accelerating models. It shows whether the network still performs well. Our research team at TheStage AI shared evaluation methods for sharpness, tone, color, object placement, and more.

35 likes · 1 reply · 3 retweets

TheStage AI

@thestageai

2 months ago

Excited to share our MLPerf Inference v5.1 results (MLCommons). We ran Stability AI SDXL on 8×H100 via Nebius with our stack, ANNA. 18.1 img/s in target quality range. Fast, reproducible, world-class performance from our team, submitted alongside top AI players ↓

33 likes · 0 replies · 5 retweets