Kirill Solodskikh (@garchfather)'s Twitter Profile
Kirill Solodskikh

@garchfather

Almost PhD, Almost Founder, Almost Team Lead, Almost Successful, married.

@TheStageAI Co-founder, CEO/CTO, ex Huawei P50 AI cameras

ID: 1577397163189014538

Link: http://thestage.ai | Joined: 04-10-2022 20:36:27

143 Tweets

215 Followers

757 Following

Kirill Solodskikh (@garchfather)

We updated TheWhisper, our open source speech-to-text engine for self-hosted/on-device use. It now supports NVIDIA H100, L40S, RTX 4090, and RTX 5090. Benchmarks vs other Whisper libs show the best Time to First Token and Real-Time Factor. Try it

TheStage AI (@thestageai)

Significant speed and size gains in model inference are possible without hurting output quality. ANNA is our PyTorch framework for automated model acceleration, a new way to think about MLOps. Smaller ckpts, lower cost, faster inference, no retrain. Test demo or request access

Kirill Solodskikh (@garchfather)

There are a lot of releases on ASR! One of them is open-weight and with optimized Apple inference engines. github.com/TheStageAI/The…

Kirill Solodskikh (@garchfather)

Good weekend! I spent time testing our releases more extensively and writing usage guides during my tests. Suddenly Akshat Bubna and Charles 🎉 Frye from Modal liked my notebook. While testing TheWhisper with Azim K, I found that Mati Staniszewski started following me! Quietly motivating!

TheStage AI (@thestageai)

Are you a big fan of jacket potato? This is an open-source, real-time multilingual ASR for live speech. It stays robust in heavy noise – even at SNR 0 dB. That’s why it understands speech where people struggle to hear. Use it for transcription, research, and multilingual apps

TheStage AI (@thestageai)

Proud to team up with Brilliant Labs and Neuphonic on Halo’s on-device privacy engine. Coming to Brilliant Labs’ Halo smart glasses: real-time voice + vision, POV stays private. ANNA + GPU/NPU SDK + memory manager for wake word, STT, TTS, diarization. SDK demo 👇

TheStage AI (@thestageai)

How do you make text-to-music run in real time in production? The model has to keep audio generation ahead of playback. Our new case study with Mirelo.AI shows how inference optimization delivered up to 2.4x higher throughput. See the full case study ↓

Kirill Solodskikh (@garchfather)

Open-source experiments dashboard for AI researchers. Cool comparison overlays across modalities. What should we add next? S3 integration, authentication, model registry? github.com/TheStageAI/Spi…

Kirill Solodskikh (@garchfather)

Actually, comparing 1-bit with 16-bit makes no sense. Everyone is using 4-bit weights with MLX, and the speed will be around 150-180 tok/s on M4 Pro. Moreover, 4-bit quantization in MLX can be done as block quantization, which preserves quality in most cases.

Kirill Solodskikh (@garchfather)

Self-hosted AGI starts with inference infra teams can actually run. Well. Elastic Models v0.2.0 is much more self-serve: world’s fastest whisper-large-v3-turbo, Wan2.2 generating 5s of video in 34s on H100, and instant FLUX LoRA switching. Explore v0.2.0

TheStage AI (@thestageai)

Beyoncé heard cursing. TheWhisper heard Arsenal. The fastest Whisper in the world. Open-source real-time ASR. Top 5 on OpenASR benchmarks. 1800 RTFx. Built for live captions, transcription, and voice apps. See the repo