Nick Comly US (@ncomly_nvidia)'s Twitter Profile

Nick Comly US

@ncomly_nvidia

ID: 1537914501818814464

Joined: 17-06-2022 21:46:17

3 Tweets

6 Followers

44 Following

NVIDIA AI Developer (@nvidiaaidev)

👀 Accelerate performance of AI at Meta Llama 4 Maverick and Llama 4 Scout using our optimizations in #opensource TensorRT-LLM.⚡

✅ NVIDIA Blackwell B200 delivers over 42,000 tokens per second on Llama 4 Scout, over 32,000 tokens per second on Llama 4 Maverick.

✅ 3.4X more
NVIDIA AI Developer (@nvidiaaidev)

🎉 A new generation of the AI at Meta Llama models is here with Llama 4 Scout and Llama 4 Maverick.🦙

⚡ Accelerated for TensorRT-LLM, you can achieve over 40K output tokens per second on NVIDIA Blackwell B200 GPUs.

Tech blog to learn more ➡️ developer.nvidia.com/blog/nvidia-ac…
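
The linked blog has the full recipe; purely as illustration, here is a minimal sketch of offline inference with TensorRT-LLM's Python LLM API. The Llama 4 Scout checkpoint name is an assumption; substitute any supported model ID.

```python
# Minimal sketch: offline inference with TensorRT-LLM's Python LLM API.
# The model ID below is assumed for illustration; any supported
# Hugging Face checkpoint can be substituted.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Builds (or loads a cached) TensorRT engine for the model on first use.
    llm = LLM(model="meta-llama/Llama-4-Scout-17B-16E-Instruct")

    prompts = ["The main advantage of a mixture-of-experts model is"]
    params = SamplingParams(temperature=0.8, max_tokens=64)

    for output in llm.generate(prompts, params):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```
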
NVIDIA AI Developer (@nvidiaaidev)

🔢 ✨ Bring your data and try out the new Llama 4 Maverick and Scout multimodal, multilingual MoE models from AI at Meta.

🎉 Available now on the free multimodal playground for Llama 4 using our NVIDIA NIM demo environment on the API catalog. ➡️ build.nvidia.com/meta
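
For programmatic access, API catalog endpoints are OpenAI-compatible. A minimal sketch follows; the Maverick model ID is an assumption (check build.nvidia.com/meta for the exact name), and an NVIDIA API key is expected in the environment.

```python
# Hedged sketch: call a Llama 4 NIM endpoint from the API catalog through
# its OpenAI-compatible interface. The model ID is assumed, not confirmed.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

response = client.chat.completions.create(
    model="meta/llama-4-maverick-17b-128e-instruct",  # assumed catalog ID
    messages=[{"role": "user",
               "content": "Summarize mixture-of-experts in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```
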

Modal (@modal_labs)

When you're serving tokens in a chatbot, low latency is everything. We just optimized our TensorRT-LLM example to achieve a 4x speedup, and wrote up the key steps we took. Read on for the tl;dr. We promise it'll be fast 😉
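
The write-up has Modal's actual steps; purely as a hedged illustration of how such latency wins are typically measured, here is a tiny time-to-first-token probe against any OpenAI-compatible endpoint. The base URL and model name are placeholders, not Modal's.

```python
# Hypothetical harness: measure time-to-first-token (TTFT) on a streaming
# chat completion. Base URL and model name are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

start = time.perf_counter()
stream = client.chat.completions.create(
    model="my-model",  # placeholder
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)
for chunk in stream:
    # The first chunk carrying content marks time-to-first-token.
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"TTFT: {time.perf_counter() - start:.3f}s")
        break
```
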

Baseten (@basetenco)

🚀 You can now use NVIDIA B200s on Baseten and get higher throughput, lower latency, and better cost per token! 🚀

From benchmarks on models like DeepSeek R1, Llama 4, and Qwen, we’re already seeing:

• 5x higher throughput
• Over 2x better cost per token
• 38% lower latency
vLLM (@vllm_project)

vLLM🤝🤗! You can now deploy any Hugging Face language model with vLLM's speed. This integration makes it possible to maintain one consistent implementation of the model in HF for both training and inference. 🧵 blog.vllm.ai/2025/04/11/tra…
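
Per the linked post, the bridge is vLLM's Transformers backend. A minimal sketch, assuming a model whose Hugging Face implementation the backend supports (the model ID is illustrative):

```python
# Minimal sketch of vLLM's Transformers backend: model_impl="transformers"
# tells vLLM to run the Hugging Face implementation of the model, so the
# same model code serves training (in HF) and inference (in vLLM).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.2-1B-Instruct",  # illustrative model ID
    model_impl="transformers",
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```
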

dstack (@dstackai)

TensorRT-LLM delivers fast, flexible LLM inference, but the full pipeline can be complex. This dstack example simplifies it end-to-end: build the container, convert the model, and deploy on any cloud or on-prem. It showcases both DeepSeek R1 and its distilled Llama variant.

Baseten (@basetenco)

We’ve seen a lot of interest in B200s after our launch.

Our lead DevRel, Philip Kiely, wrote a blog explaining some of their performance benefits and the components needed to build an inference platform on top of B200 GPUs.

More details in 🧵
LMSYS Org (@lmsysorg)

Thank you NVIDIA AI Developer, Nebius, and DataCrunch_io for providing the H100 and H200 development machines. Your support greatly contributed to the speed of SGLang's optimization work!

NVIDIA AI Developer (@nvidiaaidev)

🎉 Huge congrats to LMSYS Org on 5x faster DeepSeek R1 performance on NVIDIA Hopper with disaggregated serving, large-scale expert parallelism, and more. Great to see collaboration across the industry to redefine what's possible on NVIDIA.
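
As a hedged sketch of the parallelism knobs mentioned here (flag names follow recent SGLang releases but vary by version; disaggregated prefill/decode needs a multi-node setup and is omitted):

```python
# Hedged sketch: SGLang's offline engine with tensor and expert parallelism.
# Kwargs mirror SGLang server args and may differ across versions.
import sglang as sgl

llm = sgl.Engine(
    model_path="deepseek-ai/DeepSeek-R1",  # full model needs many GPUs
    tp_size=8,           # tensor parallelism across 8 GPUs
    enable_ep_moe=True,  # expert parallelism for the MoE layers
)
out = llm.generate(
    "Explain disaggregated serving in one sentence.",
    {"temperature": 0.0, "max_new_tokens": 64},
)
print(out["text"])
```
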

LMSYS Org (@lmsysorg)

The SGLang team is honored to receive recognition from the NVIDIA team for optimizing the performance of DeepSeek R1! 🤗 26x speedup 🚀 SGLang rocks 🚀