Tyler Michael Smith (@tms_jr)'s Twitter Profile
Tyler Michael Smith

@tms_jr

High Performance Computing @neuralmagic | Committer @vllm_project | PhD @UTAustin | Music Enjoyer

ID: 378399509

Joined: 23-09-2011 04:11:08

1.1K Tweets

131 Followers

293 Following

oldfriend99 (@oldfriend99)'s Twitter Profile Photo

The surface of Mars is covered with vast areas. Some of the areas that have been found on Mars span 600,000 square miles — that's over twice the size of Texas

Red Hat AI (@redhat_ai)'s Twitter Profile Photo

Neural Magic is expanding to GPUs! Complementing our existing efforts with CPUs and model compression, we just launched nm-vllm, our initial community release to support GPU inference serving for LLMs. github.com/neuralmagic/nm… Details 👇

Delip Rao e/σ (@deliprao)'s Twitter Profile Photo

The ML gods will punish you for hubris if you fail to test every small change incrementally, regardless of your experience with what you are doing.

Red Hat AI (@redhat_ai)'s Twitter Profile Photo

EXCITING NEWS: Neural Magic and Anyscale contributed FP8 quantization support to vLLM, making LLM inference more efficient. FP8 reduces latency on NVIDIA GPUs by 2x with >99% accuracy preservation. Cheers to NVIDIA AI Developer for validating our results. 1/6

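FP8 inference like this hinges on picking a per-tensor scale so values fit the narrow fp8 range. A minimal sketch of that scale selection, assuming the E4M3 format (max finite value 448); plain floats stand in for a real fp8 cast, and this is illustrative only, not the vLLM implementation:

```python
# Hypothetical sketch of per-tensor scaling for FP8 (E4M3-style) quantization.
# Real kernels would cast the scaled values to an fp8 dtype; here we only
# compute the scale and the scaled values.

E4M3_MAX = 448.0  # largest finite value representable in E4M3

def fp8_scale(xs):
    """Choose a per-tensor scale so the largest |x| maps to E4M3_MAX."""
    amax = max(abs(x) for x in xs)
    return amax / E4M3_MAX

xs = [-3.0, 0.25, 1.5, 8.96]
s = fp8_scale(xs)             # 8.96 / 448 = 0.02
scaled = [x / s for x in xs]  # all values now lie within [-448, 448]
```

At inference time the kernel multiplies back by the scale after the low-precision matmul, which is where the latency savings come from.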
Red Hat AI (@redhat_ai)'s Twitter Profile Photo

🎉 Exciting news! Tyler Smith, one of our many talented engineers, is now Neural Magic's 3rd vLLM project committer! Check out Tyler's contributions: github.com/tlrmchlsmth. We’re proud to be a leading contributor to vLLM. 🚀 Cheers to Tyler and the team!

Tyler Michael Smith (@tms_jr)'s Twitter Profile Photo

Join if you want to find out how we're using CUTLASS to support quantization in vLLM -- specifically w8a8 for compute speedups, a deep dive into how we handle zero points for int8 asymmetric quantization, and how we put it all together to support FP8 Llama 3.1 405B.
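The zero-point handling mentioned above can be illustrated with a toy per-tensor version. This is a hedged sketch of asymmetric int8 quantization in plain Python, not the CUTLASS kernel logic from the talk:

```python
# Toy per-tensor asymmetric int8 quantization: q = round(x / scale) + zero_point.
# The zero point shifts the int8 range so an asymmetric float range
# [x_min, x_max] uses all 256 levels.

def quantize_asym_int8(xs):
    """Return (quantized ints, scale, zero_point) for a list of floats."""
    x_min, x_max = min(xs), max(xs)
    scale = (x_max - x_min) / 255.0           # int8 has 256 representable levels
    zero_point = round(-x_min / scale) - 128  # integer shift so x_min maps near -128
    qs = [max(-128, min(127, round(x / scale) + zero_point)) for x in xs]
    return qs, scale, zero_point

def dequantize(qs, scale, zero_point):
    return [(q - zero_point) * scale for q in qs]

xs = [-1.0, 0.0, 0.5, 2.0]
qs, scale, zp = quantize_asym_int8(xs)
xs_hat = dequantize(qs, scale, zp)  # recovers xs to within one quantization step
```

In a real w8a8 GEMM the zero-point correction is folded into the integer matmul's epilogue rather than applied element-wise like this.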

Tyler Michael Smith (@tms_jr)'s Twitter Profile Photo

me: "looks like i need to calculate the variance of this distributed tensor -- what's that called again? oh! Welford's online algorithm"

my brain for the next 3 days: "Wilford Brimley's online algorithm"
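For the record, Welford's algorithm computes mean and variance in one numerically stable pass, and partial states can be merged (Chan et al.) the way one might across shards of a distributed tensor. A sketch, not any particular production implementation:

```python
# Welford's online algorithm: one-pass, numerically stable mean/variance.

def welford(stream):
    """Return (count, mean, M2) where M2 is the sum of squared deviations."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # note: uses the *updated* mean
    return count, mean, m2

def merge(a, b):
    """Combine two partial (count, mean, M2) states (Chan et al.),
    e.g. across shards of a distributed tensor."""
    (na, ma, m2a), (nb, mb, m2b) = a, b
    n = na + nb
    delta = mb - ma
    mean = ma + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2

n, mean, m2 = merge(welford([2.0, 4.0, 4.0, 4.0]), welford([5.0, 5.0, 7.0, 9.0]))
variance = m2 / (n - 1)  # sample variance of the full stream
```

The merge step is what makes it practical in the distributed setting: each rank runs the single-pass update locally, then the small (count, mean, M2) triples are reduced.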
vLLM (@vllm_project)'s Twitter Profile Photo

A month ago, we announced our performance roadmap. Today, we are happy to share that the latest release achieves 🚀2.7x higher throughput and is 5x faster for output latency on Llama 8B, and 1.8x higher throughput and 2x faster on Llama 70B for H100s. blog.vllm.ai/2024/09/05/per…

Red Hat AI (@redhat_ai)'s Twitter Profile Photo

Last week's vLLM office hours recording is ready! 🎥 Tyler Michael Smith showed how to use NVIDIA CUTLASS for high-performance inference in vLLM. We also explored the exciting vLLM v0.6.0 updates that led to a 2.7x throughput boost and 5x latency improvement. Recording & slides 👇

Marc Sun (@_marcsun)'s Twitter Profile Photo

Quantization update! Transformers is now compatible with models quantized with the llm-compressor library from vLLM, or models in the compressed-tensors format. This means you can also enjoy high-quality quantized models from the Red Hat AI (formerly Neural Magic) team!

Tyler Michael Smith (@tms_jr)'s Twitter Profile Photo

Read to learn about Machete, which will serve as a foundation for mixed-input quantized GEMMs on NVIDIA GPUs (Hopper and later!) inside of vLLM. Excellent work and stellar animations by Lucas Wilkinson (github.com/LucasWilkinson)

roon (@tszzl)'s Twitter Profile Photo

a fact of the world that we have to live with: models when “jailbroken” seem to have a distinct personality and artistic capability well beyond anything they produce in their default mood. this might be the most important alignment work in the world and is mostly done on discord

brian stevens (@addvin)'s Twitter Profile Photo

I’m thrilled to announce that Neural Magic has signed a definitive agreement to join forces with Red Hat, Inc.

At Neural Magic our vision is that the future of AI is open, and we have been on a mission to enable enterprises to capture the powerful innovation from AI, while at
NYC Sanitation (@nycsanitation)'s Twitter Profile Photo

In 1991, David Lynch showed the world the alienation and innate horror of a dirty street, directing this unforgettable anti-littering ad for the City of New York. RIP to a visionary filmmaker and a pioneer of the Trash Revolution.