EmbeddedLLM (@embeddedllm)'s Twitter Profile
EmbeddedLLM

@embeddedllm

Your open-source AI ally. We specialize in integrating LLM into your business.

ID: 1716394660636295168

Joined: 23-10-2023 10:02:43

303 Tweets

621 Followers

1.1K Following


🚨 vLLM Blog Alert! vLLM introduces PTPC-FP8 quantization on AMD ROCm, delivering near-BF16 accuracy at FP8 speeds. Run LLMs faster on AMD MI300X GPUs – no pre-quantization required!

Why PTPC-FP8 rocks:
- Per-Token Activation Scaling: Each token gets its own scaling factor
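The idea behind PTPC (per-token, per-channel) scaling can be sketched numerically: each activation row (token) and each weight column (channel) gets its own scale before casting to low precision, and the scales are reapplied after the matmul. A minimal NumPy simulation is below; note it is an illustrative sketch, not vLLM's actual ROCm kernel, and it mimics FP8 only by clipping and rounding to the E4M3 dynamic range rather than doing true FP8 mantissa rounding.

```python
import numpy as np

# Max representable magnitude of OCP FP8 E4M3; used here as the target range.
FP8_E4M3_MAX = 448.0

def ptpc_matmul(x, w):
    """Simulated PTPC-FP8 matmul: per-token activation scales,
    per-channel weight scales (hypothetical helper, not vLLM's API)."""
    # Per-token activation scaling: one scale per row (token).
    act_scale = np.abs(x).max(axis=-1, keepdims=True) / FP8_E4M3_MAX
    x_q = np.clip(np.round(x / act_scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    # Per-channel weight scaling: one scale per output column (channel).
    w_scale = np.abs(w).max(axis=0, keepdims=True) / FP8_E4M3_MAX
    w_q = np.clip(np.round(w / w_scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    # Low-precision product, then rescale back to full precision.
    return (x_q @ w_q) * act_scale * w_scale

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8)).astype(np.float32)   # 4 tokens, 8 features
w = rng.normal(size=(8, 3)).astype(np.float32)   # 3 output channels
approx = ptpc_matmul(x, w)
exact = x @ w
err = float(np.max(np.abs(approx - exact)))
```

Because outlier tokens get their own scale instead of sharing one tensor-wide scale, `err` stays small relative to the exact BF16/FP32 result, which is the accuracy win the tweet describes.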