Harry Mellor (@hmellor_)'s Twitter Profile
Harry Mellor

@hmellor_

ML Engineer @huggingface maintaining @vllm_project, prev @graphcoreai, @uniofoxford

Joined: 14-09-2022 16:50:39

20 Tweets

111 Followers

21 Following

vLLM (@vllm_project)'s Twitter Profile Photo

The Hugging Face Transformers ↔️ vLLM integration just leveled up: Vision-Language Models are now supported out of the box!

If the model is integrated into Transformers, you can now run it directly with vLLM.

github.com/vllm-project/v…

Great work Raushan Turganbay 👏
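
A minimal sketch of what this unlocks, assuming the Qwen2.5-VL checkpoint as an illustrative Transformers-integrated VLM (the `model_impl="transformers"` argument selects vLLM's Transformers backend):

```python
# Sketch: running a Transformers-integrated VLM through vLLM's Transformers
# backend. The model ID is illustrative; any VLM with Transformers support
# should work the same way.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-VL-3B-Instruct",  # assumed example checkpoint
    model_impl="transformers",            # use the Transformers modeling code
)
outputs = llm.generate(
    ["Describe what makes vision-language models useful."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```
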
Lysandre (@lysandrejik)'s Twitter Profile Photo

The new transformers release comes w/ a surprise: kernels support ⚡️ It integrates deeply with precompiled kernels on the HF Hub.

- opt-in, automatic kernels for your hardware and software
- kernels like FA2/3 w/o compilation
- community-built kernels, for inference & training
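
For context, this builds on the standalone `kernels` library; a minimal sketch of fetching a precompiled kernel from the Hub, assuming the `kernels-community/activation` repo from the library's examples and a CUDA device:

```python
# Sketch: pulling a precompiled kernel from the HF Hub with the `kernels`
# library -- no local compilation step. Kernel repo and function name are
# taken from the library's examples; treat them as illustrative.
import torch
from kernels import get_kernel

# Downloads a build matched to the local hardware/software stack.
activation = get_kernel("kernels-community/activation")

x = torch.randn(8, 16, dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # out-of-place fast GELU
print(y)
```
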

merve (@mervenoyann)'s Twitter Profile Photo

We have recently merged fast processors for many models; the speed-up in the Qwen-VL series is 🔥

you get speed-ups of up to 3x on CPU and 26x on GPU 🤯

you don't have to do anything, this is enabled by default 🥳
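
Nothing is required on the user side; if you want to be explicit, a sketch (the checkpoint ID is illustrative):

```python
# Sketch: loading a fast processor. Fast processors are the default in
# recent transformers releases; use_fast=True just makes the choice explicit.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",  # illustrative Qwen-VL checkpoint
    use_fast=True,
)
print(type(processor.image_processor).__name__)  # a *Fast class
```
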
Aritra R G (@arig23498)'s Twitter Profile Photo

Did you know you can now run your own AI Job on the Hugging Face infrastructure?

Introducing `hf jobs`, the latest addition to the Hugging Face CLI.

A quick thread to get you all started! 🧵
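
A rough sketch of the workflow, assuming the `hf jobs run <image> <command>` shape from the announcement (the image tag and job ID are placeholders):

```
$ hf jobs run python:3.12 python -c "print('Hello from HF infra!')"
$ hf jobs ps              # list your jobs
$ hf jobs logs <job-id>   # stream a job's logs
```
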
clem πŸ€— (@clementdelangue) 's Twitter Profile Photo

When Sam Altman told me at the AI summit in Paris that they were serious about releasing open-source models & asked what would be useful, I couldn’t believe it. But six months of collaboration later, here it is: Welcome to OSS-GPT on Hugging Face! It comes in two sizes, for both

When <a href="/sama/">Sam Altman</a> told me at the AI summit in Paris that they were serious about releasing open-source models &amp; asked what would be useful, I couldn’t believe it. 

But six months of collaboration later, here it is: Welcome to OSS-GPT on <a href="/huggingface/">Hugging Face</a>! It comes in two sizes, for both
dylan (@dylan_ebert_)'s Twitter Profile Photo

OpenAI just released GPT-OSS: An Open Source Language Model on Hugging Face

Open source meaning:
💸 Free
🔒 Private
🔧 Customizable
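
"Free" and "private" here cash out as: download the weights and run them locally. A sketch with transformers, assuming the `openai/gpt-oss-20b` repo ID (the smaller of the two sizes):

```python
# Sketch: running gpt-oss locally. Repo ID assumed from the release;
# the 120b size needs substantially more hardware.
from transformers import pipeline

pipe = pipeline("text-generation", model="openai/gpt-oss-20b", device_map="auto")
out = pipe(
    [{"role": "user", "content": "Say hello in one sentence."}],
    max_new_tokens=32,
)
print(out[0]["generated_text"])
```
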

clem πŸ€— (@clementdelangue) 's Twitter Profile Photo

And just like that, OpenAI gpt-oss is now the number one trending model on Hugging Face, out of almost 2M open models πŸš€ People sometimes forget that they've already transformed the field: GPT-2, released back in 2019 is HF's most downloaded text-generation model ever, and

And just like that, <a href="/OpenAI/">OpenAI</a> gpt-oss is now the number one trending model on <a href="/huggingface/">Hugging Face</a>, out of almost 2M open models πŸš€

People sometimes forget that they've already transformed the field: GPT-2, released back in 2019 is HF's most downloaded text-generation model ever, and
Sergio Paniego (@sergiopaniego)'s Twitter Profile Photo

Want to deploy open models using vLLM as the inference engine?
We just released a step-by-step guide on how to do it with Hugging Face Inference Endpoints, now available in the vLLM docs.

let the gpus go brrr
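
Once such an endpoint is live, it speaks vLLM's OpenAI-compatible API, so querying it looks roughly like this (URL, token, and model name are placeholders):

```python
# Sketch: querying a Hugging Face Inference Endpoint running vLLM.
# vLLM serves an OpenAI-compatible API, so the standard openai client works.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-endpoint>.endpoints.huggingface.cloud/v1",  # placeholder
    api_key="<your-hf-token>",                                          # placeholder
)
resp = client.chat.completions.create(
    model="<served-model-name>",  # placeholder
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```
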
Harry Mellor (@hmellor_)'s Twitter Profile Photo

I've been wanting to do this for a really long time... vLLM is now fully formatted using ruff! 🚀

This change makes the codebase more readable and uses stronger tooling to keep it that way.

Kudos to the Python Software Foundation for the Black code format and to Astral for ruff!
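
For anyone unfamiliar: ruff bundles a Black-compatible formatter and a linter in one fast tool, so the day-to-day commands are just:

```
$ pip install ruff
$ ruff format .   # Black-style formatting, in place
$ ruff check .    # linting
```
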
Harry Mellor (@hmellor_)'s Twitter Profile Photo

It's not as exciting as BERT support... but the Hugging Face Transformers backend for vLLM now supports mixture-of-experts (MoE) models at full speed! 🚀

Install both packages from source and take it for a spin!
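
"From source" means roughly the following; the `VLLM_USE_PRECOMPILED=1` shortcut (reusing prebuilt binaries instead of compiling the CUDA kernels) is an optional assumption on my part, and plain `pip install -e .` also works if you can build:

```
$ pip install git+https://github.com/huggingface/transformers.git
$ git clone https://github.com/vllm-project/vllm.git && cd vllm
$ VLLM_USE_PRECOMPILED=1 pip install -e .
```
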
vLLM (@vllm_project)'s Twitter Profile Photo

🚀 Excited to share our work on batch-invariant inference in vLLM!

Now you can get identical results regardless of batch size with just one flag: VLLM_BATCH_INVARIANT=1

No more subtle differences between bs=1 and bs=N (including prefill!). Let's dive into how we built this 🧵👇
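
Concretely, the flag is an environment variable, e.g. when launching the server (model name is a placeholder):

```
$ VLLM_BATCH_INVARIANT=1 vllm serve <model-name>
```
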
cΓ©lina (@hanouticelina) 's Twitter Profile Photo

πŸ”₯ We're thrilled to announce πš‘πšžπšπšπš’πš—πšπšπšŠπšŒπšŽ_πš‘πšžπš‹ v1.0! After five years of development, this foundational release is packed with A fully modernized HTTP backend and a complete, from-the-ground-up CLI revamp! $ pip install huggingface_hub --upgrade 🧡highly recommend

πŸ”₯ We're thrilled to announce πš‘πšžπšπšπš’πš—πšπšπšŠπšŒπšŽ_πš‘πšžπš‹ v1.0!

After five years of development, this foundational release is packed with A fully modernized HTTP backend and a complete, from-the-ground-up CLI revamp!

$ pip install huggingface_hub --upgrade

🧡highly recommend
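
The revamped CLI ships as an `hf` entry point; a few representative commands (subcommand names are my best recollection of the v1.0 CLI, so double-check with `hf --help`):

```
$ pip install --upgrade huggingface_hub
$ hf auth login                    # authenticate with the Hub
$ hf download openai/gpt-oss-20b   # download a repo from the Hub
```
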
Harry Mellor (@hmellor_)'s Twitter Profile Photo

If you missed my talk at Ray Summit last week, fear not! I'll be giving it again in Paris at next week's vLLM meetup πŸŽ™οΈ