Harry Mellor (@hmellor_)'s Twitter Profile
Harry Mellor

@hmellor_

ML Engineer @huggingface maintaining @vllm_project, prev @graphcoreai, @uniofoxford

Joined: 14-09-2022 16:50:39

20 Tweets

111 Followers

21 Following

vLLM (@vllm_project)'s Twitter Profile Photo

The Hugging Face Transformers ↔️ vLLM integration just leveled up: Vision-Language Models are now supported out of the box!

If the model is integrated into Transformers, you can now run it directly with vLLM.

github.com/vllm-project/v…

Great work Raushan Turganbay 👏
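
A minimal sketch of what this unlocks, assuming the Qwen2.5-VL checkpoint as an illustrative Transformers-integrated VLM (the `model_impl="transformers"` argument selects vLLM's Transformers backend):

```python
# Sketch: running a Transformers-integrated VLM through vLLM's Transformers
# backend. The model ID is illustrative; any VLM with Transformers support
# should work the same way.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-VL-3B-Instruct",  # assumed example checkpoint
    model_impl="transformers",            # use the Transformers modeling code
)
outputs = llm.generate(
    ["Describe what makes vision-language models useful."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```
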
Lysandre (@lysandrejik)'s Twitter Profile Photo

The new transformers release comes w/ a surprise: kernels support ⚡️ It integrates deeply with precompiled kernels on the HF Hub.

- opt-in, automatic kernels for your hardware and software
- kernels like FA2/3 w/o compilation
- community-built kernels, for inference & training
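
For context, this builds on the standalone `kernels` library; a minimal sketch of fetching a precompiled kernel from the Hub, assuming the `kernels-community/activation` repo from the library's examples and a CUDA device:

```python
# Sketch: pulling a precompiled kernel from the HF Hub with the `kernels`
# library -- no local compilation step. Kernel repo and function name are
# taken from the library's examples; treat them as illustrative.
import torch
from kernels import get_kernel

# Downloads a build matched to the local hardware/software stack.
activation = get_kernel("kernels-community/activation")

x = torch.randn(8, 16, dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # out-of-place fast GELU
print(y)
```
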

merve (@mervenoyann)'s Twitter Profile Photo

We have recently merged fast processors for many models; the speed-up in the Qwen-VL series is 🔥

you get speed-ups of up to 3x on CPU and 26x on GPU 🤯

you don't have to do anything, this is enabled by default 🥳
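
Nothing is required on the user side; if you want to be explicit, a sketch (the checkpoint ID is illustrative):

```python
# Sketch: loading a fast processor. Fast processors are the default in
# recent transformers releases; use_fast=True just makes the choice explicit.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",  # illustrative Qwen-VL checkpoint
    use_fast=True,
)
print(type(processor.image_processor).__name__)  # a *Fast class
```
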
Aritra R G (@arig23498)'s Twitter Profile Photo

Did you know you can now run your own AI Job on the Hugging Face infrastructure?

Introducing `hf jobs`, the latest addition to the Hugging Face CLI.

A quick thread to get you all started! 🧵
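
A rough sketch of the workflow, assuming the `hf jobs run <image> <command>` shape from the announcement (the image tag and job ID are placeholders):

```
$ hf jobs run python:3.12 python -c "print('Hello from HF infra!')"
$ hf jobs ps              # list your jobs
$ hf jobs logs <job-id>   # stream a job's logs
```
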
clem πŸ€— (@clementdelangue) 's Twitter Profile Photo

When Sam Altman told me at the AI summit in Paris that they were serious about releasing open-source models & asked what would be useful, I couldn’t believe it. But six months of collaboration later, here it is: Welcome to OSS-GPT on Hugging Face! It comes in two sizes, for both

When <a href="/sama/">Sam Altman</a> told me at the AI summit in Paris that they were serious about releasing open-source models &amp; asked what would be useful, I couldn’t believe it. 

But six months of collaboration later, here it is: Welcome to OSS-GPT on <a href="/huggingface/">Hugging Face</a>! It comes in two sizes, for both
dylan (@dylan_ebert_)'s Twitter Profile Photo

OpenAI just released GPT-OSS: An Open Source Language Model on Hugging Face

Open source meaning:
💸 Free
🔒 Private
🔧 Customizable
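
"Free" and "private" here cash out as: download the weights and run them locally. A sketch with transformers, assuming the `openai/gpt-oss-20b` repo ID (the smaller of the two sizes):

```python
# Sketch: running gpt-oss locally. Repo ID assumed from the release;
# the 120b size needs substantially more hardware.
from transformers import pipeline

pipe = pipeline("text-generation", model="openai/gpt-oss-20b", device_map="auto")
out = pipe(
    [{"role": "user", "content": "Say hello in one sentence."}],
    max_new_tokens=32,
)
print(out[0]["generated_text"])
```
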

clem πŸ€— (@clementdelangue) 's Twitter Profile Photo

And just like that, OpenAI gpt-oss is now the number one trending model on Hugging Face, out of almost 2M open models πŸš€ People sometimes forget that they've already transformed the field: GPT-2, released back in 2019 is HF's most downloaded text-generation model ever, and

And just like that, <a href="/OpenAI/">OpenAI</a> gpt-oss is now the number one trending model on <a href="/huggingface/">Hugging Face</a>, out of almost 2M open models πŸš€

People sometimes forget that they've already transformed the field: GPT-2, released back in 2019 is HF's most downloaded text-generation model ever, and
Sergio Paniego (@sergiopaniego)'s Twitter Profile Photo

Want to deploy open models using vLLM as the inference engine?
We just released a step-by-step guide on how to do it with Hugging Face Inference Endpoints, now available in the vLLM docs.

let the gpus go brrr
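
Once such an endpoint is live, it speaks vLLM's OpenAI-compatible API, so querying it looks roughly like this (URL, token, and model name are placeholders):

```python
# Sketch: querying a Hugging Face Inference Endpoint running vLLM.
# vLLM serves an OpenAI-compatible API, so the standard openai client works.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-endpoint>.endpoints.huggingface.cloud/v1",  # placeholder
    api_key="<your-hf-token>",                                          # placeholder
)
resp = client.chat.completions.create(
    model="<served-model-name>",  # placeholder
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```
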
Harry Mellor (@hmellor_)'s Twitter Profile Photo

I've been wanting to do this for a really long time... vLLM is now fully formatted using ruff! 🚀

This change makes the codebase more readable and uses stronger tooling to keep it that way.

Kudos to the Python Software Foundation for the Black code format and to Astral for ruff!
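
For anyone unfamiliar: ruff bundles a Black-compatible formatter and a linter in one fast tool, so the day-to-day commands are just:

```
$ pip install ruff
$ ruff format .   # Black-style formatting, in place
$ ruff check .    # linting
```
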
Harry Mellor (@hmellor_)'s Twitter Profile Photo

It's not as exciting as BERT support... but the Hugging Face Transformers backend for vLLM now supports mixture-of-experts (MoE) models at full speed! 🚀

Install both packages from source and take it for a spin!
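
"From source" means roughly the following; the `VLLM_USE_PRECOMPILED=1` shortcut (reusing prebuilt binaries instead of compiling the CUDA kernels) is an optional assumption on my part, and plain `pip install -e .` also works if you can build:

```
$ pip install git+https://github.com/huggingface/transformers.git
$ git clone https://github.com/vllm-project/vllm.git && cd vllm
$ VLLM_USE_PRECOMPILED=1 pip install -e .
```
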
vLLM (@vllm_project)'s Twitter Profile Photo

🚀 Excited to share our work on batch-invariant inference in vLLM!

Now you can get identical results regardless of batch size with just one flag: VLLM_BATCH_INVARIANT=1

No more subtle differences between bs=1 and bs=N (including prefill!). Let's dive into how we built this 🧵👇
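
Concretely, the flag is an environment variable, e.g. when launching the server (model name is a placeholder):

```
$ VLLM_BATCH_INVARIANT=1 vllm serve <model-name>
```
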
cΓ©lina (@hanouticelina) 's Twitter Profile Photo

πŸ”₯ We're thrilled to announce πš‘πšžπšπšπš’πš—πšπšπšŠπšŒπšŽ_πš‘πšžπš‹ v1.0! After five years of development, this foundational release is packed with A fully modernized HTTP backend and a complete, from-the-ground-up CLI revamp! $ pip install huggingface_hub --upgrade 🧡highly recommend

πŸ”₯ We're thrilled to announce πš‘πšžπšπšπš’πš—πšπšπšŠπšŒπšŽ_πš‘πšžπš‹ v1.0!

After five years of development, this foundational release is packed with A fully modernized HTTP backend and a complete, from-the-ground-up CLI revamp!

$ pip install huggingface_hub --upgrade

🧡highly recommend
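
The revamped CLI ships as an `hf` entry point; a few representative commands (subcommand names are my best recollection of the v1.0 CLI, so double-check with `hf --help`):

```
$ pip install --upgrade huggingface_hub
$ hf auth login                    # authenticate with the Hub
$ hf download openai/gpt-oss-20b   # download a repo from the Hub
```
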
Harry Mellor (@hmellor_)'s Twitter Profile Photo

If you missed my talk at Ray Summit last week, fear not! I'll be giving it again in Paris at next week's vLLM meetup πŸŽ™οΈ