Marc Sun (@_marcsun) 's Twitter Profile
Marc Sun

@_marcsun

Machine Learning Engineer @huggingface Open Source team

ID: 1623720711591174146

Joined: 09-02-2023 16:29:25

458 Tweets

1.1K Followers

421 Following

Eric Hartford (@cognitivecompai) 's Twitter Profile Photo

I was unable to quant DeepSeek-R1-0528 using llm-compressor - but I got it working on AutoAWQ, using mi300x generously lent to me by <a href="/HotAisle/">Hot Aisle</a>.  DeepSeek-R1-0528-AWQ will be published tomorrow.
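AWQ is a weight-only quantization scheme. As a rough illustration of the rounding step such methods build on, here is a toy group-wise symmetric int4 quantizer in numpy; the group size is arbitrary and AWQ's actual activation-aware scale search is omitted, so this is a simplified sketch, not AutoAWQ's implementation.

```python
import numpy as np

def quantize_groupwise(w, n_bits=4, group_size=8):
    """Symmetric group-wise rounding: one scale per group of weights."""
    flat = w.reshape(-1, group_size)
    qmax = 2 ** (n_bits - 1) - 1                        # 7 for int4
    scales = np.abs(flat).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(flat / scales), -qmax - 1, qmax)
    return q.astype(np.int8), scales

def dequantize(q, scales, shape):
    """Map int codes back to float weights."""
    return (q * scales).reshape(shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
q, s = quantize_groupwise(w)
w_hat = dequantize(q, s, w.shape)
err = np.abs(w - w_hat).max()                            # worst-case rounding error
```

Smaller groups give finer scales (lower error) at the cost of storing more scale values.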
Tony Wu (@tonywu_71) 's Twitter Profile Photo

🚀 ColQwen2 just dropped in Transformers! 🤗

Say goodbye to brittle OCR pipelines — now you can retrieve documents directly in the visual space with just a few lines of code. Perfect for your visual RAG workflows.

Smarter, simpler, faster. Let's dive in! 👇 (1/N 🧵)
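The "retrieve documents directly in the visual space" idea rests on late-interaction scoring over multi-vector embeddings, as in ColBERT/ColPali-style models. A toy numpy sketch of that MaxSim scoring (the embeddings here are random stand-ins, not real ColQwen2 outputs):

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: each query token takes its best-matching document
    patch; the per-token maxima are summed into one relevance score."""
    sims = query_vecs @ doc_vecs.T        # (n_query_tokens, n_doc_patches)
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(0)
query = rng.normal(size=(5, 8))
query /= np.linalg.norm(query, axis=1, keepdims=True)

# A "relevant" page contains patches close to the query tokens, plus noise.
relevant = np.vstack([query + 0.05 * rng.normal(size=query.shape),
                      rng.normal(size=(10, 8))])
relevant /= np.linalg.norm(relevant, axis=1, keepdims=True)

unrelated = rng.normal(size=(15, 8))
unrelated /= np.linalg.norm(unrelated, axis=1, keepdims=True)

score_rel = maxsim_score(query, relevant)
score_unrel = maxsim_score(query, unrelated)
```

In the real pipeline the patch embeddings come straight from page images, which is what lets this replace an OCR stage.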
Awni Hannun (@awnihannun) 's Twitter Profile Photo

The latest mlx-lm has a new dynamic quantization method (made with <a href="/angeloskath/">Angelos Katharopoulos</a>). It consistently results in better model quality with no increase in size. 

Some perplexity results (lower is better) for a few Qwen3 base models:
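For readers unfamiliar with the metric: perplexity is the exponentiated mean negative log-likelihood of the target tokens, so a model that knows nothing scores the vocabulary size and a confident model scores near 1. A minimal numpy sketch with toy logits (not the mlx-lm API):

```python
import numpy as np

def perplexity(logits, targets):
    """Perplexity = exp(mean NLL of the target tokens)."""
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerically stable
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(np.exp(nll.mean()))

rng = np.random.default_rng(0)
vocab, seq = 50, 100
targets = rng.integers(0, vocab, size=seq)

uniform = np.zeros((seq, vocab))              # uniform model: ppl == vocab size
peaked = np.zeros((seq, vocab))
peaked[np.arange(seq), targets] = 5.0         # mass concentrated on the targets

ppl_uniform = perplexity(uniform, targets)
ppl_peaked = perplexity(peaked, targets)
```

This is why "lower is better": a quantization method that preserves perplexity is preserving the model's predictive distribution.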
Unsloth AI (@unslothai) 's Twitter Profile Photo

We made a repo with 100+ Fine-tuning notebooks all in one place!

Has guides & examples for:
• Tool-calling, Classification, Synthetic data
• BERT, TTS, Vision LLMs
• GRPO, DPO, SFT, CPT
• Dataprep, eval, saving
• Llama, Qwen, Gemma, Phi, DeepSeek

🔗github.com/unslothai/note…
Alex Zhang (@a1zhang) 's Twitter Profile Photo

super high alpha learning material just dropped:

<a href="/xkxxhk/">Nemo</a> wrote up their design process + code for one of the fastest fp8 GEMM implementations in the entire $100K AMD kernel challenge — enjoy :)

🔗: akashkarnatak.github.io/amd-challenge/
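The write-up covers a GPU kernel, but the loop structure it tiles onto thread blocks is easy to see in plain Python. A naive blocked matmul sketch (fp32 numpy, not the fp8 HIP kernel itself):

```python
import numpy as np

def tiled_gemm(A, B, tile=16):
    """Blocked matmul: compute C one output tile at a time, accumulating
    partial products over K tiles — the skeleton GPU GEMM kernels optimize."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=np.float32)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            acc = np.zeros((min(tile, M - i), min(tile, N - j)), dtype=np.float32)
            for k in range(0, K, tile):
                acc += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
            C[i:i+tile, j:j+tile] = acc
    return C

rng = np.random.default_rng(0)
A = rng.normal(size=(48, 64)).astype(np.float32)
B = rng.normal(size=(64, 32)).astype(np.float32)
C = tiled_gemm(A, B)
max_err = np.abs(C - A @ B).max()
```

On a GPU, each `(i, j)` tile maps to a thread block and the `k` loop stages tiles through fast local memory; the competition write-ups are about making exactly those stages fast.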
Albert Tseng (@tsengalb99) 's Twitter Profile Photo

📣Introducing our latest work: Yet Another Quantization Algorithm!

YAQA directly minimizes the KL divergence to the original model during rounding, cutting it by >30% over prior PTQ methods and giving an even closer model than Google’s QAT on Gemma! 🤯

arxiv.org/abs/2505.22988👇
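The objective here is distributional, not per-weight: a toy numpy sketch of the mean KL between the original model's token distribution and a quantized one, with synthetic logits standing in for real model outputs (this illustrates the metric YAQA optimizes, not YAQA's rounding algorithm):

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def kl_to_original(orig_logits, quant_logits):
    """Mean KL(P_orig || P_quant) over positions: how far the quantized
    model's predictive distribution drifts from the original's."""
    p = softmax(orig_logits)
    q = softmax(quant_logits)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean())

rng = np.random.default_rng(0)
orig = rng.normal(size=(32, 100))
close = orig + 0.01 * rng.normal(size=orig.shape)   # mild rounding noise
far = orig + 0.5 * rng.normal(size=orig.shape)      # aggressive rounding noise

kl_close = kl_to_original(orig, close)
kl_far = kl_to_original(orig, far)
```

Two quantized models can have similar weight error yet very different KL, which is why rounding against KL directly can beat per-weight objectives.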
Red Hat AI (@redhat_ai) 's Twitter Profile Photo

LLM Compressor just got way easier to use. You can now compress most LLMs directly from their Hugging Face model definition. No need to write custom wrappers. This new autowrapper supports 95% of multimodal and decoder models out of the box. Let’s break it down 🧵:

Han Guo (@hanguo97) 's Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
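The two endpoints the thread interpolates between are easy to sketch: quadratic softmax attention materializes a (T, T) score matrix, while linear attention carries a fixed-size running state. A toy numpy comparison (the feature map and normalization are illustrative choices; log-linear attention itself, which sits between the two, is not implemented here):

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Quadratic attention: full (T, T) causal score matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    scores = np.tril(scores)                         # causal mask
    return (scores / scores.sum(axis=1, keepdims=True)) @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Linear-time variant: a running (d, d) state replaces the score matrix,
    so memory is constant in sequence length."""
    Qf, Kf = phi(Q), phi(K)
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))                    # running sum of k v^T
    z = np.zeros(d)                                  # running normalizer
    out = np.zeros_like(V)
    for t in range(T):
        S += np.outer(Kf[t], V[t])
        z += Kf[t]
        out[t] = (Qf[t] @ S) / (Qf[t] @ z)
    return out

rng = np.random.default_rng(0)
T, d = 16, 8
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
out_soft = softmax_attention(Q, K, V)
out_lin = linear_attention(Q, K, V)
```

Log-linear attention replaces the single running state with a logarithmic number of states over hierarchical time scales, hence log-linear training cost and log-time inference.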
Sayak Paul (@risingsayak) 's Twitter Profile Photo

Bitsandbytes latest works with `torch.compile(fullgraph=True)` and you should put it to good use 🔥

For example, when applied to Flux, it beefs up the performance quite a bit.

Code:
gist.github.com/sayakpaul/0db9…

Enjoy 🔥
Alex Zhang (@a1zhang) 's Twitter Profile Photo

More learning alpha for GPU / ML enthusiasts from the conclusion of our <a href="/AMD/">AMD</a> x <a href="/GPU_MODE/">GPU MODE</a> kernel writing competition

Here's the write-up to the 🥈 solution (out of 163 other extremely talented teams) by Seb-v, which details how he refined his kernel!

🔗 below!
Lysandre (@lysandrejik) 's Twitter Profile Photo

Selecting any MCP Space through hf.co/mcp to use it in MCP Client is now possible

I see roughly 900 MCP Spaces already, w/ images (flux), video (ltx), audio, code, ...

Side-note: embedding AI as MCP servers in AI as MCP clients is really meta - is this AGI? 😀
Colaboratory (@googlecolab) 's Twitter Profile Photo

🤗The future of ML is accessible & collaborative 🤝 We’ve partnered with Hugging Face to add “Open in Colab” support for all models on the Hugging Face Hub. Now you can directly launch a Colab notebook from any model card, making it easier than ever to experiment with and

PyTorch (@pytorch) 's Twitter Profile Photo

#PyTorch Distributed Checkpointing now supports <a href="/huggingface/">Hugging Face</a> safetensors—making it easier to save/load checkpoints across ecosystems.

New APIs let you read/write safetensors via fsspec paths. First adopter: torchtune, with a smoother checkpointing flow.

📚 Learn more:
Lysandre (@lysandrejik) 's Twitter Profile Photo

I have bittersweet news to share.

Yesterday we merged a PR deprecating TensorFlow and Flax support in transformers.

Going forward, we're focusing all our efforts on PyTorch to remove a lot of the bloat in the transformers library. Expect a simpler toolkit, across the board.
Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile Photo

DeepSeek researcher <a href="/xingkaiyu/">俞星凯</a> releases nano-vLLM — a minimal, fully readable vLLM implementation in just ~1200 lines of code.

Open-source goes beyond free access — a resource to learn from, not just use!

github.com/GeeeekExplorer…
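At the heart of any vLLM-like engine, minimal or not, is the autoregressive decode loop: run the model, pick a token, append, repeat. A toy greedy-decoding sketch with a deterministic stand-in for the model forward pass (purely illustrative, not nano-vLLM's code):

```python
import numpy as np

def toy_next_token_logits(tokens, vocab=10):
    """Stand-in for a model forward pass: deterministic toy logits."""
    rng = np.random.default_rng(sum(tokens))
    return rng.normal(size=vocab)

def greedy_decode(prompt, max_new_tokens=5):
    """The core loop an inference engine schedules: feed the sequence,
    take the argmax token, append, repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = toy_next_token_logits(tokens)
        tokens.append(int(np.argmax(logits)))
    return tokens

out = greedy_decode([1, 2, 3])
```

What a real engine adds around this loop (KV caching, batching, paged memory, scheduling) is exactly what makes a readable ~1200-line implementation such good study material.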