Marc Sun (@_marcsun) 's Twitter Profile
Marc Sun

@_marcsun

Machine Learning Engineer @huggingface Open Source team

ID: 1623720711591174146

Joined: 09-02-2023 16:29:25

458 Tweets

1.1K Followers

421 Following

Eric Hartford (@cognitivecompai) 's Twitter Profile Photo

I was unable to quant DeepSeek-R1-0528 using llm-compressor - but I got it working on AutoAWQ, using mi300x generously lent to me by <a href="/HotAisle/">Hot Aisle</a>.  DeepSeek-R1-0528-AWQ will be published tomorrow.
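AWQ is a weight-only quantization scheme. As a rough illustration of the rounding step such methods build on, here is a toy group-wise symmetric int4 quantizer in numpy; the group size is arbitrary and AWQ's actual activation-aware scale search is omitted, so this is a simplified sketch, not AutoAWQ's implementation.

```python
import numpy as np

def quantize_groupwise(w, n_bits=4, group_size=8):
    """Symmetric group-wise rounding: one scale per group of weights."""
    flat = w.reshape(-1, group_size)
    qmax = 2 ** (n_bits - 1) - 1                        # 7 for int4
    scales = np.abs(flat).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(flat / scales), -qmax - 1, qmax)
    return q.astype(np.int8), scales

def dequantize(q, scales, shape):
    """Map int codes back to float weights."""
    return (q * scales).reshape(shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
q, s = quantize_groupwise(w)
w_hat = dequantize(q, s, w.shape)
err = np.abs(w - w_hat).max()                            # worst-case rounding error
```

Smaller groups give finer scales (lower error) at the cost of storing more scale values.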
Tony Wu (@tonywu_71) 's Twitter Profile Photo

🚀 ColQwen2 just dropped in Transformers! 🤗

Say goodbye to brittle OCR pipelines — now you can retrieve documents directly in the visual space with just a few lines of code. Perfect for your visual RAG workflows.

Smarter, simpler, faster. Let's dive in! 👇 (1/N 🧵)
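The "retrieve documents directly in the visual space" idea rests on late-interaction scoring over multi-vector embeddings, as in ColBERT/ColPali-style models. A toy numpy sketch of that MaxSim scoring (the embeddings here are random stand-ins, not real ColQwen2 outputs):

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: each query token takes its best-matching document
    patch; the per-token maxima are summed into one relevance score."""
    sims = query_vecs @ doc_vecs.T        # (n_query_tokens, n_doc_patches)
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(0)
query = rng.normal(size=(5, 8))
query /= np.linalg.norm(query, axis=1, keepdims=True)

# A "relevant" page contains patches close to the query tokens, plus noise.
relevant = np.vstack([query + 0.05 * rng.normal(size=query.shape),
                      rng.normal(size=(10, 8))])
relevant /= np.linalg.norm(relevant, axis=1, keepdims=True)

unrelated = rng.normal(size=(15, 8))
unrelated /= np.linalg.norm(unrelated, axis=1, keepdims=True)

score_rel = maxsim_score(query, relevant)
score_unrel = maxsim_score(query, unrelated)
```

In the real pipeline the patch embeddings come straight from page images, which is what lets this replace an OCR stage.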
Awni Hannun (@awnihannun) 's Twitter Profile Photo

The latest mlx-lm has a new dynamic quantization method (made with <a href="/angeloskath/">Angelos Katharopoulos</a>). It consistently results in better model quality with no increase in size. 

Some perplexity results (lower is better) for a few Qwen3 base models:
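For readers unfamiliar with the metric: perplexity is the exponentiated mean negative log-likelihood of the target tokens, so a model that knows nothing scores the vocabulary size and a confident model scores near 1. A minimal numpy sketch with toy logits (not the mlx-lm API):

```python
import numpy as np

def perplexity(logits, targets):
    """Perplexity = exp(mean NLL of the target tokens)."""
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerically stable
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(np.exp(nll.mean()))

rng = np.random.default_rng(0)
vocab, seq = 50, 100
targets = rng.integers(0, vocab, size=seq)

uniform = np.zeros((seq, vocab))              # uniform model: ppl == vocab size
peaked = np.zeros((seq, vocab))
peaked[np.arange(seq), targets] = 5.0         # mass concentrated on the targets

ppl_uniform = perplexity(uniform, targets)
ppl_peaked = perplexity(peaked, targets)
```

This is why "lower is better": a quantization method that preserves perplexity is preserving the model's predictive distribution.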
Unsloth AI (@unslothai) 's Twitter Profile Photo

We made a repo with 100+ Fine-tuning notebooks all in one place!

Has guides & examples for:
• Tool-calling, Classification, Synthetic data
• BERT, TTS, Vision LLMs
• GRPO, DPO, SFT, CPT
• Dataprep, eval, saving
• Llama, Qwen, Gemma, Phi, DeepSeek

🔗github.com/unslothai/note…
Alex Zhang (@a1zhang) 's Twitter Profile Photo

super high alpha learning material just dropped:

<a href="/xkxxhk/">Nemo</a> wrote up their design process + code for one of the fastest fp8 GEMM implementations in the entire $100K AMD kernel challenge — enjoy :)

🔗: akashkarnatak.github.io/amd-challenge/
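The write-up covers a GPU kernel, but the loop structure it tiles onto thread blocks is easy to see in plain Python. A naive blocked matmul sketch (fp32 numpy, not the fp8 HIP kernel itself):

```python
import numpy as np

def tiled_gemm(A, B, tile=16):
    """Blocked matmul: compute C one output tile at a time, accumulating
    partial products over K tiles — the skeleton GPU GEMM kernels optimize."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=np.float32)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            acc = np.zeros((min(tile, M - i), min(tile, N - j)), dtype=np.float32)
            for k in range(0, K, tile):
                acc += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
            C[i:i+tile, j:j+tile] = acc
    return C

rng = np.random.default_rng(0)
A = rng.normal(size=(48, 64)).astype(np.float32)
B = rng.normal(size=(64, 32)).astype(np.float32)
C = tiled_gemm(A, B)
max_err = np.abs(C - A @ B).max()
```

On a GPU, each `(i, j)` tile maps to a thread block and the `k` loop stages tiles through fast local memory; the competition write-ups are about making exactly those stages fast.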
Albert Tseng (@tsengalb99) 's Twitter Profile Photo

📣Introducing our latest work: Yet Another Quantization Algorithm!

YAQA directly minimizes the KL divergence to the original model during rounding, cutting it by >30% over prior PTQ methods and giving an even closer model than Google’s QAT on Gemma! 🤯

arxiv.org/abs/2505.22988👇
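The objective here is distributional, not per-weight: a toy numpy sketch of the mean KL between the original model's token distribution and a quantized one, with synthetic logits standing in for real model outputs (this illustrates the metric YAQA optimizes, not YAQA's rounding algorithm):

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def kl_to_original(orig_logits, quant_logits):
    """Mean KL(P_orig || P_quant) over positions: how far the quantized
    model's predictive distribution drifts from the original's."""
    p = softmax(orig_logits)
    q = softmax(quant_logits)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean())

rng = np.random.default_rng(0)
orig = rng.normal(size=(32, 100))
close = orig + 0.01 * rng.normal(size=orig.shape)   # mild rounding noise
far = orig + 0.5 * rng.normal(size=orig.shape)      # aggressive rounding noise

kl_close = kl_to_original(orig, close)
kl_far = kl_to_original(orig, far)
```

Two quantized models can have similar weight error yet very different KL, which is why rounding against KL directly can beat per-weight objectives.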
Red Hat AI (@redhat_ai) 's Twitter Profile Photo

LLM Compressor just got way easier to use. You can now compress most LLMs directly from their Hugging Face model definition. No need to write custom wrappers. This new autowrapper supports 95% of multimodal and decoder models out of the box. Let’s break it down 🧵:

Han Guo (@hanguo97) 's Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
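The two endpoints the thread interpolates between are easy to sketch: quadratic softmax attention materializes a (T, T) score matrix, while linear attention carries a fixed-size running state. A toy numpy comparison (the feature map and normalization are illustrative choices; log-linear attention itself, which sits between the two, is not implemented here):

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Quadratic attention: full (T, T) causal score matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    scores = np.tril(scores)                         # causal mask
    return (scores / scores.sum(axis=1, keepdims=True)) @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Linear-time variant: a running (d, d) state replaces the score matrix,
    so memory is constant in sequence length."""
    Qf, Kf = phi(Q), phi(K)
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))                    # running sum of k v^T
    z = np.zeros(d)                                  # running normalizer
    out = np.zeros_like(V)
    for t in range(T):
        S += np.outer(Kf[t], V[t])
        z += Kf[t]
        out[t] = (Qf[t] @ S) / (Qf[t] @ z)
    return out

rng = np.random.default_rng(0)
T, d = 16, 8
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
out_soft = softmax_attention(Q, K, V)
out_lin = linear_attention(Q, K, V)
```

Log-linear attention replaces the single running state with a logarithmic number of states over hierarchical time scales, hence log-linear training cost and log-time inference.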
Sayak Paul (@risingsayak) 's Twitter Profile Photo

Bitsandbytes latest works with `torch.compile(fullgraph=True)` and you should put it to good use 🔥

For example, when applied to Flux, it beefs up the performance quite a bit.

Code:
gist.github.com/sayakpaul/0db9…

Enjoy 🔥
Alex Zhang (@a1zhang) 's Twitter Profile Photo

More learning alpha for GPU / ML enthusiasts from the conclusion of our <a href="/AMD/">AMD</a> x <a href="/GPU_MODE/">GPU MODE</a> kernel writing competition

Here's the write-up to the 🥈 solution (out of 163 other extremely talented teams) by Seb-v, which details how he refined his kernel!

🔗 below!
Lysandre (@lysandrejik) 's Twitter Profile Photo

Selecting any MCP Space through hf.co/mcp to use it in MCP Client is now possible

I see roughly 900 MCP Spaces already, w/ images (flux), video (ltx), audio, code, ...

Side-note: embedding AI as MCP servers in AI as MCP clients is really meta - is this AGI? 😀
Colaboratory (@googlecolab) 's Twitter Profile Photo

🤗The future of ML is accessible & collaborative 🤝 We’ve partnered with Hugging Face to add “Open in Colab” support for all models on the Hugging Face Hub. Now you can directly launch a Colab notebook from any model card, making it easier than ever to experiment with and

PyTorch (@pytorch) 's Twitter Profile Photo

#PyTorch Distributed Checkpointing now supports <a href="/huggingface/">Hugging Face</a> safetensors—making it easier to save/load checkpoints across ecosystems.

New APIs let you read/write safetensors via fsspec paths. First adopter: torchtune, with a smoother checkpointing flow.

📚 Learn more:
Lysandre (@lysandrejik) 's Twitter Profile Photo

I have bittersweet news to share.

Yesterday we merged a PR deprecating TensorFlow and Flax support in transformers.

Going forward, we're focusing all our efforts on PyTorch to remove a lot of the bloat in the transformers library. Expect a simpler toolkit, across the board.
Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile Photo

DeepSeek researcher <a href="/xingkaiyu/">俞星凯</a> releases nano-vLLM — a minimal, fully readable vLLM implementation in just ~1200 lines of code.

Open-source goes beyond free access — a resource to learn from, not just use!

github.com/GeeeekExplorer…
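At the heart of any vLLM-like engine, minimal or not, is the autoregressive decode loop: run the model, pick a token, append, repeat. A toy greedy-decoding sketch with a deterministic stand-in for the model forward pass (purely illustrative, not nano-vLLM's code):

```python
import numpy as np

def toy_next_token_logits(tokens, vocab=10):
    """Stand-in for a model forward pass: deterministic toy logits."""
    rng = np.random.default_rng(sum(tokens))
    return rng.normal(size=vocab)

def greedy_decode(prompt, max_new_tokens=5):
    """The core loop an inference engine schedules: feed the sequence,
    take the argmax token, append, repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = toy_next_token_logits(tokens)
        tokens.append(int(np.argmax(logits)))
    return tokens

out = greedy_decode([1, 2, 3])
```

What a real engine adds around this loop (KV caching, batching, paged memory, scheduling) is exactly what makes a readable ~1200-line implementation such good study material.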