Zach Mueller (@TheZachMueller)'s Twitter Profile
Zach Mueller

@TheZachMueller

🤗 Technical Lead for the Accelerate Project | Passionate about Open Source | Nerd who enjoys touching the grass | #ADHD | He/Him

ID:721018777664626688

Link: https://muellerzr.github.io/ | 15-04-2016 16:54:07

14.6K Tweets

9.5K Followers

388 Following

Jonathan Whitaker (@johnowhitaker)

QDoRA strikes a nice balance - efficient like QLoRA but performs more like full finetuning.

I hope 'quant. base + trainable adapters' becomes the default way to share models. We can train QDoRA w/ FSDP now, the next piece is fast inference without merging in adapters...

Clémentine Fourrier 🍊 (@clefourrier)

⚠️We've decided to pause the Open LLM Leaderboard temporarily (hopefully till the end of day) to prevent evaluation failures due to network problems on the hub.

If your model failed this morning, tell us, we'll relaunch once everything's good.

Infra/hub teams are on it! 💪

Hamel Husain (@HamelHusain)

Who is creating a pytest-like tool for testing LLMs (where you can also track metrics, version data alongside code, etc)?

looking for OSS

anton (@abacaj)

Zuck releasing a billion dollar model is actually wild, like really undermining what OAI is doing. flexing compute like “yea we can do that not a big deal”

Zach Mueller (@TheZachMueller)

For those curious, biggest drive I could find at Micro Center is 18TB, you’d need 3 of those so that’s near $1,000 to download a dataset 🤯🤯🤯

microcenter.com/product/637516…

Guilherme Penedo (@gui_penedo)

We have just released 🍷 FineWeb: 15 trillion tokens of high quality web data.
We filtered and deduplicated all CommonCrawl between 2013 and 2024.
Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!

Xenova (@xenovacom)

Meta's Segment Anything Model (SAM) can now run in your browser w/ WebGPU (+ fp16), meaning up to 8x faster image encoding (10s → 1.25s)! 🤯⚡️

Video is not sped up! Everything runs 100% locally thanks to 🤗 Transformers.js and onnxruntime-web!

🔗 Demo: hf.co/spaces/Xenova/…

Gergely Orosz (@GergelyOrosz)

I know I am late to the party but HuggingFace is such an amazing platform for LLMs.

If I had to describe my impression after using it for a little time:

“GitHub, but for AI models.”

Zach Mueller (@TheZachMueller)

There is an art to being truly helpful on forums. It’s a careful balance of:

1. What is the critical information a user needs answers to (in as short and direct an answer as possible)
2. What information can you give them to go investigate and learn more of on their own (spark
