William J.B. Mattingly (@wjb_mattingly)'s Twitter Profile
William J.B. Mattingly

@wjb_mattingly

Digital Nomad · Historian · Data Scientist · NLP · Machine Learning

Cultural Heritage Data Scientist @Yale · Former @SIDataScience @huggingface Fellow 🤗

ID: 1257733645277769735

Link: https://linktr.ee/wjbmattingly

Joined: 05-05-2020 18:07:31

3.3K Tweets

3.3K Followers

198 Following

William J.B. Mattingly (@wjb_mattingly)

New free HTR app nearly ready to share! Ok, so I'm not trying to hop on this vibe-coding train, but I was able to whip this up in about 2 hours. This is a simple local app that creates a local database that lets you create projects which have document images that you can then

Jeff Boudier 🤗 (@jeffboudier)

Transcribing 1 hour of audio for less than $0.01 🤯 The Hugging Face team cooked with 8x faster Whisper speech recognition - OpenAI whisper-large-v3-turbo transcribes at 100x real time on a $0.80/hr L4 GPU!
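
The cost claim can be checked with quick arithmetic. A minimal sketch, assuming the $0.80/hr GPU price and the 100x real-time speed quoted in the tweet, and ignoring startup time, idle time, and any other overhead (the function name is hypothetical):

```python
def transcription_cost(audio_hours: float, gpu_usd_per_hour: float, realtime_factor: float) -> float:
    """Estimate the GPU cost in USD of transcribing `audio_hours` of audio."""
    # At N x real time, one hour of audio needs 1/N hours of GPU time.
    gpu_hours = audio_hours / realtime_factor
    return gpu_hours * gpu_usd_per_hour


# Figures taken from the tweet: 1 hour of audio, $0.80/hr L4 GPU, 100x real time.
cost = transcription_cost(audio_hours=1.0, gpu_usd_per_hour=0.80, realtime_factor=100.0)
print(f"Estimated cost: ${cost:.4f}")
```

Under these assumptions one hour of audio takes about 36 seconds of GPU time, which works out to roughly $0.008, consistent with "less than $0.01".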

William J.B. Mattingly (@wjb_mattingly)

Using Pandas for statistical analysis is great and easy, but with larger datasets, Pandas can be a real bottleneck. If you have a GPU available, try cuDF which lets you do the same analysis in Pandas with zero coding changes! NVIDIA AI Developer Video: youtu.be/qdZkPduLxhw

Kasey Zhang (@_weexiao)

Don't use structured output mode for reasoning tasks. We’re open sourcing Osmosis-Structure-0.6B: an extremely small model that can turn any unstructured data into any format (e.g. JSON schema). Use it with any model - download and blog below!

Daniel van Strien (@vanstriendaniel)

“AI Scraping Bots Are Breaking Open Libraries, Archives, and Museums” – interesting piece via 404 Media. Not a perfect fix, but making ML-ready datasets from collections can help. If you want help getting your data on Hugging Face, happy to help.
