Stella Biderman (@BlancheMinerva)'s Twitter Profile
Stella Biderman

@BlancheMinerva

Open source LLMs and interpretability research at @BoozAllen and @AiEleuther. My employers disown my tweets. She/her

ID:1125849026308575239

http://www.stellabiderman.com · Joined 07-05-2019 19:44:59

11.6K Tweets

14.5K Followers

748 Following

EleutherAI (@AiEleuther)

We are excited to see torchtune, a newly announced PyTorch-native finetuning library, integrate with our LM Evaluation Harness library for standardized, reproducible evaluations!

Read more here:
Blog: pytorch.org/blog/torchtune…
Thread:
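For context, a minimal sketch of what a standardized, reproducible eval with the LM Evaluation Harness looks like. The checkpoint and tasks below are placeholders, and the simple_evaluate Python entry point is assumed from recent harness versions; torchtune's integration runs the same task definitions on finetuned checkpoints.

    from lm_eval import simple_evaluate

    # Evaluate a Hugging Face checkpoint on named harness tasks.
    results = simple_evaluate(
        model="hf",                                      # transformers backend
        model_args="pretrained=EleutherAI/pythia-1.4b",  # placeholder model
        tasks=["lambada_openai", "hellaswag"],
        batch_size=8,
    )
    print(results["results"])  # per-task metrics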

Quentin Anthony (@QuentinAnthon15)

Zyphra is pleased to announce Zamba-7B:
- 7B Mamba/Attention hybrid
- Competitive with Mistral-7B and Gemma-7B on only 1T fully open training tokens
- Outperforms Llama-2 7B and OLMo-7B
- All checkpoints across training to be released (Apache 2.0)
- Achieved by 7 people, on 128…

Apoorv Khandelwal (@apoorvkh)

Calling all academic AI researchers! 🚨
We are conducting a survey on compute resources. We want to help the community better understand our capabilities+needs. We hope that this will help us all advocate for the resources we need!

Please contribute at: forms.gle/3hEie4hj999fiS…

Aran Komatsuzaki (@arankomatsuzaki)

🚀 Introducing Pile-T5!

🔗 We (EleutherAI) are thrilled to open-source our latest T5 model trained on 2T tokens from the Pile using the Llama tokenizer.

✨ Featuring intermediate checkpoints and a significant boost in benchmark performance.

Work done by Lintang Sutawika, me…
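A minimal sketch of loading the released model with Hugging Face transformers. The repo id EleutherAI/pile-t5-base is an assumption, and the prompt is only illustrative since the base model is pretrained rather than instruction-tuned.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    repo = "EleutherAI/pile-t5-base"  # assumed repo id for one of the released sizes
    tok = AutoTokenizer.from_pretrained(repo)             # Llama tokenizer, per the announcement
    model = AutoModelForSeq2SeqLM.from_pretrained(repo)   # encoder-decoder (T5-style) model

    # Arbitrary input; output is only illustrative for a span-corruption-pretrained model.
    inputs = tok("The Pile is a diverse, open dataset for language modeling.",
                 return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=20)
    print(tok.decode(out[0], skip_special_tokens=True))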

Stella Biderman (@BlancheMinerva)

For several months now I've been brain-dumping what I know about how LLMs work into an accessible, general-audience book! Check out the pre-release at the link.

Stella Biderman (@BlancheMinerva)

Excited about this for many reasons, but the biggest are
1. T5 is very widely used in practice, and better models are a good thing.
2. Checkpoints saved every 10,000 steps enable research on learning dynamics and interpretability for seq2seq models, much as Pythia has done for decoder-only models (see the sketch below).
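A sketch of pulling one of those intermediate checkpoints for learning-dynamics work, assuming per-step revisions are published the way the Pythia suite does; the repo id and revision name below are assumptions.

    from transformers import AutoModelForSeq2SeqLM

    # Load the weights as they were after 10,000 training steps.
    ckpt = AutoModelForSeq2SeqLM.from_pretrained(
        "EleutherAI/pile-t5-base",   # assumed repo id
        revision="step10000",        # assumed Pythia-style per-step branch naming
    )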

Stella Biderman (@BlancheMinerva)

Perhaps the highest praise one can give a data auditing paper: working on this paper fundamentally changed how I think about web-crawled datasets, assumptions I made about them, and how I practice data collection and cleaning.

Manning Publications (@ManningBooks)

📣Deal of the Day📣 Apr 2

45% off TODAY ONLY!

How GPT Works & selected titles: mng.bz/WrEx, by Drew Farris, Stella Biderman, and Edward Raff

New MEAP! Learn how large language models like GPT and Gemini work under the hood in plain English.
