Stella Biderman (@BlancheMinerva)'s Twitter Profile
Stella Biderman

@BlancheMinerva

Open source LLMs and interpretability research at @BoozAllen and @AiEleuther. My employers disown my tweets. She/her

ID:1125849026308575239

http://www.stellabiderman.com · Joined 07-05-2019 19:44:59

11.6K Tweets

14.5K Followers

748 Following

EleutherAI (@AiEleuther)

We are excited to see torchtune, a newly announced PyTorch-native finetuning library, integrate with our LM Evaluation Harness library for standardized, reproducible evaluations!

Read more here:
Blog: pytorch.org/blog/torchtune…
Thread:
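For context, a minimal sketch of what a standardized, reproducible eval with the LM Evaluation Harness looks like. The checkpoint and tasks below are placeholders, and the simple_evaluate Python entry point is assumed from recent harness versions; torchtune's integration runs the same task definitions on finetuned checkpoints.

    from lm_eval import simple_evaluate

    # Evaluate a Hugging Face checkpoint on named harness tasks.
    results = simple_evaluate(
        model="hf",                                      # transformers backend
        model_args="pretrained=EleutherAI/pythia-1.4b",  # placeholder model
        tasks=["lambada_openai", "hellaswag"],
        batch_size=8,
    )
    print(results["results"])  # per-task metrics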

Quentin Anthony (@QuentinAnthon15)

Zyphra is pleased to announce Zamba-7B:
- 7B Mamba/Attention hybrid
- Competitive with Mistral-7B and Gemma-7B on only 1T fully open training tokens
- Outperforms Llama-2 7B and OLMo-7B
- All checkpoints across training to be released (Apache 2.0)
- Achieved by 7 people, on 128…

Apoorv Khandelwal (@apoorvkh)

Calling all academic AI researchers! 🚨
We are conducting a survey on compute resources. We want to help the community better understand our capabilities+needs. We hope that this will help us all advocate for the resources we need!

Please contribute at: forms.gle/3hEie4hj999fiS…

Aran Komatsuzaki (@arankomatsuzaki)

🚀 Introducing Pile-T5!

🔗 We (EleutherAI) are thrilled to open-source our latest T5 model trained on 2T tokens from the Pile using the Llama tokenizer.

✨ Featuring intermediate checkpoints and a significant boost in benchmark performance.

Work done by Lintang Sutawika, me…
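A minimal sketch of loading the released model with Hugging Face transformers. The repo id EleutherAI/pile-t5-base is an assumption, and the prompt is only illustrative since the base model is pretrained rather than instruction-tuned.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    repo = "EleutherAI/pile-t5-base"  # assumed repo id for one of the released sizes
    tok = AutoTokenizer.from_pretrained(repo)             # Llama tokenizer, per the announcement
    model = AutoModelForSeq2SeqLM.from_pretrained(repo)   # encoder-decoder (T5-style) model

    # Arbitrary input; output is only illustrative for a span-corruption-pretrained model.
    inputs = tok("The Pile is a diverse, open dataset for language modeling.",
                 return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=20)
    print(tok.decode(out[0], skip_special_tokens=True))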

Stella Biderman (@BlancheMinerva)

For several months now I've been brain-dumping what I know about how LLMs work into an accessible, general-audience book! Check out the pre-release at the link.

Stella Biderman (@BlancheMinerva)

Excited about this for many reasons, but the biggest are
1. T5 is very widely used in practice, and better models are a good thing.
2. Checkpoints saved every 10,000 steps enable research on learning dynamics and interpretability for seq2seq models, much as Pythia has done for decoder-only models (see the sketch below).
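A sketch of pulling one of those intermediate checkpoints for learning-dynamics work, assuming per-step revisions are published the way the Pythia suite does; the repo id and revision name below are assumptions.

    from transformers import AutoModelForSeq2SeqLM

    # Load the weights as they were after 10,000 training steps.
    ckpt = AutoModelForSeq2SeqLM.from_pretrained(
        "EleutherAI/pile-t5-base",   # assumed repo id
        revision="step10000",        # assumed Pythia-style per-step branch naming
    )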

Stella Biderman (@BlancheMinerva)

Perhaps the highest praise one can give a data auditing paper: working on this paper fundamentally changed how I think about web-crawled datasets, assumptions I made about them, and how I practice data collection and cleaning.

Manning Publications (@ManningBooks)

📣Deal of the Day📣 Apr 2

45% off TODAY ONLY!

How GPT Works & selected titles: mng.bz/WrEx, by Drew Farris, Stella Biderman, and Edward Raff

New MEAP! Learn how large language models like GPT and Gemini work under the hood in plain English.
