Stella Biderman (@BlancheMinerva)'s Twitter Profile
Stella Biderman

@BlancheMinerva

Open source LLMs and interpretability research at @BoozAllen and @AiEleuther. My employers disown my tweets. She/her

ID: 1125849026308575239

Link: http://www.stellabiderman.com · Joined: 07-05-2019 19:44:59

11.6K Tweets

14.5K Followers

748 Following

Hailey Schoelkopf (@haileysch__):

(Replying to Tamay Besiroglu) But current models don't allocate parameters to rotary embeddings!

This means the Chinchilla rule of thumb D = 20N is already skewed relative to the actual parameter counts of most models, even if it held across datasets! If we disregarded the positional-encoding parameters, the coefficients would change.
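To make the rule of thumb concrete, here is a minimal arithmetic sketch. The 20-tokens-per-parameter ratio is the Chinchilla heuristic; the 70B/5B parameter split below is purely illustrative, not taken from any specific model:

```python
# Chinchilla rule of thumb: compute-optimal training uses roughly
# D = 20 * N training tokens for a model with N parameters.
CHINCHILLA_RATIO = 20

def optimal_tokens(n_params: float) -> float:
    """Compute-optimal token budget under the D = 20 * N heuristic."""
    return CHINCHILLA_RATIO * n_params

# Hypothetical split: a 70B-parameter model whose count includes 5B
# parameters of relative position encodings (illustrative numbers only).
total_params = 70e9
pos_enc_params = 5e9

print(f"Counting all params:           {optimal_tokens(total_params):.2e} tokens")
print(f"Excluding pos-encoding params: {optimal_tokens(total_params - pos_enc_params):.2e} tokens")
# A ~100B-token difference, which is why the fitted coefficients would
# shift if positional-encoding parameters were excluded from N.
```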

Hailey Schoelkopf (@haileysch__):

(Replying to Tamay Besiroglu) A super-fun arcane historical detail:

Gopher (and, by extension, Chinchilla) uses Transformer-XL-style position encodings. This means they spend 20B parameters (Gopher) and 5B parameters (Chinchilla) on just relative position encoding!
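These figures check out against the published architecture numbers. A minimal sketch, assuming the hyperparameters reported in the Gopher and Chinchilla papers (80 layers each; d_model of 16384 and 8192 respectively) and assuming the dominant cost is a per-layer d_model × d_model relative-position key projection, as in Transformer-XL:

```python
# Transformer-XL-style relative attention adds a d_model x d_model
# projection for the relative-position keys in every layer, so the
# positional-encoding parameter count is roughly layers * d_model**2
# (the extra global bias vectors are negligible by comparison).

def rel_pos_params(n_layers: int, d_model: int) -> int:
    return n_layers * d_model ** 2

# Published configs: Gopher 280B (80 layers, d_model=16384),
# Chinchilla 70B (80 layers, d_model=8192).
for name, layers, d_model in [("Gopher", 80, 16384), ("Chinchilla", 80, 8192)]:
    print(f"{name}: {rel_pos_params(layers, d_model) / 1e9:.1f}B params on relative position encoding")
# Gopher: 21.5B, Chinchilla: 5.4B -- consistent with the ~20B and ~5B quoted above.
```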

near (@nearcyan):

the best TTS available from 2020-2021 was done by a single unemployed guy who supported every single My Little Pony voice and literally nothing else

the best TTS from 2022-2024 was also done by a single (different) unemployed guy with a custom-built 6x3090 rig in his basement (1/2)

Edward Raff @AISTATS (@EdwardRaffML):

#HowGPTWorks, a book w/ @BlancheMinerva and @drewfarris, is already up to #3 on the best-sellers list! We're removing all the mystery behind WTF a language model is and how they work, in an accessible way for people without any AI/ML training. @ManningBooks manning.com/books/how-gpt-…
Stella Biderman (@BlancheMinerva):

This seems clearly correct to me and is something I've personally experienced.

Probably the easiest way to see that this is true is to realize that people don't know the logical closure of their beliefs, but, given time and a pencil, can work out many things in that closure.

Stella Biderman (@BlancheMinerva):

100%! People are really bad at understanding the logical closure of their beliefs.

(Proof: if they weren't, we would know whether ZFC is consistent!)

Stella Biderman (@BlancheMinerva):

Really amazing work by the Hugging Face team! Infrastructure work, including dataset work, evaluations work, and building libraries, is the single highest-leverage thing you can do in AI. This will pay dividends for the broader AI community for years to come.

EleutherAI (@AiEleuther):

An essential blocker to training LLMs on public domain books is not knowing which books are in the public domain. We're working on it, but it's slow and costly... if you're interested in providing support, reach out!

Stella Biderman (@BlancheMinerva):

SSMs + long sequence analysis + malware detection with LLMs is all the buzzwords you need to decide to check our paper out, right?

arxiv.org/abs/2403.17978

Stella Biderman (@BlancheMinerva):

Training data transparency is an unambiguous win for society, but all the incentives are against companies doing it right now. We need to fix this as soon as possible.

EleutherAI (@AiEleuther):

We are excited to see torchtune, a newly announced PyTorch-native finetuning library, integrate with our LM Evaluation Harness library for standardized, reproducible evaluations!

Read more here:
Blog: pytorch.org/blog/torchtune…
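For context, a minimal sketch of what a standardized evaluation through the harness looks like, assuming lm-evaluation-harness v0.4+ and its simple_evaluate entry point; the model and task names here are purely illustrative:

```python
# Run a standardized, reproducible evaluation with EleutherAI's
# LM Evaluation Harness (pip install lm-eval).
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",  # use the Hugging Face transformers backend
    model_args="pretrained=EleutherAI/pythia-160m",  # illustrative model choice
    tasks=["lambada_openai"],  # illustrative task choice
    batch_size=8,
)
print(results["results"])  # per-task metrics, e.g. accuracy and perplexity
```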

Quentin Anthony (@QuentinAnthon15):

Zyphra is pleased to announce Zamba-7B:
- 7B Mamba/Attention hybrid
- Competitive with Mistral-7B and Gemma-7B on only 1T fully open training tokens
- Outperforms Llama-2 7B and OLMo-7B
- All checkpoints across training to be released (Apache 2.0)
- Achieved by 7 people, on 128…

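A minimal sketch of loading such a Mamba/attention hybrid through transformers; the repo id "Zyphra/Zamba-7B-v1" and the need for trust_remote_code are assumptions on my part, not confirmed by the announcement:

```python
# Hypothetical loading sketch for a hybrid checkpoint on the Hugging Face
# Hub; the repo id and remote-code requirement are assumed, not confirmed.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Zyphra/Zamba-7B-v1"  # assumed repo id; check the actual release
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```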
Apoorv Khandelwal (@apoorvkh):

Calling all academic AI researchers! 🚨
We are conducting a survey on compute resources. We want to help the community better understand our capabilities+needs. We hope that this will help us all advocate for the resources we need!

Please contribute at: forms.gle/3hEie4hj999fiS…

Aran Komatsuzaki (@arankomatsuzaki):

🚀 Introducing Pile-T5!

🔗 We (EleutherAI) are thrilled to open-source our latest T5 model trained on 2T tokens from the Pile using the Llama tokenizer.

✨ Featuring intermediate checkpoints and a significant boost in benchmark performance.

Work done by Lintang Sutawika, me…

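A minimal usage sketch; the repo id "EleutherAI/pile-t5-base" is an assumption, and AutoModelForSeq2SeqLM is used because it dispatches on the checkpoint's config, so it should pick the correct T5-variant class:

```python
# Hypothetical loading sketch for a Pile-T5 checkpoint from the Hub;
# the exact repo id is assumed, not taken from the announcement.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo_id = "EleutherAI/pile-t5-base"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)

# T5-style span-corruption models fill in sentinel tokens.
inputs = tokenizer("The capital of France is <extra_id_0>.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```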
Stella Biderman (@BlancheMinerva):

I've been brain-dumping what I know about how LLMs work for several months now into an accessible, general-audience book! Check out the pre-release at the link.
