Stella Biderman (@BlancheMinerva) Twitter Tweets • TwiCopy

2 weeks ago

TIL: exactly one question in ARC has five choices. The rest have four.

61 likes, 0 retweets

Tamay Besiroglu but current models don’t allocate parameters to rotary embs!

this means the Chinchilla D=20*N is skewed already for the actual param counts of most models, even if it held across datasets! If we disregarded the pos. encoding params the coefficients would change

56 likes, 2 retweets

Hailey Schoelkopf

@haileysch__

2 weeks ago

Tamay Besiroglu a super-fun arcane historical detail:

Gopher (and by extension Chinchilla) use Transformer-XL style position encodings. This means they spend 20B params (Gopher) and 5B params (Chinchilla) on just rel. position encoding!

79 likes, 6 retweets
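The arithmetic behind the two replies above (position-encoding parameters skewing the Chinchilla D = 20*N rule of thumb) can be sketched roughly as follows. The 20B and 5B position-encoding figures are taken from the tweet as stated; the 280B and 70B totals are the commonly cited model sizes and are an assumption here, as are the helper names `chinchilla_tokens` and `compare`.

```python
# Rough arithmetic only, not a re-derivation of any scaling law.
# pos_enc values (20B, 5B) are quoted from the tweet above.
# total values (280B, 70B) are commonly cited sizes: an assumption, not from this feed.
TOKENS_PER_PARAM = 20  # the D = 20 * N rule of thumb

MODELS = {
    "Gopher": {"total": 280e9, "pos_enc": 20e9},
    "Chinchilla": {"total": 70e9, "pos_enc": 5e9},
}


def chinchilla_tokens(n_params, ratio=TOKENS_PER_PARAM):
    """Token budget D implied by D = ratio * N."""
    return ratio * n_params


def compare(total, pos_enc):
    """Return (D counting pos. encoding params, D excluding them, their share of N)."""
    with_pos = chinchilla_tokens(total)
    without_pos = chinchilla_tokens(total - pos_enc)
    return with_pos, without_pos, pos_enc / total


for name, m in MODELS.items():
    with_pos, without_pos, share = compare(m["total"], m["pos_enc"])
    print(
        f"{name}: {share:.1%} of params in pos. encoding; "
        f"D = {with_pos / 1e12:.2f}T tokens counting them, "
        f"{without_pos / 1e12:.2f}T excluding them"
    )
```

Under these assumed totals, about 7% of each model's parameters sit in the relative position encoding, so the token budget implied by a fixed 20 tokens per parameter shifts by the same fraction depending on whether those parameters are counted.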

near

@nearcyan

2 weeks ago

the best TTS available from 2020-2021 was done by a single unemployed guy who supported every single my little pony voice and literally nothing else

the best TTS from 2022-2024 was also done by a single (different) unemployed guy w a custom-built 6x3090 rig in his basement (1/2)

Edward Raff @AISTATS

@EdwardRaffML

3 weeks ago

#HowGPTWorks, book w/ @BlancheMinerva @drewfarris, is already up to #3 on the best sellers! We are removing all the mystery behind WTF a language model is, how they work, but in an accessible way for people without any AI/ML training. @ManningBooks manning.com/books/how-gpt-…

16 likes, 4 retweets

Stella Biderman

3 weeks ago

This seems clearly correct to me and is something I've personally experienced.

Probably the easiest way to see this is true is to realize that people don't know the logical closure of their beliefs, but given time and a pencil can work many things in said logical closure out.

59 likes, 1 retweet

Stella Biderman

3 weeks ago

100%! People are really bad at understanding the logical closure of their beliefs.

(Proof: if they weren't, we would know if ZFC was consistent!)

4 likes, 0 retweets

Stella Biderman

3 weeks ago

Really amazing work by the Hugging Face team! Infrastructure work, including dataset work, evaluations work, and building libraries, is the single highest-leverage thing you can do in AI. This will provide dividends for the broader AI community for years to come.

EleutherAI

@AiEleuther

3 weeks ago

An essential blocker to training LLMs on public domain books is not knowing which books are in the public domain. We're working on it, but it's slow and costly... if you're interested in providing support reach out!

Stella Biderman

3 weeks ago

SSMs + long sequence analysis + malware detection with LLMs is all the buzzwords you need to decide to check our paper out, right?

arxiv.org/abs/2403.17978

31 likes, 2 retweets

Stella Biderman

3 weeks ago

Training data transparency is an unambiguous win for society, but all the incentives are against companies doing it right now. We need to fix this as soon as possible.

EleutherAI

@AiEleuther

3 weeks ago

We are excited to see torchtune, a newly announced PyTorch-native finetuning library, integrate with our LM Evaluation Harness library for standardized, reproducible evaluations!

Read more here:
Blog: pytorch.org/blog/torchtune…
Thread:

55 likes, 7 retweets

Quentin Anthony

@QuentinAnthon15

3 weeks ago

Zyphra is pleased to announce Zamba-7B:
- 7B Mamba/Attention hybrid
- Competitive with Mistral-7B and Gemma-7B on only 1T fully open training tokens
- Outperforms Llama-2 7B and OLMo-7B
- All checkpoints across training to be released (Apache 2.0)
- Achieved by 7 people, on 128…

Apoorv Khandelwal

@apoorvkh

4 weeks ago

Calling all academic AI researchers! 🚨
We are conducting a survey on compute resources. We want to help the community better understand our capabilities+needs. We hope that this will help us all advocate for the resources we need!

Please contribute at: forms.gle/3hEie4hj999fiS…

Aran Komatsuzaki

@arankomatsuzaki

4 weeks ago

🚀 Introducing Pile-T5!

🔗 We (EleutherAI) are thrilled to open-source our latest T5 model trained on 2T tokens from the Pile using the Llama tokenizer.

✨ Featuring intermediate checkpoints and a significant boost in benchmark performance.

Work done by Lintang Sutawika, me…

Stella Biderman