Stella Biderman (@blancheminerva) 's Twitter Profile
Stella Biderman

@blancheminerva

Open source LLMs and interpretability research at @BoozAllen and @AiEleuther. My employers disown my tweets. She/her

ID: 1125849026308575239

Link: http://www.stellabiderman.com · Joined: 07-05-2019 19:44:59

11.1K Tweets

15.1K Followers

743 Following

Alisa Liu (@alisawuffles) 's Twitter Profile Photo

What do BPE tokenizers reveal about their training data?🧐 We develop an attack🗡️ that uncovers the training data mixtures📊 of commercial LLM tokenizers (incl. GPT-4o), using their ordered merge lists! Co-1⃣st Jonathan Hayase arxiv.org/abs/2407.16607 🧵⬇️

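The attack rests on a basic property of BPE training: merges are learned greedily by pair frequency, so the ordered merge list is a fingerprint of the training data mixture. A minimal sketch of that idea (the toy corpora and counts are invented for illustration; this is not the paper's actual inference method):

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn an ordered BPE merge list: each merge is the most
    frequent adjacent symbol pair in the current corpus."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

# Two toy "domains": the merge ORDER shifts with the mixture,
# which is exactly the signal an ordered merge list leaks.
english = ["the"] * 90 + ["hello"] * 10
code = ["def"] * 90 + ["for"] * 10

mostly_english = learn_bpe_merges(english * 9 + code * 1, 3)
mostly_code = learn_bpe_merges(english * 1 + code * 9, 3)
print(mostly_english)
print(mostly_code)
```

Because merges are recorded in frequency order, an English-heavy mixture learns English character pairs first and a code-heavy mixture learns code pairs first; comparing observed merge ranks against per-domain pair counts is what lets the mixture be recovered.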
caden (@kh4dien) 's Twitter Profile Photo

Sparse autoencoders recover a diversity of interpretable features but present an intractable problem of scale to human labelers. We build new automated pipelines to close the gap, scaling our understanding to GPT-2 and Llama-3 8B features. @goncaloSpaulo Jacob Drori Nora Belrose

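For context on the setup: a sparse autoencoder maps a model activation into a much larger feature dictionary with only a few features active, and the labeling bottleneck is explaining what each feature fires on. A minimal forward-pass sketch (sizes, random weights, and the TopK sparsifier are illustrative stand-ins, not the pipeline from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64  # toy sizes; real SAEs use far larger dictionaries

# Randomly initialized weights stand in for a trained SAE.
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)

def sae_forward(x, k=8):
    """Encode an activation into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc, 0.0)      # ReLU keeps features nonnegative
    mask = np.zeros_like(f)
    mask[np.argsort(f)[-k:]] = 1.0      # TopK: keep only the k largest features
    f = f * mask
    x_hat = f @ W_dec                   # linear decoder reconstruction
    return f, x_hat

x = rng.normal(size=d_model)            # stand-in for a model activation
features, recon = sae_forward(x)
print(int((features > 0).sum()), "active features out of", d_sae)
```

Each of the `d_sae` features gets a human-readable label in manual workflows; the automated pipelines in the tweet replace that manual labeling, which is what makes scaling to GPT-2 and Llama-3 8B feasible.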
EleutherAI (@aieleuther) 's Twitter Profile Photo

As models become larger and more unwieldy, auto-interp methods have become increasingly important. We are excited to be releasing the most comprehensive auto-interp library to enable wider research on this topic. github.com/EleutherAI/sae…

Stella Biderman (@blancheminerva) 's Twitter Profile Photo

Very cool paper that shows impressive performance with ternary LLMs. Discovering new papers that use EleutherAI's GPT-NeoX library in the wild is always a treat as well :D
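Ternary LLMs constrain each weight to {-1, 0, +1} (about 1.58 bits), usually with a per-tensor scale. A hedged sketch of one common quantization recipe (absmean rounding; the paper mentioned in the tweet may use a different scheme):

```python
import numpy as np

def ternary_quantize(W, eps=1e-8):
    """Round weights to {-1, 0, +1} using a per-tensor absmean scale."""
    scale = np.abs(W).mean()  # absmean scaling factor
    Wq = np.clip(np.round(W / (scale + eps)), -1.0, 1.0)
    return Wq, scale

W = np.random.default_rng(0).normal(size=(4, 8))
Wq, s = ternary_quantize(W)
# The dequantized matrix s * Wq approximates W with ternary weights,
# so matmuls reduce to additions and subtractions plus one scale.
print(sorted(np.unique(Wq).tolist()))
```

The practical appeal is that a ternary matmul needs no multiplications in the inner loop, which is where the efficiency claims for these models come from.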

Stella Biderman (@blancheminerva) 's Twitter Profile Photo

One of the best and least-acknowledged use cases for LLMs is in data processing. This is already making waves behind the scenes at companies and it's great to see pleias and Alexander Doria making it happen.

Stella Biderman (@blancheminerva) 's Twitter Profile Photo

If you're looking to learn about training large language models, this cookbook led by Quentin Anthony details essential information that is often glossed over in papers, along with resources for learning.

RWKV (@rwkv_ai) 's Twitter Profile Photo

The RWKV v6 Finch line of models is here, scaling from 1.6B all the way to 14B and pushing the boundary for attention-free transformers and multilingual models. Cleanly licensed Apache 2, under The Linux Foundation. Find out more from the writeup here: blog.rwkv.com/p/rwkv-v6-finc…

Nathan (@nathanhabib1011) 's Twitter Profile Photo

The Open LLM Leaderboard is now the most liked repo on all of HuggingFace 👀

- open-llm-leaderboard/open_llm_leaderboard: 11,236
- stabilityai/stable-diffusion: 10,618
- jbilcke-hf/ai-comic-factory: 7,911
- CompVis/stable-diffusion-v1-4: 6,431

Stella Biderman (@blancheminerva) 's Twitter Profile Photo

This is important: only 25% of respondents who chose an answer got it right. I suspect the rate would be lower among a random sample of AI audiences too. If people don't know what a tool does they won't use it correctly. And if they wrongly think it's a watermark, that's worse.

Stella Biderman (@blancheminerva) 's Twitter Profile Photo

GPT-4 can't draw a basic diagram, but by telling the model to draw it in ASCII you (I suspect) bypass the diffusion-model call and use something that's mostly the language model, which does know what I want.
