Naomi Saphra (@nsaphra)'s Twitter Profile
Naomi Saphra

@nsaphra

Waiting on a robot body. ML/NLP. All opinions are universal and held by both employers and family. Same username on every lifeboat off this sinking ship.

ID:215113195

Link: http://nsaphra.github.io/
Joined: 13-11-2010 01:15:56

17.0K Tweets

7.2K Followers

1.2K Following

Stanford NLP Group (@stanfordnlp)

For this week’s NLP Seminar, we are thrilled to host Naomi Saphra to talk about 'Interpreting Training'!

When: 05/02 Thurs 11am PT
Non-Stanford affiliates registration form (closed at 9am PT on the talk day): forms.gle/XJUQQZTEn6QLqR…

Malte Elson (@maltoesermalte)

just one more measure bro. i promise bro just one more measure and it'll fix everything bro. bro... just one more measure. please just one more. one more measure and we can fix psychology bro. bro c'mon just give me one more measure i promise bro. bro bro please i just

Naomi Saphra (@nsaphra)

Anyone want to hang out at ICLR next week and chat about empirical training dynamics, AI for science, and LM interpretability?

Kabir (@kabirahuja004)

📢 New Paper!

Ever wondered why transformers are able to capture hierarchical structure of human language without incorporating an explicit 🌲 structure in their architecture?

In this work we delve deep into understanding hierarchical generalization in transformers.

(1/n)

Naomi Saphra (@nsaphra)

lmao this is a misinfo nightmare beyond any of the politically salient ones funraniumlabs.com/2024/04/phil-v…

Naomi Saphra (@nsaphra)

I'm getting kind of tired of interp research that sees explaining a model as the final endpoint. That's one set of parameters buddy. You want to show me something about a specific matrix? Nah. Show me what it tells you about learning. Show me what it tells you about the data.

Nathan Godey (@nthngdy)

🤏 Why do small Language Models underperform?

We prove empirically and theoretically that the LM head on top of language models can limit performance through the softmax bottleneck phenomenon, especially when the hidden dimension <1000.

📄Paper: arxiv.org/pdf/2404.07647…
(1/10)
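
A minimal numeric sketch of the rank argument behind the softmax bottleneck (my own illustration, not code from the linked paper): the LM head is a single d × V linear map, so the logit matrix over any batch of contexts has rank at most the hidden dimension d, no matter how large the vocabulary is.

```python
# Illustrative sketch of the softmax bottleneck (hypothetical sizes, not the paper's code):
# with hidden size d, the logits produced by a linear LM head over any set of
# contexts form a matrix of rank at most d, limiting the output distributions
# a small model can express.
import numpy as np

rng = np.random.default_rng(0)
d, vocab, n_contexts = 32, 2000, 256      # hypothetical sizes

H = rng.normal(size=(n_contexts, d))      # hidden states, one row per context
W = rng.normal(size=(d, vocab))           # LM head projection matrix
logits = H @ W                            # shape (n_contexts, vocab)

# The rank never exceeds d, regardless of vocabulary size or number of contexts.
print(np.linalg.matrix_rank(logits))      # prints 32
```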

Paula Rodríguez Díaz (@paularodrid)

Here's an idea: instead of making the research opportunity gap wider, support research initiatives in the Global South so that at least research at the *undergrad level* becomes more accessible and equitable.

Josh Barro (@jbarro)

That's because the anti-test campaign is led by people who hated the tests because they were bad at math, not by people who are trying to promote equality

Chomba Bupe (@ChombaBupe)

It turns out the data bottleneck problem is more dire than initially thought:

AI model performance - which can be largely attributed to the presence of test concepts within their vast pretraining datasets - increases linearly with exponentially more data.

RIP: Scaling laws
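
For context on the "linearly with exponentially more data" framing, a rough numeric sketch (my own, with made-up constants, not from the tweet): under a standard power-law scaling fit, each 10x increase in data buys only a constant-sized improvement on a log scale.

```python
# Rough sketch of power-law data scaling (illustrative constants, not a real fit):
# loss ~ a * D**(-alpha) means exponential growth in data D yields only a
# linear improvement in log-loss.
import numpy as np

a, alpha = 10.0, 0.1                      # hypothetical fit constants
D = np.array([1e9, 1e10, 1e11, 1e12])     # dataset size, 10x larger each step
loss = a * D ** (-alpha)

print(np.round(loss, 3))                  # [1.259 1.    0.794 0.631]
print(np.round(np.diff(np.log(loss)), 3)) # constant -0.23 per decade of data
```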

Naomi Saphra (@nsaphra)

The fire alarm in my apartment building spontaneously combusted last night and filled the whole building with smoke. As I stood in my bathrobe watching the fire brigade, I resolved to be kinder to myself about my own mistakes.

Delip Rao e/σ (@deliprao)

If this were a science paper, you would expect a country that picks its science workforce at random as a “weak baseline” and a leading nation like the US to actively experiment towards state-of-the-art, or at least beat the baseline.

Not providing a guaranteed path for…

Denny Zhou (@denny_zhou)

Welcome to the new era of AI: 'Deep' was once the buzzword at AI conferences, but it's no longer the case in COLM.

Kempner Institute at Harvard University (@KempnerInst)

Conversations about AI fairness and AI assistive technology need to include disabled people… Kempner Institute Research Fellow Naomi Saphra discusses fairness and disability in this important new article from the Harvard Gazette.
bit.ly/4aijrfB
