Sang Michael Xie (@sangmichaelxie)'s Twitter Profile
Sang Michael Xie

@sangmichaelxie

Research Scientist at Meta GenAI / LLaMA. AI + ML + NLP + data. Prev: CS PhD @StanfordAILab @StanfordNLP @Stanford, @GoogleAI Brain/DeepMind

ID: 1133476937668542465

Link: http://cs.stanford.edu/~eix

Joined: 28-05-2019 20:55:35

365 Tweets

3.3K Followers

727 Following

Data selection for LMs (GPT-3, PaLM) is done with heuristics that select data by training a classifier for high-quality text. Can we do better?

Turns out we can boost downstream GLUE acc by 2+% by adapting the classic importance resampling algorithm.

arxiv.org/abs/2302.03169
🧵
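
For readers unfamiliar with the classic algorithm the tweet refers to, here is a minimal sketch of importance resampling applied to data selection. It is an illustration under stated assumptions, not the method from the linked paper: the smoothed unigram models, the shared vocabulary, and the names `fit_unigram` and `importance_resample` are all hypothetical choices made for this example. Each raw document gets a log importance weight log p_target(x) - log p_raw(x), and k documents are then drawn without replacement in proportion to those weights using the Gumbel top-k trick.

```python
import math
import random
from collections import Counter


def fit_unigram(texts, vocab, smoothing=1.0):
    """Return a log-probability function for an add-one smoothed unigram model.

    Assumption: a bag-of-words unigram model stands in for whatever
    feature distribution a real pipeline would estimate.
    """
    counts = Counter(w for t in texts for w in t.lower().split())
    denom = sum(counts.values()) + smoothing * len(vocab)

    def log_prob(text):
        return sum(
            math.log((counts[w] + smoothing) / denom)
            for w in text.lower().split()
        )

    return log_prob


def importance_resample(raw_texts, target_texts, k, seed=0):
    """Select k raw documents, sampled without replacement with
    probability proportional to p_target(x) / p_raw(x)."""
    rng = random.Random(seed)
    # Shared vocabulary so both models are normalized over the same words.
    vocab = {w for t in raw_texts + target_texts for w in t.lower().split()}
    log_p_target = fit_unigram(target_texts, vocab)
    log_p_raw = fit_unigram(raw_texts, vocab)

    keyed = []
    for i, text in enumerate(raw_texts):
        log_w = log_p_target(text) - log_p_raw(text)  # log importance weight
        u = max(rng.random(), 1e-300)                 # avoid log(0)
        gumbel = -math.log(-math.log(u))              # standard Gumbel noise
        keyed.append((log_w + gumbel, i))

    # Gumbel top-k: the k largest perturbed log-weights form a sample
    # without replacement, proportional to the importance weights.
    keyed.sort(reverse=True)
    return [raw_texts[i] for _, i in keyed[:k]]


if __name__ == "__main__":
    raw = [
        "the cat sat on the mat",
        "stock prices fell sharply today",
        "a cat and a dog played",
        "markets rallied after the report",
    ]
    target = ["my cat chased the dog", "the dog sat near the cat"]
    for doc in importance_resample(raw, target, k=2, seed=0):
        print(doc)
```

Because selection is stochastic, documents resembling the target are favored rather than guaranteed, which keeps some diversity compared with taking only the highest-weight documents.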