UW NLP (@uwnlp)'s Twitter Profile
UW NLP

@uwnlp

The NLP group at the University of Washington.

ID: 3716745856

Joined: 20-09-2015 10:26:25

1.1K Tweets

11.11K Followers

162 Following

Swabha Swayamdipta (@swabhz)

Huge congratulations and thanks for the gracious shoutout, A. Feder Cooper and team! Our paper was recently accepted to Conference on Language Modeling, so please come check it out if you're attending!

Alisa Liu (@alisawuffles)

What do BPE tokenizers reveal about their training data?🧐 We develop an attack🗡️ that uncovers the training data mixtures📊 of commercial LLM tokenizers (incl. GPT-4o), using their ordered merge lists! Co-1⃣st Jonathan Hayase arxiv.org/abs/2407.16607 🧵⬇️

Ken Liu (@kenziyuliu)

a very cool form of training data inference, especially considering the importance of data mixtures (arxiv.org/abs/2305.10429)!

Jonathan Hayase (@jonathanhayase)

Tokenizers and autoregressive LMs are both trained to compress text, but tokenizer training is deterministic and we know exactly how it works! This makes inverse problems wrt the data much easier. There's a wealth of info lurking in public tokenizers waiting to be extracted!
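
To make the point about deterministic tokenizer training concrete, here is a minimal sketch of a toy BPE trainer. It is not the attack from the paper and not any production tokenizer: the function name `train_bpe`, the tie-breaking rule, and the tiny "English" and "code" corpora are all assumptions made for illustration. It only shows that the same data always yields the same ordered merge list, and that changing the data mixture changes which merges come first.

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Toy BPE trainer (illustrative only): returns the ordered merge list."""
    # Each word is a tuple of symbols; the Counter value is its corpus frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Deterministic choice: highest count, ties broken lexicographically
        # (the tie-breaking rule is an assumption of this sketch).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        # Apply the chosen merge to every word.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

# Hypothetical miniature corpora standing in for two data domains.
english = ["the", "then", "there"] * 3
code = ["def", "defer", "define"] * 3

merges_mostly_english = train_bpe(english * 5 + code, 3)
merges_mostly_code = train_bpe(english + code * 5, 3)
```

With these toy corpora the first merge is `('h', 'e')` for the English-heavy mixture and `('d', 'e')` for the code-heavy one, so the order of the merge list carries a signal about which domain dominated training.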

Sewon Min (@sewon__min)

📣 After graduating from @UWCSE, I am joining UC Berkeley as an Assistant Professor (affiliated w Berkeley AI Research BerkeleyNLP) and Ai2 as a Research Scientist. I'm looking forward to tackling exciting challenges in NLP & generative AI together with new colleagues! 🐻✨

Ofir Press (@ofirpress)

Join us on August 14th at 3PM Eastern / 12PM Pacific to learn about the three new benchmarks we've recently released: SciCode, AssistantBench and CiteMe. We will also have some SWE-bench updates. The event will be on Zoom. lu.ma/4240w5us

Hila Gonen (@hila_gonen)

Do you like yellow? Then, according to LLMs, you are probably a school bus driver! Excited to share our new paper about Semantic Leakage in Language Models! Joint work with my wonderful collaborators @terra, Alisa Liu, luke, and Noah A. Smith. Paper: gonenhila.github.io/files/Semantic… 1/10

Kyle Lo (@kylelostat)

data is the secret sauce when cooking LMs; come learn our recipes!🫕 happening now in World Ballroom A on 23rd floor🤩

Nouha Dziri (@nouhadziri)

📢Super excited that our workshop "System 2 Reasoning At Scale" was accepted to #NeurIPS24, Vancouver! 🎉 🎯 how can we equip LMs with reasoning, moving beyond just scaling parameters and data? Organized w. Stanford NLP Group, Massachusetts Institute of Technology (MIT), Princeton University, Ai2, and UW NLP. 🗓️ when? Dec 15 2024

Ofir Press (@ofirpress)

OpenAI just released a small subset of SWE-bench tasks, verified by humans to be solvable. I would treat this subset as "SWE-bench Easy"- useful for debugging your system. But eventually when you're ready for launch, we still recommend running on SWE-bench Lite or the full set

Xiaochuang Han (@xiaochuanghan)

👽Have you ever accidentally opened a .jpeg file with a text editor (or a hex editor)? Your language model can learn from these seemingly gibberish bytes and generate images with them! Introducing *JPEG-LM* - an image generator that uses exactly the same architecture as LLMs

Pang Wei Koh (@pangweikoh)

Check out JPEG-LM, a fun idea led by Xiaochuang Han -- we generate images simply by training an LM on raw JPEG bytes and show that it outperforms much more complicated VQ models, especially on rare inputs.
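
As a rough illustration of what "training an LM on raw JPEG bytes" means at the data level, the sketch below treats each byte of a JPEG stream as a token ID in the range 0 to 255 and builds next-token training pairs. This is the simplest possible byte-level view and an assumption of this example, not the JPEG-LM pipeline itself; the helper names are hypothetical, and the sample byte string only carries the JPEG start and end markers, so it is not a decodable image.

```python
SOI = b"\xff\xd8"  # JPEG Start Of Image marker
EOI = b"\xff\xd9"  # JPEG End Of Image marker

def jpeg_bytes_to_tokens(data):
    """Map a JPEG byte stream to a list of integer token IDs (0..255)."""
    if not (data.startswith(SOI) and data.endswith(EOI)):
        raise ValueError("not a complete JPEG byte stream")
    return list(data)

def next_byte_pairs(tokens):
    """Build (input, target) sequences for next-token prediction."""
    return tokens[:-1], tokens[1:]

def tokens_to_jpeg_bytes(tokens):
    """Invert the mapping: generated token IDs become file bytes again."""
    return bytes(tokens)

# Placeholder payload between the markers (not real image data).
sample = SOI + bytes([1, 2, 3]) + EOI
tokens = jpeg_bytes_to_tokens(sample)
inputs, targets = next_byte_pairs(tokens)
```

Because the mapping is lossless in both directions, anything a model generates in this token space can be written straight back to a `.jpeg` file and opened with an ordinary image decoder, provided the generated bytes form a valid stream.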

Marjan Ghazvininejad (@gh_marjan)

Can we train an LM on raw JPEG bytes and generate images with that? Yes we can. Check out JPEG-LM (arxiv.org/abs/2408.08459), a cool work led by @XiaochuangHano to learn more.

Prithviraj (Raj) Ammanabrolu (@rajammanabrolu)

One of my personal most fav lines of work recently!! Work led by Nikita Haduong at Allen School UW NLP with Irene Wang, Roy Lu, and Noah A. Smith. Watch out for Nikita, she'll be on the job market soon w/ a v unique set of expertises at the intersection of NLP, Games, Edu, and HCI

Skyler Hallinan (@skylerhallinan)

🚨 New Paper Alert! 🚨 Introducing 💿 StyleRemix: a novel, interpretable method for authorship obfuscation which perturbs fine-grained *style-elements* in the input text 🎨✍️. 📄 Paper: arxiv.org/abs/2408.15666 🧑‍💻 Code: github.com/jfisher52/Styl… 🚀 Demo: huggingface.co/spaces/hallisk…
