Clara Isabel Meister (@clara__meister)'s Twitter Profile
Clara Isabel Meister

@clara__meister

PhD student in the ML Institute at ETH Zurich.

Still figuring out how Twitter works... 🤦‍♀️

ID: 1141006043218108419

Link: http://cimeister.github.io | Joined: 18-06-2019 15:33:34

102 Tweets

1.1K Followers

49 Following

JHU CLSP (@jhuclsp)

"A Measure-Theoretic Characterization of Tight Language Models" Draft: arxiv.org/abs/2212.10502 By Leo Du (JHU) Lucas Torroba-Hennigen Tiago Pimentel Clara Isabel Meister Jason Eisner (JHU) @ryandcotterell TLDR; Formalizes LMs' distribution (i.e., whether generative process terminates with prob 1.)

JHU CLSP (@jhuclsp)

"Tokenization and the Noiseless Channel" Draft: coming soon! By Vilém Zouhar Clara Isabel Meister Gianni Gastaldi @giannig.bsky.social Leo Du (JHU) Mrinmaya Sachan @ryandcotterell TLDR; Develops information-theoretic efficiency measures of subword tokenization alg + theoretical bounds for such measures.

Thomas Hikaru Clark (@thomashikaru)

Has a language’s word order been affected by a pressure for uniform information density? We investigate this Q in our upcoming TACL paper 🧵 with Clara Isabel Meister, Tiago Pimentel, Michael Hahn, @ryandcotterell, Richard Futrell, Roger Levy arxiv.org/abs/2306.03734

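For readers new to UID, a toy operationalization of mine (not the paper's measure): a common way to score uniformity is the variance of per-word surprisals; lower variance means information is spread more evenly across the sentence.

```python
# Toy UID score (my operationalization, not the paper's): given per-word
# surprisals -log2 p(w_i | w_<i), a sentence is more "uniform" when the
# surprisals have lower variance around their mean.

def uid_variance_score(surprisals: list[float]) -> float:
    """Negative variance of surprisals: higher = more uniform."""
    mean = sum(surprisals) / len(surprisals)
    return -sum((s - mean) ** 2 for s in surprisals) / len(surprisals)

# Same total information (12 bits), distributed differently:
print(uid_variance_score([3.0, 3.0, 3.0, 3.0]))  # -0.0  (perfectly uniform)
print(uid_variance_score([9.0, 1.0, 1.0, 1.0]))  # -12.0 (very peaked)
```
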
Vilém Zouhar (@zouharvi)

I'm elated to present our two latest projects on tokenization. 🧩🧩 The first formalizes Byte-Pair Encoding and finds a nice bound on its greediness. arxiv.org/abs/2306.16837 youtube.com/watch?v=aB7oaS…

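For reference, a minimal sketch of the BPE training loop being formalized (my generic version of the standard algorithm, not the paper's code): repeatedly merge the most frequent adjacent pair of symbols.

```python
# Minimal BPE trainer (generic sketch of the standard algorithm): greedily
# merge the most frequent adjacent symbol pair, for a fixed merge budget.
from collections import Counter

def bpe_merges(text: str, num_merges: int) -> list[tuple[str, str]]:
    seq = list(text)  # start from characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        # Apply the chosen merge left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return merges

print(bpe_merges("hello hello hello", 3))
```
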
Kyle Mahowald (@kmahowald)

Computational psycholinguists and friends take note: Marten van Schijndel and I are co-editing, under the aegis of @AdrianBStaub, a special issue of JML on language models and psycholinguistics! Call: sciencedirect.com/journal/journa…. I'm happy to chat about this in Toronto at #ACL2023NLP!

Clara Isabel Meister (@clara__meister)

Come to our ACL tutorial tomorrow at 14h on generating text from language models! Material will be online here: rycolab.io/classes/acl-20… w/ Tiago Pimentel, Afra Amini, John Hewitt, Luca Malagutti, @ryandcotterell

Ethan Gotlieb Wilcox (@wegotlieb)

🚨🚨New Paper Announcement (to appear in TACL) 📜 from me, Tiago Pimentel, Clara Isabel Meister, @ryandcotterell and Roger Levy: Testing the Predictions of Surprisal Theory in 11 Languages arxiv.org/abs/2307.03667 🌎
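
Surprisal theory's core prediction is that processing cost (e.g., reading time) grows roughly linearly with a word's surprisal, -log p(word | context). A minimal sanity check of that prediction, with entirely hypothetical numbers (a sketch of mine, not the paper's analysis code):

```python
# Sketch: regress reading times on surprisals, the basic linear relation
# surprisal theory predicts. All numbers below are hypothetical.
import numpy as np

surprisal = np.array([2.1, 7.8, 3.3, 12.5, 5.0])          # bits
reading_time = np.array([210., 310., 240., 405., 265.])   # ms

# OLS fit: reading_time ≈ slope * surprisal + intercept
X = np.stack([surprisal, np.ones_like(surprisal)], axis=1)
slope, intercept = np.linalg.lstsq(X, reading_time, rcond=None)[0]
print(f"{slope:.1f} ms/bit, intercept {intercept:.0f} ms")
```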

Clara Isabel Meister (@clara__meister)

If you're at #ACL2023NLP, stop by the poster session tomorrow @ 16h for our paper "On the Efficacy of Sampling Adapters"! Hope to see some of you there :) arxiv.org/pdf/2307.03749… w/ Tiago Pimentel, Luca Malagutti, Ethan Gotlieb Wilcox, @ryandcotterell
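
A sampling adapter reshapes the LM's next-token distribution before sampling. Here is a minimal sketch of one standard adapter, nucleus (top-p) truncation (my version, not the paper's code):

```python
# Nucleus (top-p) truncation as a sampling adapter: keep the smallest set
# of tokens whose cumulative probability reaches p, then renormalize.
import numpy as np

def nucleus_adapter(probs: np.ndarray, p: float = 0.9) -> np.ndarray:
    order = np.argsort(probs)[::-1]                   # most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1  # smallest nucleus
    adapted = np.zeros_like(probs)
    adapted[order[:cutoff]] = probs[order[:cutoff]]
    return adapted / adapted.sum()                    # renormalize

print(nucleus_adapter(np.array([0.5, 0.3, 0.1, 0.07, 0.03])))
```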

ZurichAI (@zurichnlp)

We have an absolutely stellar meetup coming up this October 19th @ 6:00 PM! Jonas Pfeiffer from Google DeepMind will be presenting, followed by Eiso Kant, co-founder and CTO of poolside. Last meetup we ran out of spots! RSVP: zurich-nlp.ch/event/zurich-n…

Tiago Pimentel (@tpimentelms)

Are you interested in word lengths and natural language’s efficiency? If yes, our new #EMNLP2023 paper has everything you need: drama, suspense, a new derivation of Zipf’s law, an update to Piantadosi et al.’s classic word length paper, transformers... 🧵 arxiv.org/abs/2312.03897

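The Piantadosi et al. result being updated is that a word's length tracks its average surprisal in context, not just its frequency. A toy check of that correlation (a sketch of mine, with hypothetical numbers):

```python
# Sketch: correlate word length with mean in-context surprisal.
# Words and surprisal values below are hypothetical.
import numpy as np

words = ["the", "of", "probability", "notwithstanding"]
mean_surprisal = np.array([1.2, 1.5, 9.8, 14.1])        # bits
lengths = np.array([len(w) for w in words], dtype=float)

r = np.corrcoef(lengths, mean_surprisal)[0, 1]
print(f"Pearson r(length, mean surprisal) = {r:.2f}")
```
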
Ethan Gotlieb Wilcox (@wegotlieb)

Thank you to #EMNLP2023 chairs for the 😱 two 😱 outstanding paper awards! I am so grateful to have worked on these projects with wonderful colleagues — Tiago Pimentel (who is the first author on one of the papers!), Clara Isabel Meister, Kyle Mahowald and @ryandcotterell

ZurichAI (@zurichnlp)

New year, new meetup! Join us on January 16th 18:00 - 20:00 for talks from: ➡ Martina Forster and Luca Campanella from Typewise ➡ Tiago Pimentel from ETH Zurich We filled 50/150 RSVPs yesterday, spots are filling up fast! 🚀 zurich-nlp.ch/event/zurich-n…

Ethan Gotlieb Wilcox (@wegotlieb)

🔔🌟 New Preprint Alert 🔔🌟 “An Information-Theoretic Analysis of Targeted Regressions during Reading” with Tiago Pimentel, Clara Isabel Meister, @ryandcotterell - Psycholinguistics 🧠 Computational Modeling 🤖 Crosslinguistic Studies 🌍 Information Theory 📡 osf.io/preprints/psya…

Pietro Lesci (@pietro_lesci)

Happy to share our #ACL2024 paper: "Causal Estimation of Memorisation Profiles" 🎉 Drawing from econometrics, we propose a principled and efficient method to estimate memorisation using only observational data! See 🧵 +Clara Isabel Meister, Thomas Hofmann, Andreas Vlachos, Tiago Pimentel

Tiago Pimentel (@tpimentelms)

Do you want to quantify your model’s counterfactual memorisation using only observational data? Our #ACL2024NLP paper proposes an efficient method to do it :) No interventions required! You can also see how memorisation evolves across training! Check out Pietro's🧵for details :)
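
Very roughly, and heavily hedged: this is my toy rendering of the difference-in-differences idea the paper draws from econometrics, not the authors' estimator, and all numbers are hypothetical. The intuition is to compare how the log-likelihood of instances changes across the training step where they are seen against instances not seen in that window, which controls for general training progress.

```python
# Toy difference-in-differences sketch (mine; hypothetical data):
# memorisation of instances first seen at step t = their log-likelihood
# change across t, minus the same change for unseen control instances.
import numpy as np

# log-likelihoods at checkpoints before/after step t (hypothetical)
treated_before, treated_after = np.array([-8.0, -7.5]), np.array([-5.0, -4.9])
control_before, control_after = np.array([-8.2, -7.8]), np.array([-7.7, -7.2])

memorisation = ((treated_after - treated_before).mean()
                - (control_after - control_before).mean())
print(f"estimated memorisation effect: {memorisation:.2f} nats")
```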

Tiago Pimentel (@tpimentelms)

Hey #NLProc and #psycholing Twitter :) We found a bug in how we're all computing contextual word probabilities and wrote a paper about it! It's a very easy fix, so please check it out! +Clara Isabel Meister

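For context, here is the naive computation the tweet alludes to (my sketch; the paper's precise correction is in the paper itself): a word's log-probability is usually taken as the chain-rule sum of its subword tokens' log-probabilities, and the reported bug concerns how word boundaries (e.g., leading-whitespace tokens) are handled in that sum.

```python
# Naive word probability from subword tokens (the computation the paper
# reportedly corrects; sketch and numbers are mine and hypothetical):
# chain rule gives log p(word) = sum_i log p(token_i | context, tokens_<i).
# With vocabularies that mark word boundaries via leading whitespace,
# this naive sum can misplace the boundary between adjacent words.

def naive_word_logprob(subword_logprobs: list[float]) -> float:
    return sum(subword_logprobs)

# "hello" tokenized as [" he", "llo"], hypothetical log-probs:
print(naive_word_logprob([-3.2, -1.1]))  # -4.3
```
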
Pietro Lesci (@pietro_lesci)

Super excited and grateful that our paper received the best paper award at #ACL2024 🎉 Huge thanks to my fantastic co-authors — Clara Isabel Meister, Thomas Hofmann, Andreas Vlachos, and Tiago Pimentel — the reviewers who recommended our paper, and the award committee #ACL2024NLP

Tiago Pimentel (@tpimentelms)

A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an LM trained from scratch in each situation! Our #acl2025nlp paper proposes an observational method to estimate this causal effect! Longer thread soon!

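To see why tokenization alone can move probability so much, a toy calculation (mine, with hypothetical numbers chosen only to match the 17x order of magnitude): the chain rule multiplies one probability factor per token, so a two-token rendering of a string pays two factors.

```python
# Toy arithmetic (hypothetical numbers): the same string gets different
# probability under different tokenizations, since the chain rule
# multiplies one factor per token.
p_one_token = 0.02            # p(<hello>) under the one-token LM
p_two_tokens = 0.015 * 0.08   # p(<he>) * p(<llo> | <he>) under the other LM

print(p_one_token / p_two_tokens)  # ~16.7, i.e. roughly the 17x from the tweet
```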