UKP Lab

@UKPLab

The Ubiquitous Knowledge Processing Lab researches Natural Language Processing, Text Mining, eLearning, and Digital Humanities · @CS_TUDarmstadt · @TUDarmstadt

Joined 17-06-2014 09:30:24

1.0K Tweets

2.3K Followers

398 Following

UKP Lab (@UKPLab):

🌐 Unleash the Power of Multilingual Pretrained Language Models (mPLMs) in #NLProc with UROMAN 🚀

Discover more about our latest #EMNLP2023 Findings paper in this thread! (1/🧵)

📃arxiv.org/abs/2304.08865

UKP Lab (@UKPLab):

Large mPLMs are the gold standard for cross-lingual transfer. But deploying them to many languages faces real challenges, like pretraining data scarcity, vocabulary size, and parameter limitations. (2/🧵)

UKP Lab (@UKPLab):

🤔 Breaking Down the Challenges 🌐
Vocabulary size and parameter budget are real roadblocks to deploying mPLMs across diverse languages.

The key? 🗝️ Transliteration on a massive scale, using UROMAN (Hermjakob et al., 2018), a universal transliteration library! (3/🧵) #EMNLP2023

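For context, a minimal sketch of what large-scale romanization with UROMAN can look like in practice. This assumes a local checkout of github.com/isi-nlp/uroman; uroman.pl reads UTF-8 text on stdin and writes romanized text to stdout, with an optional language code:

```python
# Minimal sketch: romanize any-script text with the uroman CLI
# (Hermjakob et al., 2018). UROMAN_PL is an assumed local path to
# uroman.pl from github.com/isi-nlp/uroman.
import subprocess
from typing import Optional

UROMAN_PL = "uroman/bin/uroman.pl"  # assumed checkout location

def romanize(text: str, lang: Optional[str] = None) -> str:
    """Map text in any script to the Latin alphabet via uroman."""
    cmd = ["perl", UROMAN_PL]
    if lang:                           # optional ISO-639-3 code, e.g. "hin"
        cmd += ["-l", lang]
    result = subprocess.run(cmd, input=text.encode("utf-8"),
                            capture_output=True, check=True)
    return result.stdout.decode("utf-8").strip()

print(romanize("नमस्ते दुनिया", lang="hin"))  # e.g. "namaste duniya"
```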
UKP Lab (@UKPLab):

📊 UROMAN vs. the Rest: A Showdown of Transliteration Titans 💪

We're putting UROMAN to the test against language-specific and manually curated transliterators. Which one emerges victorious for adapting multilingual PLMs? Stay tuned for the showdown results! (4/🧵)
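
As an illustrative, hedged setup for such a comparison, one could contrast uroman's universal output with a manually curated, language-specific scheme, e.g. IAST for Devanagari via the indic_transliteration package; romanize() below is the uroman helper from the sketch above:

```python
# Hedged comparison sketch: universal romanization (uroman) vs. a
# language-specific, manually curated transliterator (IAST for Devanagari
# via the indic_transliteration package). romanize() is the uroman helper
# defined in the previous snippet.
from indic_transliteration import sanscript

samples = ["नमस्ते दुनिया", "भाषा मॉडल"]  # illustrative Hindi inputs

for s in samples:
    universal = romanize(s, lang="hin")  # works for any script
    curated = sanscript.transliterate(
        s, sanscript.DEVANAGARI, sanscript.IAST)  # Devanagari-specific
    print(f"{s}\n  uroman: {universal}\n  IAST:   {curated}")
```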

UKP Lab (@UKPLab):

🔄 Data and Parameter Efficiency: The Magic Behind Adaptation 🧙‍♂️
How do we adapt mPLMs to romanized and non-romanized corpora of 14 low-resource languages? We explore a range of data- and parameter-efficient strategies! (5/🧵)
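
One parameter-efficient recipe, sketched here as a general illustration rather than the paper's exact setup, is to continue masked-language-model training on the romanized corpus while updating only small LoRA adapters, e.g. with Hugging Face's peft library:

```python
# Sketch: parameter-efficient adaptation of a multilingual PLM. Only small
# LoRA adapter matrices are trained; the base model stays frozen. The
# hyperparameters (r, alpha, target modules) are illustrative, not the paper's.
from transformers import AutoModelForMaskedLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1,
                  target_modules=["query", "value"])  # self-attention projections
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights

# From here: a standard MLM training loop (e.g. transformers.Trainer with
# DataCollatorForLanguageModeling) over the romanized target-language corpus.
```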

UKP Lab (@UKPLab):

💡 Results Are In: UROMAN Shines in the Toughest Scenarios! 🌟

UROMAN-based transliteration offers strong performance, especially in challenging setups: languages with unseen scripts and limited training data. Discover how this tool outshines the competition! (6/🧵) #EMNLP2023

UKP Lab (@UKPLab):

🪄 The Tokenizer Twist: Outperforming Non-Transliteration Methods 🔄
But wait, there's more! An improved tokenizer based on romanized data can even outperform non-transliteration methods in most languages. (7/🧵)
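
A hedged sketch of what such a tokenizer swap can look like: retrain the subword vocabulary on romanized text with Hugging Face's train_new_from_iterator (the corpus path and vocabulary size below are illustrative placeholders):

```python
# Sketch: retrain the subword vocabulary on a uroman-romanized corpus so the
# target language gets dedicated Latin-script subword tokens. The file name
# and vocab size are illustrative; the new embeddings must be re-initialized
# before continued pretraining.
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("xlm-roberta-base")

def romanized_corpus(path="romanized_corpus.txt", batch_size=1000):
    """Yield batches of lines from the romanized training corpus."""
    with open(path, encoding="utf-8") as f:
        batch = []
        for line in f:
            batch.append(line.strip())
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch

# Same tokenization algorithm as the base model, new vocabulary.
new_tok = base.train_new_from_iterator(romanized_corpus(), vocab_size=30_000)
new_tok.save_pretrained("xlmr-romanized-tokenizer")
```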

UKP Lab (@UKPLab):

Excited about the potential of UROMAN in reshaping #NLProc? Share your thoughts, questions, and insights 🗣️. Let's spark a conversation about the future of multilingual language models and cross-lingual transfer in NLP! 🤝 (8/🧵)
