UKPLab: 🌐 Unleash the Power of Multilingual Pretrained Language Models (mPLMs)

4 months ago

🌐 Unleash the Power of Multilingual Pretrained Language Models (mPLMs) in #NLProc with UROMAN 🚀

Discover more about our latest #EMNLP2023 Findings paper in this thread! (1/🧵)

📃arxiv.org/abs/2304.08865

UKP Lab

4 months ago

Large mPLMs are the gold standard for cross-lingual transfer. But deployment to many languages faces challenges – like pretraining data scarcity, vocabulary size, and parameter limitations. (2/🧵) #NLProcessing #Multilingual #EMNLP2023

🤔 Breaking Down the Challenges 🌐
Vocabulary size and parameter budget are real roadblocks to deploying mPLMs across diverse languages.

The key? 🗝️ Transliteration on a massive scale with UROMAN (Hermjakob et al., 2018), a large transliteration library! (3/🧵) #EMNLP2023
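To make the vocabulary argument concrete, here is a minimal sketch of why romanization helps. It is not UROMAN and does not use its API: `TOY_TABLE`, `toy_romanize`, and `alphabet_size` are hypothetical names invented for this illustration, and the mapping covers only the handful of characters in the sample words.

```python
import unicodedata

# Hypothetical toy mapping, for illustration only.
# UROMAN's real tables cover far more scripts and characters.
TOY_TABLE = {
    "п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t",  # Cyrillic
    "γ": "g", "ε": "e", "ι": "i", "α": "a",                      # Greek
}


def toy_romanize(text: str) -> str:
    """Lowercase, strip combining marks, and map known characters to Latin."""
    out = []
    for ch in unicodedata.normalize("NFKD", text.lower()):
        if unicodedata.combining(ch):
            continue  # drop accents such as the Greek tonos
        out.append(TOY_TABLE.get(ch, ch))  # unknown characters pass through
    return "".join(out)


def alphabet_size(words) -> int:
    """Number of distinct characters needed to spell all the words."""
    return len(set("".join(words)))


words = ["привет", "γειά", "privet"]
romanized = [toy_romanize(w) for w in words]

print(romanized)
print(alphabet_size(words), "distinct characters before romanization")
print(alphabet_size(romanized), "distinct characters after romanization")
```

Once every script is mapped onto one shared Latin alphabet, words from different languages draw on the same small character set, so a tokenizer needs fewer script-specific entries.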

📊 #UROMAN vs. the Rest: A Showdown of Transliteration Titans 💪

We're putting UROMAN to the test against language-specific and manually curated transliterators. Which one emerges victorious for adapting multilingual PLMs? Stay tuned for the showdown results! (4/🧵) #EMNLP2023

🔄 Data and Parameter Efficiency: The Magic Behind Adaptation 🧙‍♂️
How do we adapt mPLMs to romanized and non-romanized corpora of 14 low-resource languages? We explore a plethora of strategies, both data-efficient and parameter-efficient! (5/🧵) #EMNLP2023

💡 Results Are In: UROMAN Shines in the Toughest Scenarios! 🌟

UROMAN-based transliteration offers strong performance, especially in challenging setups: languages with unseen scripts and limited training data. Discover how this tool outshines the competition! (6/🧵) #EMNLP2023

🪄 The Tokenizer Twist: Outperforming Non-Transliteration Methods 🔄
But wait, there's more! An improved tokenizer based on romanized data can even outperform non-transliteration methods in most languages. (7/🧵) #EMNLP2023

Excited about the potential of UROMAN in reshaping #NLProc? Share your thoughts, questions, and insights 🗣️. Let's spark a conversation about the future of multilingual language models and cross-lingual transfer in NLP 🤝! (8/🧵) #EMNLP2023