Holger Schwenk (@schwenkholger)'s Twitter Profile
Holger Schwenk

@schwenkholger

Full professor and senior research scientist at Meta AI Research

ID: 1078719052263104512

Joined: 28-12-2018 18:27:18

26 Tweets

666 Followers

54 Following

Graham Neubig (@gneubig)

Excellent categorized machine translation reading list by Tsinghua University NLP group: github.com/THUNLP-MT/MT-R… Excellent coverage of modern papers -- it should be a good first stop if you want to learn about the state-of-the-art in a particular sub-topic of MT.

Holger Schwenk (@schwenkholger)

WikiMatrix: large-scale bitext extraction from Wikipedia:
1620 language pairs in 85 languages, 135M parallel sentences,
Systematic evaluation on TED
With @VishravC, Shuo Sun, Hongyu and Paco Guzmán
Paper: arxiv.org/abs/1907.05791
Data: github.com/facebookresear…

Holger Schwenk (@schwenkholger)

I'm looking for a good and freely available SENTENCE SPLITTER FOR THAI. Any experience or recommendations? This would enable us to include Thai in the WikiMatrix project (arxiv.org/abs/1907.05791) and provide bitexts between Thai and potentially 95 other languages!

Holger Schwenk (@schwenkholger)

We are happy to share MLQA: Evaluating Cross-lingual Extractive Question Answering, a new multi-way aligned extractive QA evaluation benchmark in 7 languages. We also provide several baselines for zero-shot transfer from English.

AI at Meta (@aiatmeta)

We’re sharing a new benchmark called MLQA to help extend performance improvements in extractive question-answering (QA) to more languages. It contains thousands of QA instances in Arabic, German, Hindi, Spanish, Vietnamese, and Simplified Chinese. ai.facebook.com/blog/mlqa-eval…

Holger Schwenk (@schwenkholger)

Very happy to be part of a great team that used the CCMatrix data to train an NMT system for 100x100 languages. We’re releasing the model, training, and evaluation setup to help other researchers reproduce and further advance multilingual models.

Holger Schwenk (@schwenkholger)

LASER3 multilingual sentence encoders for 200 languages are available here: github.com/facebookresear… together with stopes, a new mining library, and the training code. This is part of the NLLB project to translate among 200 languages: github.com/facebookresear…

Paul-Ambroise Duquenne (@duquenne_pa)

We release SpeechMatrix, 418k hours of parallel speech in 136 languages mined from European Parliament recordings. We provide bilingual speech-to-speech baselines as well as multilingual training with mixture-of-experts: shorturl.at/amu28
With Holger Schwenk, Hongyu Gong, Benoît Sagot
