Holger Schwenk (@schwenkholger) Twitter Tweets

Holger Schwenk

@schwenkholger


Full professor and senior research scientist at Meta AI Research

ID: 1078719052263104512

Joined: 28-12-2018 18:27:18

26 Tweets

666 Followers

54 Following

Holger Schwenk

@schwenkholger

7 years ago

I think that it is time to have my own Twitter account ... #myfirstTweet

Likes: 1 · Replies: 0 · Retweets: 0

Graham Neubig

Excellent categorized machine translation reading list by Tsinghua University NLP group: github.com/THUNLP-MT/MT-R… Excellent coverage of modern papers -- it should be a good first stop if you want to learn about the state-of-the-art in a particular sub-topic of MT.

Likes: 359 · Replies: 2 · Retweets: 120

Holger Schwenk

@schwenkholger

7 years ago

The code and models to calculate multilingual sentence embeddings for 93 languages are now available, joint work with Mikel Artetxe.

Likes: 62 · Replies: 1 · Retweets: 11

Holger Schwenk

@schwenkholger

6 years ago

WikiMatrix: large-scale bitext extraction from Wikipedia: 1620 language pairs in 85 languages, 135M parallel sentences, systematic evaluation on TED. With @VishravC, Shuo Sun, Hongyu and Paco Guzmán. Paper: arxiv.org/abs/1907.05791 Data: github.com/facebookresear…

Likes: 217 · Replies: 1 · Retweets: 79

Holger Schwenk

@schwenkholger

6 years ago

I'm looking for a good and freely available SENTENCE SPLITTER FOR THAI. Any experience or recommendations? This would enable us to include Thai in the WikiMatrix project (arxiv.org/abs/1907.05791) and provide bitexts between Thai and potentially 95 other languages!

Likes: 10 · Replies: 3 · Retweets: 2

Holger Schwenk

@schwenkholger

6 years ago

We are happy to share MLQA: Evaluating Cross-lingual Extractive Question Answering, a new multi-way aligned extractive QA evaluation benchmark in 7 languages. We also provide several baselines for zero-shot transfer from English.

Likes: 14 · Replies: 1 · Retweets: 2

AI at Meta

@aiatmeta

6 years ago

We’re sharing a new benchmark called MLQA to help extend performance improvements in extractive question-answering (QA) to more languages. It contains thousands of QA instances in Arabic, German, Hindi, Spanish, Vietnamese, and Simplified Chinese. ai.facebook.com/blog/mlqa-eval…

Likes: 200 · Replies: 2 · Retweets: 60

Holger Schwenk

@schwenkholger

5 years ago

Very happy to be part of a great team that used the CCMatrix data to train an NMT system for 100x100 languages. We’re releasing the model, training, and evaluation setup to help other researchers reproduce and further advance multilingual models.

Likes: 56 · Replies: 2 · Retweets: 10

Holger Schwenk

@schwenkholger

3 years ago

Almost 1400h of freely available speech-to-speech translations in 4 languages!

Likes: 5 · Replies: 0 · Retweets: 0

Holger Schwenk

@schwenkholger

3 years ago

LASER3 multilingual sentence encoders for 200 languages are available here: github.com/facebookresear… together with a new mining library, stopes, and training code. This is part of the NLLB project to translate among 200 languages: github.com/facebookresear…

Likes: 39 · Replies: 0 · Retweets: 7

Paul-Ambroise Duquenne

@duquenne_pa

3 years ago

We release SpeechMatrix, 418k hours of parallel speech in 136 language pairs mined from European Parliament recordings. We provide bilingual speech-to-speech baselines as well as multilingual training with mixture-of-experts: shorturl.at/amu28 With Holger Schwenk, Hongyu Gong, and Benoît Sagot.

Likes: 18 · Replies: 0 · Retweets: 3
