2021
MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering
Shayne Longpre | Yi Lu | Joachim Daiber
Transactions of the Association for Computational Linguistics, Volume 9
Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). Answers are based on a heavily curated, language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to date for evaluating question answering. We benchmark a variety of state-of-the-art methods and baselines for generative and extractive question answering, trained on Natural Questions, in zero-shot and translation settings. Results indicate this dataset is challenging even in English, but especially in low-resource languages.
2016
Universal Reordering via Linguistic Typology
Joachim Daiber | Miloš Stanojević | Khalil Sima’an
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
In this paper we explore the novel idea of building a single universal reordering model from English to a large number of target languages. To build this model we exploit typological features of word order for a large number of target languages together with source (English) syntactic features and we train this model on a single combined parallel corpus representing all (22) involved language pairs. We contribute experimental evidence for the usefulness of linguistically defined typological features for building such a model. When the universal reordering model is used for preordering followed by monotone translation (no reordering inside the decoder), our experiments show that this pipeline gives translation performance comparable to or better than a phrase-based baseline for a large number of language pairs (12 out of 22) from diverse language families.
The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions
Joachim Daiber | Rob van der Goot
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We introduce the Denoised Web Treebank: a treebank including a normalization layer and a corresponding evaluation metric for dependency parsing of noisy text, such as Tweets. This benchmark enables the evaluation of parser robustness as well as text normalization methods, including normalization as machine translation and unsupervised lexical normalization, directly on syntactic trees. Experiments show that text normalization together with a combination of domain-specific and generic part-of-speech taggers can lead to a significant improvement in parsing accuracy on this test set.
Examining the Relationship between Preordering and Word Order Freedom in Machine Translation
Joachim Daiber | Miloš Stanojević | Wilker Aziz | Khalil Sima’an
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers
2015
Machine translation with source-predicted target morphology
Joachim Daiber | Khalil Sima’an
Proceedings of Machine Translation Summit XV: Papers
Splitting Compounds by Semantic Analogy
Joachim Daiber | Lautaro Quiroz | Roger Wechsler | Stella Frank
Proceedings of the 1st Deep Machine Translation Workshop
Delimiting Morphosyntactic Search Space with Source-Side Reordering Models
Joachim Daiber | Khalil Sima’an
Proceedings of the 1st Deep Machine Translation Workshop
2012
Evaluating the Impact of Phrase Recognition on Concept Tagging
Pablo Mendes | Joachim Daiber | Rohana Rajapakse | Felix Sasaki | Christian Bizer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
We have developed DBpedia Spotlight, a flexible concept tagging system that is able to annotate entities, topics and other terms in natural language text. The system starts by recognizing phrases to annotate in the input text, and subsequently disambiguates them to a reference knowledge base extracted from Wikipedia. In this paper we evaluate the impact of the phrase recognition step on the ability of the system to correctly reproduce the annotations of a gold standard in an unsupervised setting. We argue that a combination of techniques is needed, and we evaluate a number of alternatives according to an existing evaluation set.