Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts

Yong Xu, François Yvon


Abstract
Resources for evaluating sentence-level and word-level alignment algorithms are unsatisfactory. Regarding sentence alignments, the existing data is too scarce, especially when it comes to difficult bitexts, containing instances of non-literal translations. Regarding word-level alignments, most available hand-aligned data provide a complete annotation at the level of words that is difficult to exploit, for lack of a clear semantics for alignment links. In this study, we propose new methodologies for collecting human judgements on alignment links, which have been used to annotate 4 new data sets, at the sentence and at the word level. These will be released online, with the hope that they will prove useful to evaluate alignment software and quality estimation tools for automatic alignment.
Keywords: Parallel corpora, Sentence Alignments, Word Alignments, Confidence Estimation
Anthology ID:
L16-1099
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
628–635
URL:
https://aclanthology.org/L16-1099
Cite (ACL):
Yong Xu and François Yvon. 2016. Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 628–635, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts (Xu & Yvon, LREC 2016)
PDF:
https://aclanthology.org/L16-1099.pdf
Data:
Europarl