Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora

Guiyao Ke, Pierre-Francois Marteau, Gildas Menier


Abstract
Following the pioneering work by (CITATION), this paper analyzes a family of quantitative comparability measures dedicated to the construction and evaluation of topical comparable corpora. After recalling the definition of the quantitative comparability measure proposed by (CITATION), we develop variants of this measure based primarily on the consideration that the occurrence frequencies of lexical entries and the number of their translations are important. We compare the respective advantages and disadvantages of these variants within an evaluation framework based on the progressive degradation of the Europarl parallel corpus. The degradation is obtained by replacing, either deterministically or randomly, a varying number of lines in the blocks that compose partitions of the initial Europarl corpus. The impact of bilingual dictionary coverage on these measures is also discussed, and perspectives are finally presented.
Anthology ID:
L14-1139
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
133–139
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/120_Paper.pdf
Cite (ACL):
Guiyao Ke, Pierre-Francois Marteau, and Gildas Menier. 2014. Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 133–139, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Variations on quantitative comparability measures and their evaluations on synthetic French-English comparable corpora (Ke et al., LREC 2014)