Corpus Based Analysis for Multilingual Terminology Entry Compounding

Andrejs Vasiljevs, Kaspars Balodis


Abstract
This paper proposes statistical analysis methods for improvement of terminology entry compounding. Terminology entry compounding is a mechanism that identifies matching entries across multiple multilingual terminology collections. Bilingual or trilingual term entries are unified in compounded multilingual entry. We suggest that corpus analysis can improve entry compounding results by analysing contextual terms of given term in the corpus data. Proposed algorithm is described. It is implemented in an experimental setup. Results of experiment on compounding of Latvian and Lithuanian terminology resources are provided. These results encourage further research for different language pairs and in different domains.
Anthology ID:
L10-1591
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/855_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Andrejs Vasiljevs and Kaspars Balodis. 2010. Corpus Based Analysis for Multilingual Terminology Entry Compounding. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Corpus Based Analysis for Multilingual Terminology Entry Compounding (Vasiljevs & Balodis, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/855_Paper.pdf