Fusion of linguistic, neural and sentence-transformer features for improved term alignment

Andraz Repar, Senja Pollak, Matej Ulčar, Boshko Koloski


Abstract
The crosslingual terminology alignment task has many practical applications. In this work, we propose an alignment method for the shared task of the 15th Workshop on Building and Using Comparable Corpora. Our method combines several different approaches into one cohesive machine learning model based on SVM. From shared-task-specific and external sources, we crafted four types of features: cognate-based, dictionary-based, embedding-based, and combined features, which combine aspects of the other three types. We added a post-processing re-scoring method, which reduces the effect of hubness, where some terms are nearest neighbours of many other terms. We achieved an average precision score of 0.833 on the English-French training set of the shared task.
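The abstract does not say which re-scoring scheme is used, so the following is only an illustrative sketch of how hubness can be reduced in general. It uses cross-domain similarity local scaling (CSLS), a well-known hubness correction, as a stand-in and should not be read as the authors' method. The function names `csls_rescore` and `best_targets`, the neighbourhood size `k`, and the toy similarity matrix are all assumptions made for the example.

```python
def csls_rescore(sim, k=2):
    """Re-score a similarity matrix with CSLS (an assumed stand-in technique).

    sim[i][j] is the similarity between source term i and target term j.
    Each score is penalised by the mean similarity of both terms to their
    k nearest neighbours, so targets that are close to everything lose rank.
    """
    n_src, n_tgt = len(sim), len(sim[0])

    def mean_top(values):
        top = sorted(values, reverse=True)[:k]
        return sum(top) / len(top)

    # Mean similarity of each source term to its k nearest target terms.
    r_src = [mean_top(row) for row in sim]
    # Mean similarity of each target term to its k nearest source terms.
    r_tgt = [mean_top([sim[i][j] for i in range(n_src)]) for j in range(n_tgt)]

    return [
        [2 * sim[i][j] - r_src[i] - r_tgt[j] for j in range(n_tgt)]
        for i in range(n_src)
    ]


def best_targets(scores):
    """Index of the highest-scoring target term for every source term."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy data (hypothetical): target 0 is a hub, nearest to every source term.
toy_sim = [
    [0.90, 0.30, 0.20],
    [0.80, 0.75, 0.10],
    [0.80, 0.20, 0.75],
]

raw_best = best_targets(toy_sim)                     # [0, 0, 0]
rescored_best = best_targets(csls_rescore(toy_sim))  # [0, 1, 2]
print(raw_best, rescored_best)
```

In the toy matrix, the raw scores align all three source terms with the same hub target, while the re-scored matrix gives each source term a distinct target.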
Anthology ID:
2022.bucc-1.9
Volume:
Proceedings of the BUCC Workshop within LREC 2022
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff
Venue:
BUCC
Publisher:
European Language Resources Association
Pages:
61–66
URL:
https://aclanthology.org/2022.bucc-1.9
Cite (ACL):
Andraz Repar, Senja Pollak, Matej Ulčar, and Boshko Koloski. 2022. Fusion of linguistic, neural and sentence-transformer features for improved term alignment. In Proceedings of the BUCC Workshop within LREC 2022, pages 61–66, Marseille, France. European Language Resources Association.
Cite (Informal):
Fusion of linguistic, neural and sentence-transformer features for improved term alignment (Repar et al., BUCC 2022)
PDF:
https://aclanthology.org/2022.bucc-1.9.pdf