Towards Functionally Similar Corpus Resources for Translation

Maria Kunilovskaya, Serge Sharoff


Abstract
The paper describes a computational approach to producing functionally comparable monolingual corpus resources for translation studies and contrastive analysis. We use a text-external approach based on a set of Functional Text Dimensions to model text functions, so that each text can be represented as a vector in a multidimensional space of text functions. These vectors can be used to find reasonably homogeneous subsets of functionally similar texts across different corpora. Our models for predicting text functions are based on recurrent neural networks and traditional feature-based machine learning approaches. In addition to using the categories of the British National Corpus as our test case, we investigated the functional comparability of the English parts of two parallel corpora, CroCo (English-German) and RusLTC (English-Russian), and applied our models to define functionally similar clusters in them. Our results show that the Functional Text Dimensions provide a useful description of text categories, while allowing a more flexible representation of texts with hybrid functions.
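The idea of comparing texts as vectors of function scores can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional toy vectors, the text identifiers, the hypothetical helper `functionally_similar`, the 0.9 threshold, and the choice of cosine similarity, which is not necessarily the measure used in the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def functionally_similar(query, candidates, threshold=0.9):
    """Return (text_id, similarity) pairs at or above the threshold, best first.

    `query` is one text's vector of function scores; `candidates` maps
    text ids from another corpus to vectors of the same length.
    """
    scored = [(tid, cosine_similarity(query, vec)) for tid, vec in candidates.items()]
    hits = [(tid, sim) for tid, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data: each position stands for one (unnamed, hypothetical) text function.
query = [0.9, 0.1, 0.0]
candidates = {
    "text_a": [0.8, 0.2, 0.0],  # close to the query profile
    "text_b": [0.0, 0.1, 0.9],  # dominated by a different function
    "text_c": [0.5, 0.5, 0.0],  # hybrid of two functions
}
hits = functionally_similar(query, candidates)
```

With these toy values only `text_a` passes the threshold; lowering the threshold would also admit the hybrid `text_c`, which shows how a continuous representation lets the cut-off for "functionally similar" be tuned rather than fixed by category labels.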
Anthology ID:
R19-1069
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
583–592
URL:
https://aclanthology.org/R19-1069
DOI:
10.26615/978-954-452-056-4_069
Cite (ACL):
Maria Kunilovskaya and Serge Sharoff. 2019. Towards Functionally Similar Corpus Resources for Translation. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 583–592, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Towards Functionally Similar Corpus Resources for Translation (Kunilovskaya & Sharoff, RANLP 2019)
PDF:
https://aclanthology.org/R19-1069.pdf