Weighted Set-Theoretic Alignment of Comparable Sentences

Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia


Abstract
This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. The original STACC approach, based on set-theoretic operations over bags of words, had been previously shown to be efficient and portable across domains and alignment scenarios. We describe an extension of this approach with a new weighting scheme and show that it provides significant improvements on the datasets provided for the shared task.
Anthology ID:
W17-2508
Volume:
Proceedings of the 10th Workshop on Building and Using Comparable Corpora
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Serge Sharoff, Pierre Zweigenbaum, Reinhard Rapp
Venue:
BUCC
Publisher:
Association for Computational Linguistics
Pages:
41–45
URL:
https://aclanthology.org/W17-2508
DOI:
10.18653/v1/W17-2508
Cite (ACL):
Andoni Azpeitia, Thierry Etchegoyhen, and Eva Martínez Garcia. 2017. Weighted Set-Theoretic Alignment of Comparable Sentences. In Proceedings of the 10th Workshop on Building and Using Comparable Corpora, pages 41–45, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Weighted Set-Theoretic Alignment of Comparable Sentences (Azpeitia et al., BUCC 2017)
PDF:
https://aclanthology.org/W17-2508.pdf