gENder-IT: An Annotated English-Italian Parallel Challenge Set for Cross-Linguistic Natural Gender Phenomena

Eva Vanmassenhove, Johanna Monti


Abstract
Languages differ in whether they have gender features at all, in the number of gender classes, and in whether and where gender features are explicitly marked. These cross-linguistic differences can lead to ambiguities that are difficult to resolve, especially for sentence-level MT systems. Identifying and then resolving such ambiguity is a challenging task for which no dedicated resources or challenge sets are currently available. In this paper, we introduce gENder-IT, an English–Italian challenge set focusing on the resolution of natural gender phenomena. It provides word-level gender tags on the English source side and, where needed, multiple gender-alternative translations on the Italian target side.
Anthology ID:
2021.gebnlp-1.1
Volume:
Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing
Month:
August
Year:
2021
Address:
Online
Editors:
Marta Costa-jussa, Hila Gonen, Christian Hardmeier, Kellie Webster
Venue:
GeBNLP
Publisher:
Association for Computational Linguistics
Pages:
1–7
URL:
https://aclanthology.org/2021.gebnlp-1.1
DOI:
10.18653/v1/2021.gebnlp-1.1
Cite (ACL):
Eva Vanmassenhove and Johanna Monti. 2021. gENder-IT: An Annotated English-Italian Parallel Challenge Set for Cross-Linguistic Natural Gender Phenomena. In Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing, pages 1–7, Online. Association for Computational Linguistics.
Cite (Informal):
gENder-IT: An Annotated English-Italian Parallel Challenge Set for Cross-Linguistic Natural Gender Phenomena (Vanmassenhove & Monti, GeBNLP 2021)
PDF:
https://aclanthology.org/2021.gebnlp-1.1.pdf
Data:
gENder-IT