Mandarinograd: A Chinese Collection of Winograd Schemas

Timothée Bernard, Ting Han


Abstract
This article introduces Mandarinograd, a corpus of Winograd Schemas in Mandarin Chinese. Winograd Schemas are particularly challenging anaphora resolution problems, designed to involve common sense reasoning and to limit the biases and artefacts commonly found in natural language understanding datasets. Mandarinograd contains the schemas in their traditional form, but also as natural language inference instances (ENTAILMENT or NO ENTAILMENT pairs) as well as in their fully disambiguated candidate forms. These two alternative representations are often used by modern solvers, but existing datasets present automatically converted items that sometimes contain syntactic or semantic anomalies. We detail the difficulties faced when building this corpus and explain how we avoided the anomalies just mentioned. We also show that Mandarinograd is resistant to a statistical method based on a measure of word association.
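The three representations named in the abstract can be illustrated with a minimal sketch. It is not the corpus format or the authors' tooling: the `Schema` class, its field names, and the two helper functions are hypothetical, and the example sentence is the classic English Winograd schema rather than an item from Mandarinograd. The conversion shown is naive pronoun substitution, that is, the kind of automatic conversion the abstract says can produce anomalies.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Schema:
    """Hypothetical container for one schema in its traditional form."""
    text: str                    # sentence containing the ambiguous pronoun
    pronoun: str                 # the pronoun to resolve
    candidates: Tuple[str, str]  # the two possible antecedents
    answer: int                  # index of the correct candidate


def candidate_forms(schema: Schema) -> List[str]:
    """Fully disambiguated forms: substitute each candidate for the pronoun.

    Naive substitution of the first whole-word match; real items may need
    manual rewriting to stay grammatical and natural.
    """
    pattern = rf"\b{re.escape(schema.pronoun)}\b"
    return [
        re.sub(pattern, lambda _m, c=c: c, schema.text, count=1)
        for c in schema.candidates
    ]


def nli_pairs(schema: Schema) -> List[Tuple[str, str, str]]:
    """NLI instances: (premise, hypothesis, label) for each candidate."""
    return [
        (schema.text, hypothesis,
         "ENTAILMENT" if i == schema.answer else "NO ENTAILMENT")
        for i, hypothesis in enumerate(candidate_forms(schema))
    ]


# Illustrative English item (not taken from Mandarinograd).
EXAMPLE = Schema(
    text=("The city councilmen refused the demonstrators a permit "
          "because they feared violence."),
    pronoun="they",
    candidates=("the city councilmen", "the demonstrators"),
    answer=0,
)

for premise, hypothesis, label in nli_pairs(EXAMPLE):
    print(label, "|", hypothesis)
```

The word-boundary pattern assumes space-delimited text, so it would not carry over to Chinese as written; this is one reason string substitution alone is a weak basis for building the alternative forms.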
Anthology ID:
2020.lrec-1.3
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
21–26
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.3
Cite (ACL):
Timothée Bernard and Ting Han. 2020. Mandarinograd: A Chinese Collection of Winograd Schemas. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 21–26, Marseille, France. European Language Resources Association.
Cite (Informal):
Mandarinograd: A Chinese Collection of Winograd Schemas (Bernard & Han, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.3.pdf