Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers

Takuto Asakura, Yusuke Miyao, Akiko Aizawa


Abstract
Grounding the meaning of each symbol in math formulae is important for automated understanding of scientific documents. Generally speaking, the meanings of math symbols are not necessarily constant, and the same symbol is used with multiple meanings. Therefore, coreference relations between symbols need to be identified for grounding, and the task has aspects of both description alignment and coreference analysis. In this study, we annotated 15 papers selected from arXiv.org with the grounding information. In total, 12,352 occurrences of math identifiers in these papers were annotated, and all coreference relations between them were made explicit in each paper. The constructed dataset shows that despite the ambiguity of symbols in math formulae, coreference relations can be labeled with high inter-annotator agreement. The constructed dataset enables us to achieve automation of formula grounding, and in turn, make deeper use of the knowledge in scientific documents using techniques such as math information extraction. The built grounding dataset is available at https://sigmathling.kwarc.info/resources/grounding-dataset/.
Anthology ID:
2022.lrec-1.519
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4851–4858
URL:
https://aclanthology.org/2022.lrec-1.519
Cite (ACL):
Takuto Asakura, Yusuke Miyao, and Akiko Aizawa. 2022. Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4851–4858, Marseille, France. European Language Resources Association.
Cite (Informal):
Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers (Asakura et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.519.pdf
Code
 wtsnjp/MioGatto