Modeling Mathematical Notation Semantics in Academic Papers

Hwiyeol Jo, Dongyeop Kang, Andrew Head, Marti A. Hearst


Abstract
Natural language models often fall short when understanding and generating mathematical notation. It is unclear whether these shortcomings stem from fundamental limitations of the models or from the absence of appropriate tasks. In this paper, we explore the extent to which natural language models can learn the semantic relationship between mathematical notation and its surrounding text. We propose two notation prediction tasks, and train a model that selectively masks notation tokens and encodes left and/or right sentences as context. Compared to baseline models trained by masked language modeling, our method achieved significantly better performance on the two tasks, showing that this approach is a good first step towards modeling mathematical texts. However, the current models rarely predict unseen symbols correctly, and token-level predictions are more accurate than symbol-level predictions, indicating that more work is needed to represent structural patterns. Based on the results, we suggest directions for future work on modeling mathematical texts.
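The selective masking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `$...$` delimiter convention, the `[MASK]` and `[SEP]` strings, and the function names `mask_notation` and `build_input` are all assumptions made for illustration. It masks only the tokens inside notation spans of the target sentence and optionally attaches the left and/or right sentence as context.

```python
import re

MASK = "[MASK]"  # assumed mask string, in the style of BERT-like models
SEP = " [SEP] "  # assumed separator between context sentences

# Hypothetical convention: notation is delimited by $...$ and pre-tokenized
# with spaces. The paper's actual preprocessing may differ.
MATH_SPAN = re.compile(r"\$([^$]+)\$")


def mask_notation(sentence):
    """Mask every token inside $...$ spans; return (masked text, target tokens)."""
    targets = []

    def _mask(match):
        tokens = match.group(1).split()
        targets.extend(tokens)  # the tokens a model would be asked to predict
        return " ".join(MASK for _ in tokens)

    return MATH_SPAN.sub(_mask, sentence), targets


def build_input(left, target, right, use_left=True, use_right=True):
    """Join the masked target sentence with optional left/right context."""
    masked, targets = mask_notation(target)
    parts = []
    if use_left:
        parts.append(left)
    parts.append(masked)
    if use_right:
        parts.append(right)
    return SEP.join(parts), targets


if __name__ == "__main__":
    text, labels = build_input(
        "We define a loss .",
        "Let $x _ i$ be the input .",
        "It is a vector .",
    )
    print(text)
    print(labels)
```

In this sketch, ordinary words are never masked, which is the contrast with standard masked language modeling, where any token may be selected. Notation appearing in the context sentences is left untouched here; whether it should also be masked is a design choice this sketch does not settle.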
Anthology ID:
2021.findings-emnlp.266
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
3102–3115
URL:
https://aclanthology.org/2021.findings-emnlp.266
DOI:
10.18653/v1/2021.findings-emnlp.266
Cite (ACL):
Hwiyeol Jo, Dongyeop Kang, Andrew Head, and Marti A. Hearst. 2021. Modeling Mathematical Notation Semantics in Academic Papers. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3102–3115, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Modeling Mathematical Notation Semantics in Academic Papers (Jo et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.266.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.266.mp4
Data:
S2ORC