Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context

Daniel Spokoyny, Ivan Lee, Zhao Jin, Taylor Berg-Kirkpatrick


Abstract
Physical measurements constitute a large portion of numbers in academic papers, engineering reports, and web tables. Current benchmarks fall short of properly evaluating the numeracy of pretrained language models on measurements, hindering research on developing new methods and applying them to numerical tasks. To that end, we introduce a novel task, Masked Measurement Prediction (MMP), where a model learns to reconstruct a number together with its associated unit given masked text. MMP is useful both for training new numerically informed models and for evaluating the numeracy of existing systems. To address this task, we introduce a new Generative Masked Measurement (GeMM) model that jointly learns to predict numbers along with their units. We perform fine-grained analyses comparing our model with various ablations and baselines. We use linear probing of traditional pretrained transformer models (RoBERTa) to show that they significantly underperform jointly trained number-unit models, highlighting the difficulty of this new task and the benefits of our proposed pretraining approach. We hope this framework accelerates progress toward building more robust numerical reasoning systems in the future.
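To make the task concrete, below is a minimal sketch of how an MMP instance might be constructed: the number-unit pair is masked out of the context and kept as the joint prediction target. The regex, helper names, and mask token here are illustrative assumptions, not the paper's actual preprocessing; the paper's data comes with annotated measurements rather than regex extraction.

```python
# Minimal sketch of an MMP training instance: the number-unit pair is
# removed from the context and kept as the joint prediction target.
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class MMPInstance:
    masked_text: str  # context with the measurement replaced by a mask token
    number: float     # quantity to reconstruct
    unit: str         # unit to reconstruct jointly with the number

# Toy pattern covering a handful of units; assumed for illustration only.
MEASUREMENT = re.compile(r"(?P<num>\d+(?:\.\d+)?)\s*(?P<unit>km|kg|mi|lb|m|s)\b")

def make_instance(text: str, mask_token: str = "[MASK]") -> Optional[MMPInstance]:
    """Mask the first number-unit pair in `text` and return the targets."""
    m = MEASUREMENT.search(text)
    if m is None:
        return None
    masked = text[: m.start()] + mask_token + text[m.end():]
    return MMPInstance(masked, float(m.group("num")), m.group("unit"))

print(make_instance("The marathon course is 42.2 km long."))
# MMPInstance(masked_text='The marathon course is [MASK] long.',
#             number=42.2, unit='km')
```

A model trained on such instances must predict the quantity (42.2) and the unit (km) jointly from the surrounding context, which is the setup the abstract describes.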
Anthology ID:
2022.findings-naacl.2
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
17–29
URL:
https://aclanthology.org/2022.findings-naacl.2
DOI:
10.18653/v1/2022.findings-naacl.2
Bibkey:
Cite (ACL):
Daniel Spokoyny, Ivan Lee, Zhao Jin, and Taylor Berg-Kirkpatrick. 2022. Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 17–29, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context (Spokoyny et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-naacl.2.pdf
Video:
https://aclanthology.org/2022.findings-naacl.2.mp4
Data
Wiki-Convert