Extracting Victim Counts from Text

Mian Zhong, Shehzaad Dhuliawala, Niklas Stoehr


Abstract
Decision-makers in the humanitarian sector rely on timely and exact information during crisis events. Knowing how many civilians were injured during an earthquake is vital to allocating aid properly. Information about such victim counts is, however, often only available within full-text event descriptions from newspapers and other reports. Extracting numbers from text is challenging: numbers have different formats and may require numeric reasoning. This renders purely tagging-based approaches insufficient. As a consequence, fine-grained counts of injured, displaced, or abused victims beyond fatalities are often not extracted and remain unseen. We cast victim count extraction as a question answering (QA) task with a regression or classification objective. We compare tagging-based approaches (regex, dependency parsing, semantic role labeling) with advanced text-to-text models. Beyond model accuracy, we analyze extraction reliability and robustness, which are key for this sensitive task. In particular, we discuss model calibration and investigate out-of-distribution and few-shot performance. Ultimately, we make a comprehensive recommendation on which model to select for different desiderata and data domains. Our work is among the first to apply numeracy-focused large language models in a real-world use case with a positive impact.
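To show concretely why a purely tagging-style approach falls short, here is a minimal sketch of a regex baseline. It is not the authors' implementation. The keyword list (`injured`, `killed`, `displaced`), the small word-to-number map `WORD_NUMBERS`, and the function name `extract_counts` are hypothetical choices made for illustration only.

```python
import re

# Hypothetical map covering only a few spelled-out numbers (illustration only).
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
                "dozen": 12, "hundred": 100}

# A number (digits with optional thousands separators, or a listed number word)
# followed within at most three intervening words by a hypothetical victim keyword.
PATTERN = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+|" + "|".join(WORD_NUMBERS) + r")\b"
    r"(?:\s+\w+){0,3}?\s+(injured|killed|displaced)\b",
    re.IGNORECASE,
)


def extract_counts(text: str) -> list[tuple[str, int]]:
    """Return (victim category, count) pairs found by the regex, in text order."""
    results: list[tuple[str, int]] = []
    for number, category in PATTERN.findall(text):
        token = number.lower()
        # Number words are looked up; digit strings have separators stripped.
        value = WORD_NUMBERS[token] if token in WORD_NUMBERS else int(token.replace(",", ""))
        results.append((category.lower(), value))
    return results


print(extract_counts("At least 1,200 people were injured and five were killed."))
# [('injured', 1200), ('killed', 5)]
```

The pattern handles simple cases but breaks on compositional phrasing: `extract_counts("two hundred people were injured")` returns `[('injured', 2)]`, because the regex tags a single number token and cannot combine "two" and "hundred". This kind of failure motivates treating the task as question answering with a numeric objective, as the abstract describes.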
Anthology ID:
2023.eacl-main.141
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1925–1940
URL:
https://aclanthology.org/2023.eacl-main.141
DOI:
10.18653/v1/2023.eacl-main.141
Cite (ACL):
Mian Zhong, Shehzaad Dhuliawala, and Niklas Stoehr. 2023. Extracting Victim Counts from Text. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 1925–1940, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Extracting Victim Counts from Text (Zhong et al., EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-main.141.pdf
Video:
https://aclanthology.org/2023.eacl-main.141.mp4