The Typology of Ellipsis: A Corpus for Linguistic Analysis and Machine Learning Applications

Damir Cavar, Ludovic Mompelat, Muhammad Abdo


Abstract
State-of-the-art (SotA) Natural Language Processing (NLP) technology faces significant challenges with constructions that contain ellipses. Although ellipsis is theoretically well-documented and understood, there are not enough cross-linguistic language resources to document, study, and ultimately engineer NLP solutions that can adequately analyze ellipsis constructions. This article describes the typological data set on ellipsis that we have created, currently covering seventeen languages. We demonstrate that SotA parsers based on a variety of syntactic frameworks fail to parse sentences with ellipsis, and that probabilistic, neural, and Large Language Models (LLMs) fail as well. We present experiments that focus on detecting sentences with ellipsis, predicting the position of elided elements, and predicting elided surface forms in the appropriate positions. We show that cross-linguistic variation in ellipsis-related phenomena has different consequences for the architecture of NLP systems.
Anthology ID:
2024.sigtyp-1.6
Volume:
Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
March
Year:
2024
Address:
St. Julian's, Malta
Editors:
Michael Hahn, Alexey Sorokin, Ritesh Kumar, Andreas Shcherbakov, Yulia Otmakhova, Jinrui Yang, Oleg Serikov, Priya Rani, Edoardo M. Ponti, Saliha Muradoğlu, Rena Gao, Ryan Cotterell, Ekaterina Vylomova
Venues:
SIGTYP | WS
Publisher:
Association for Computational Linguistics
Pages:
46–54
URL:
https://aclanthology.org/2024.sigtyp-1.6
Cite (ACL):
Damir Cavar, Ludovic Mompelat, and Muhammad Abdo. 2024. The Typology of Ellipsis: A Corpus for Linguistic Analysis and Machine Learning Applications. In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 46–54, St. Julian's, Malta. Association for Computational Linguistics.
Cite (Informal):
The Typology of Ellipsis: A Corpus for Linguistic Analysis and Machine Learning Applications (Cavar et al., SIGTYP-WS 2024)
PDF:
https://aclanthology.org/2024.sigtyp-1.6.pdf