Ellipsis in Chinese AMR Corpus

Yihuan Liu, Bin Li, Peiyi Yan, Li Song, Weiguang Qu


Abstract
Ellipsis is very common in language, and natural language processing needs to restore the elided elements of a sentence. However, only a few corpora annotate ellipsis, which holds back its automatic detection and recovery. This paper introduces the annotation of ellipsis in Chinese sentences using Abstract Meaning Representation (AMR), a novel graph-based representation that has a good mechanism for restoring elided elements manually. We annotate 5,000 sentences selected from the Chinese TreeBank (CTB). We find that 54.98% of the sentences have ellipses; 92% of the ellipses are restored by copying the antecedents’ concepts, and 12.9% of them are newly added concepts. In addition, we find that the elided element is a word or phrase in most cases, but sometimes only the head of a phrase or parts of a phrase, which makes the automatic recovery of ellipsis rather hard.
Anthology ID:
W19-3310
Volume:
Proceedings of the First International Workshop on Designing Meaning Representations
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Nianwen Xue, William Croft, Jan Hajic, Chu-Ren Huang, Stephan Oepen, Martha Palmer, James Pustejovsky
Venue:
DMR
Publisher:
Association for Computational Linguistics
Pages:
92–99
URL:
https://aclanthology.org/W19-3310
DOI:
10.18653/v1/W19-3310
Cite (ACL):
Yihuan Liu, Bin Li, Peiyi Yan, Li Song, and Weiguang Qu. 2019. Ellipsis in Chinese AMR Corpus. In Proceedings of the First International Workshop on Designing Meaning Representations, pages 92–99, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Ellipsis in Chinese AMR Corpus (Liu et al., DMR 2019)
PDF:
https://aclanthology.org/W19-3310.pdf