Construction of an English Dependency Corpus incorporating Compound Function Words

Akihiko Kato, Hiroyuki Shindo, Yuji Matsumoto


Abstract
Recognizing multiword expressions (MWEs) in a sentence is important for linguistic analyses such as syntactic and semantic parsing, because combining an MWE into a single token is known to improve accuracy on various NLP tasks, including dependency parsing and constituency parsing. However, MWEs are not annotated in the Penn Treebank. A direct way to convert word-based dependencies into MWE-aware dependencies is to combine the nodes of an MWE into a single node. This method, however, often leads to the following problem: a node derived from an MWE can have multiple heads, and the whole dependency structure including the MWE can become cyclic. We therefore converted the phrase structure into a dependency structure after establishing each MWE as a single subtree. This approach avoids multiple heads and cycles. In this way, we constructed an English dependency corpus that takes into account compound function words, which are one type of MWE serving as functional expressions. In addition, we report experimental results of dependency parsing using the constructed corpus.
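The multiple-head and cycle problem mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name merge_mwe, the head-map representation (token id to head id, with 0 as an artificial root), and the toy trees are all assumptions made for illustration only.

```python
def merge_mwe(heads, span):
    """Naively collapse the tokens in `span` into one node (hypothetical helper).

    heads: dict mapping token id -> head id (0 is an artificial root).
    span:  set of token ids that form the MWE.
    Returns (external_heads, has_cycle) for the merged node.
    """
    span = set(span)
    merged = min(span)  # represent the merged node by its first token id

    # Every head of an MWE token that lies outside the MWE becomes
    # a head of the merged node; more than one means multiple heads.
    external_heads = {heads[t] for t in span if heads[t] not in span}

    # Build the collapsed graph: node -> set of heads.
    graph = {merged: external_heads}
    for tok, head in heads.items():
        if tok not in span:
            graph[tok] = {merged if head in span else head}

    # The input is a tree, so any cycle must pass through the merged node.
    def reaches(node, target, seen):
        for h in graph.get(node, ()):
            if h == target:
                return True
            if h not in seen:
                seen.add(h)
                if reaches(h, target, seen):
                    return True
        return False

    return external_heads, reaches(merged, merged, set())


# Toy tree (assumed, not from the corpus): token 3 is the root,
# tokens 1 and 4 depend on 3, and token 2 depends on 1.
toy = {1: 3, 2: 1, 3: 0, 4: 3}
problem = merge_mwe(toy, {2, 3})   # span is not a subtree
clean = merge_mwe(toy, {3, 4})     # span is already a subtree
```

In this toy case, merging tokens 2 and 3 gives the merged node two heads and creates a cycle through token 1, while merging tokens 3 and 4, which already form a subtree, yields a single head and no cycle. That contrast is the motivation the abstract gives for establishing each MWE as a single subtree before conversion.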
Anthology ID:
L16-1263
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1667–1671
URL:
https://aclanthology.org/L16-1263
Cite (ACL):
Akihiko Kato, Hiroyuki Shindo, and Yuji Matsumoto. 2016. Construction of an English Dependency Corpus incorporating Compound Function Words. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1667–1671, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Construction of an English Dependency Corpus incorporating Compound Function Words (Kato et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1263.pdf