Using hyperlinks to improve multilingual partial parsers

Anders Søgaard


Abstract
Syntactic annotation is costly and not available for the vast majority of the world’s languages. We show that we can sometimes make do with less labeled data by exploiting more readily available forms of mark-up. Specifically, we revisit an idea from Valentin Spitkovsky’s work (2010), namely that hyperlinks typically bracket syntactic constituents or chunks. We strengthen his results by showing that hyperlinks not only help in low-resource scenarios, exemplified here by Quechua, but that learning from hyperlinks can also improve state-of-the-art NLP models for English newswire. We also present an out-of-domain evaluation on English OntoNotes 4.0.
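The abstract's central observation is that the anchor text of a hyperlink tends to coincide with a syntactic constituent or chunk. That observation can be turned into partial supervision for a chunker. The sketch below is a minimal illustration under stated assumptions and is not the paper's pipeline. It assumes links arrive as `<a ...>...</a>` elements in plain text, that anchors are not nested, and that whitespace tokenization is adequate. The function name `link_spans_to_partial_bio` is hypothetical. Tokens inside a link receive B/I chunk labels. Tokens outside any link get `None` ("no label"), because the absence of a link says nothing about where chunk boundaries fall.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical input format: anchors written as <a ...>anchor text</a>.
# The \b stops the pattern from matching other tags such as <abbr>.
ANCHOR = re.compile(r"<a\b[^>]*>(.*?)</a>", re.DOTALL)


def link_spans_to_partial_bio(html: str) -> Tuple[List[str], List[Optional[str]]]:
    """Turn hyperlink mark-up into partial chunk labels (illustrative sketch).

    Each anchor's text is treated as one chunk: first token "B", rest "I".
    Tokens outside any anchor are labeled None, i.e. left unsupervised.
    Assumes non-nested anchors and whitespace tokenization.
    """
    tokens: List[str] = []
    labels: List[Optional[str]] = []
    pos = 0
    for match in ANCHOR.finditer(html):
        # Text before this link carries no chunk information.
        for tok in html[pos:match.start()].split():
            tokens.append(tok)
            labels.append(None)
        # Anchor text becomes a single B-I-I... chunk.
        for i, tok in enumerate(match.group(1).split()):
            tokens.append(tok)
            labels.append("B" if i == 0 else "I")
        pos = match.end()
    # Any text after the last link is also unlabeled.
    for tok in html[pos:].split():
        tokens.append(tok)
        labels.append(None)
    return tokens, labels


if __name__ == "__main__":
    toks, labs = link_spans_to_partial_bio(
        'He studied in <a href="/wiki/Pisa">the city of Pisa</a> last year .'
    )
    for t, l in zip(toks, labs):
        print(t, l)
```

For the example sentence, `the city of Pisa` is labeled `B I I I` and every other token is `None`. A chunker could use such data in two ways, both of which are assumptions here rather than the paper's stated setup. It could compute its loss only at labeled positions, or it could treat link spans as constraints on its predicted bracketing.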
Anthology ID:
W17-6310
Volume:
Proceedings of the 15th International Conference on Parsing Technologies
Month:
September
Year:
2017
Address:
Pisa, Italy
Venues:
IWPT | WS
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
67–71
URL:
https://aclanthology.org/W17-6310
Cite (ACL):
Anders Søgaard. 2017. Using hyperlinks to improve multilingual partial parsers. In Proceedings of the 15th International Conference on Parsing Technologies, pages 67–71, Pisa, Italy. Association for Computational Linguistics.
Cite (Informal):
Using hyperlinks to improve multilingual partial parsers (Søgaard, 2017)
PDF:
https://aclanthology.org/W17-6310.pdf
Code
soegaard/hyperlink-iwpt17
Data
Penn Treebank