Revisiting Supertagging for faster HPSG parsing

Olga Zamaraeva, Carlos Gómez-Rodríguez


Abstract
We present new supertaggers trained on English HPSG-based treebanks and test the effects of the best tagger on parsing speed and accuracy. HPSG treebanks are produced automatically by large manually built grammars and feature high-quality annotation based on a well-developed linguistic theory. The English Resource Grammar treebanks include diverse and challenging test datasets, beyond the usual WSJ section 23 and Wikipedia data. HPSG supertagging has previously relied on MaxEnt-based models. We use SVM and neural CRF- and BERT-based methods and show that both SVM and neural supertaggers achieve considerably higher accuracy than the baseline and lead to an increase not only in parsing speed but also in parser accuracy with respect to gold dependency structures. Our fine-tuned BERT-based tagger achieves 97.26% accuracy on 950 sentences from WSJ23 and 93.88% on the out-of-domain technical essay The Cathedral and the Bazaar. We present experiments with integrating the best supertagger into an HPSG parser and observe a speedup by a factor of 3 with respect to the system that uses no tagging at all, as well as large recall gains and an overall precision gain. We also compare our system to an existing integrated tagger and show that although the well-integrated tagger remains the fastest, our experimental system can be more accurate. Finally, we hope that the diverse and difficult datasets we used for evaluation will gain more popularity in the field: we show that results can differ depending on the dataset, even if it is an in-domain one. We contribute the complete datasets reformatted for Huggingface token classification.
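The abstract notes that the datasets are contributed in a format suitable for Huggingface token classification, where each token is paired with one supertag (an HPSG lexical type). As a hedged illustration only — the field names and the supertag labels below are assumptions for the sketch, not the paper's actual schema — a single-sentence record in that style might look like:

```python
import json

# Hypothetical token-classification record: one sentence, one supertag per token.
# HPSG supertags are lexical types from a grammar such as the English Resource
# Grammar; the tag strings below are illustrative, not taken from the dataset.
record = {
    "tokens": ["Kim", "sleeps", "soundly", "."],
    "tags": ["n_pn_le", "v_iv_le", "av_vp_le", "pt_period_le"],
}

# Token classification requires exactly one label per token.
assert len(record["tokens"]) == len(record["tags"])

print(json.dumps(record))
```

A dataset of such records can be loaded directly by standard token-classification training pipelines, with the tag inventory mapped to integer label ids.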
Anthology ID: 2024.emnlp-main.635
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 11359–11374
URL: https://aclanthology.org/2024.emnlp-main.635
Cite (ACL): Olga Zamaraeva and Carlos Gómez-Rodríguez. 2024. Revisiting Supertagging for faster HPSG parsing. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 11359–11374, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Revisiting Supertagging for faster HPSG parsing (Zamaraeva & Gómez-Rodríguez, EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.635.pdf
Software: 2024.emnlp-main.635.software.zip