Evaluating the Utility of Hand-crafted Features in Sequence Labelling

Minghao Wu, Fei Liu, Trevor Cohn


Abstract
Conventional wisdom holds that hand-crafted features are redundant for deep learning models, as these models already learn adequate representations of text automatically from corpora. In this work, we test this claim by proposing a new method for exploiting hand-crafted features as part of a novel hybrid learning approach, incorporating a feature auto-encoder loss component. We evaluate on the task of named entity recognition (NER), where we show that including manual features for part-of-speech, word shapes and gazetteers can improve the performance of a neural CRF model. We obtain an F1 of 91.89 for the CoNLL-2003 English shared task, which significantly outperforms a collection of highly competitive baseline models. We also present an ablation study showing the importance of auto-encoding over using features as either inputs or outputs alone, and moreover show that including the auto-encoder components reduces training requirements to 60% while retaining the same predictive accuracy.
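As a rough illustration of the idea in the abstract, the sketch below computes one kind of hand-crafted feature (a word shape) and combines a sequence-labelling loss with per-feature auto-encoder reconstruction losses. The function names `word_shape` and `hybrid_loss`, the shape alphabet, and the weighted-sum form of the objective are assumptions made for this example; they are not taken from the paper or from the minghao-wu/CRF-AE code.

```python
from typing import Sequence


def word_shape(token: str) -> str:
    """Map a token to a coarse shape string (hypothetical scheme).

    Uppercase letters become 'X', lowercase letters 'x', digits 'd';
    every other character is kept unchanged.
    """
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def hybrid_loss(
    tagging_loss: float,
    reconstruction_losses: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Combine a tagging loss with weighted feature-reconstruction losses.

    Assumed form: tagging_loss + sum_i weights[i] * reconstruction_losses[i],
    with one reconstruction term per hand-crafted feature type
    (e.g. part-of-speech, word shape, gazetteer).
    """
    if len(reconstruction_losses) != len(weights):
        raise ValueError("need exactly one weight per reconstruction loss")
    total = tagging_loss
    for weight, loss in zip(weights, reconstruction_losses):
        total += weight * loss
    return total


if __name__ == "__main__":
    for token in ["Brussels", "EMNLP", "2018"]:
        print(token, word_shape(token))
    print(hybrid_loss(1.0, [0.5, 0.25], [1.0, 2.0]))
```

In a setup like this, the same feature would be fed to the model as an input and also used as the reconstruction target, which is what distinguishes auto-encoding from using the feature as an input or an output alone.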
Anthology ID:
D18-1310
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2850–2856
URL:
https://aclanthology.org/D18-1310
DOI:
10.18653/v1/D18-1310
Cite (ACL):
Minghao Wu, Fei Liu, and Trevor Cohn. 2018. Evaluating the Utility of Hand-crafted Features in Sequence Labelling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2850–2856, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Evaluating the Utility of Hand-crafted Features in Sequence Labelling (Wu et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1310.pdf
Video:
https://aclanthology.org/D18-1310.mp4
Code:
minghao-wu/CRF-AE
Data:
CoNLL, CoNLL 2003