A k-Nearest Neighbor Approach towards Multi-level Sequence Labeling

Yue Chen, John Chen


Abstract
In this paper we present a new method for intent recognition for complex dialog management in low-resource situations. Complex dialog management is required because our target domain is real-world, mixed-initiative food ordering between agents and their customers, where individual customer utterances may contain multiple intents and refer to food items with complex structure. For example, a customer might say “Can I get a deluxe burger with large fries and oh put extra mayo on the burger would you?” We approach this task as a multi-level sequence labeling problem, with the constraint of limited real training data. Both traditional methods like HMM, MEMM, or CRF and newer methods like DNN or BiLSTM use only homogeneous feature sets. Newer methods perform better but also require considerably more data. Previous research has used pseudo-data synthesis to obtain the required amounts of training data. We propose to use a k-NN learner with a heterogeneous feature set consisting of windowed word n-grams, POS tag n-grams, and pre-trained word embeddings. In our experiments we compare training on pseudo-data with training on real-world data. We also perform semi-supervised self-training to obtain additional labeled data, in order to better model real-world scenarios. Instead of relying on massive pseudo-data, we show that by annotating real-world data, using less than 1% as much data, we can achieve better results than any of the methods above. We achieve labeled bracketed F-scores of 75.46, 52.84 and 49.66 for the three levels of sequence labeling, where each level has a longer word span than the previous level. Overall we achieve an F-score of 60.71. In comparison, two previous systems, MEMM and DNN-ELMO, achieved 52.32 and 45.25 respectively.
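The idea of a k-NN learner over a heterogeneous feature set can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance function, the window size, or the label scheme, so the equal-weight mix of Jaccard overlap and cosine similarity, the one-token window, the toy two-dimensional embeddings in `EMB`, and the labels `O`, `B-MOD` and `B-FOOD` are all assumptions made for illustration. The sketch also labels a single level only, whereas the paper addresses three.

```python
import math
from collections import Counter

# Hypothetical toy embeddings; a real system would load pre-trained vectors.
EMB = {
    "a": [0.5, 0.5], "with": [0.5, 0.5],
    "deluxe": [0.1, 0.9], "large": [0.0, 1.0],
    "burger": [1.0, 0.0], "fries": [0.9, 0.1],
}

def window_features(words, tags, i, size=1):
    """Symbolic features for token i: the word and POS tag at each window offset."""
    feats = set()
    for off in range(-size, size + 1):
        j = i + off
        inside = 0 <= j < len(words)
        feats.add(f"w[{off}]={words[j] if inside else '<pad>'}")
        feats.add(f"t[{off}]={tags[j] if inside else '<pad>'}")
    return feats

def encode(words, tags, i):
    """Heterogeneous representation: (symbolic feature set, dense embedding)."""
    return window_features(words, tags, i), EMB.get(words[i], [0.0, 0.0])

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity(x, y):
    """Assumed metric: equal-weight mix of Jaccard overlap and embedding cosine."""
    (fx, ex), (fy, ey) = x, y
    jaccard = len(fx & fy) / len(fx | fy)
    return 0.5 * jaccard + 0.5 * cosine(ex, ey)

def knn_label(query, train, k=3):
    """Majority vote over the k training tokens most similar to the query."""
    ranked = sorted(train, key=lambda ex: similarity(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def predict(words, tags, train, k=3):
    return [knn_label(encode(words, tags, i), train, k) for i in range(len(words))]

# Tiny hand-labeled training set (hypothetical labels).
SENTENCES = [
    (["a", "deluxe", "burger"], ["DT", "JJ", "NN"], ["O", "B-MOD", "B-FOOD"]),
    (["with", "large", "fries"], ["IN", "JJ", "NNS"], ["O", "B-MOD", "B-FOOD"]),
]
TRAIN = [
    (encode(words, tags, i), labels[i])
    for words, tags, labels in SENTENCES
    for i in range(len(words))
]

print(predict(["a", "large", "burger"], ["DT", "JJ", "NN"], TRAIN))
```

Because each stored example keeps its symbolic features and its embedding side by side, adding another feature type only requires extending `encode` and `similarity`; no model needs to be retrained.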
Anthology ID:
N19-2019
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Anastassia Loukina, Michelle Morales, Rohit Kumar
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
149–156
URL:
https://aclanthology.org/N19-2019
DOI:
10.18653/v1/N19-2019
Cite (ACL):
Yue Chen and John Chen. 2019. A k-Nearest Neighbor Approach towards Multi-level Sequence Labeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers), pages 149–156, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
A k-Nearest Neighbor Approach towards Multi-level Sequence Labeling (Chen & Chen, NAACL 2019)
PDF:
https://aclanthology.org/N19-2019.pdf
Poster:
N19-2019.Poster.pdf