Subword Language Model for Query Auto-Completion

Gyuwan Kim


Abstract
Current neural query auto-completion (QAC) systems rely on character-level language models, but they slow down when queries are long. We present how to utilize subword language models for the fast and accurate generation of query completion candidates. Representing queries with subwords shorten a decoding length significantly. To deal with issues coming from introducing subword language model, we develop a retrace algorithm and a reranking method by approximate marginalization. As a result, our model achieves up to 2.5 times faster while maintaining a similar quality of generated results compared to the character-level baseline. Also, we propose a new evaluation metric, mean recoverable length (MRL), measuring how many upcoming characters the model could complete correctly. It provides more explicit meaning and eliminates the need for prefix length sampling for existing rank-based metrics. Moreover, we performed a comprehensive analysis with ablation study to figure out the importance of each component.
Anthology ID:
D19-1507
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:
November
Year:
2019
Address:
Hong Kong, China
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Note:
Pages:
5022–5032
Language:
URL:
https://aclanthology.org/D19-1507
DOI:
10.18653/v1/D19-1507
Bibkey:
Cite (ACL):
Gyuwan Kim. 2019. Subword Language Model for Query Auto-Completion. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5022–5032, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Subword Language Model for Query Auto-Completion (Kim, EMNLP-IJCNLP 2019)
Copy Citation:
PDF:
https://aclanthology.org/D19-1507.pdf
Code
 clovaai/subword-qac