Fast Query Expansion on an Accounting Corpus using Sub-Word Embeddings

Hrishikesh Ganu; Viswa Datha P.

doi:10.18653/v1/W18-1208


Abstract

We present early results from a system under development that uses sub-word embeddings for query expansion in the presence of misspelled words and other aberrations. We work for a company that creates accounting software, and the end goal is to improve customer experience when customers search for help on our "Customer Care" portal. Our customers use colloquial language and non-standard acronyms, and sometimes misspell words when they use our Search portal or interact over other channels. Our Knowledge Base, however, contains curated content written in formal language with technical terms. As a result, an answer may not be retrieved even though it is actually present in the documentation (as assessed by a human). We address this problem by using sub-word embeddings to create equivalence classes of words with similar meanings, with the additional property that the mapping of a word to its equivalence class is robust to misspellings. We then use these classes to fine-tune an Elasticsearch index to improve recall. We demonstrate through an end-to-end system that using sub-word embeddings leads to a significant lift in correct answers retrieved for an accounting corpus available in the public domain.
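The idea of mapping misspellings into the same equivalence class can be illustrated with a small sketch. It is not the authors' implementation: it assumes fastText-style sub-word vectors, where a word vector is the average of its character n-gram vectors (with `<` and `>` boundary markers). Because no trained model is assumed, each n-gram vector here is a deterministic pseudo-random stand-in, so the sketch only captures spelling similarity, not meaning; a trained sub-word model would be needed for semantic equivalence. The greedy grouping, the `threshold` value, the 256-dimension size, and all helper names are hypothetical choices for illustration. The final step renders classes as Solr-style comma-separated synonym rules, a format Elasticsearch's `synonym` token filter accepts; how the paper actually configured its index is not stated in the abstract.

```python
import math
import random
import zlib

DIM = 256  # hypothetical embedding size


def char_ngrams(word, n_min=3, n_max=5):
    """fastText-style character n-grams of '<word>' (boundary markers included)."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]


def ngram_vector(gram):
    """Stand-in for a trained n-gram embedding: a deterministic pseudo-random vector."""
    rng = random.Random(zlib.crc32(gram.encode("utf-8")))
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def word_vector(word):
    """Sub-word word vector: the average of the word's n-gram vectors."""
    vecs = [ngram_vector(g) for g in char_ngrams(word)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def equivalence_classes(words, threshold=0.5):
    """Greedy single pass: a word joins the first class whose seed word is similar enough."""
    vectors = {w: word_vector(w) for w in words}
    classes = []
    for w in words:
        for cls in classes:
            if cosine(vectors[w], vectors[cls[0]]) >= threshold:
                cls.append(w)
                break
        else:
            classes.append([w])  # no similar class found: start a new one
    return classes


def to_synonym_lines(classes):
    """Solr-style equivalent-synonym rules ("a, b, c"), one per multi-word class."""
    return [", ".join(cls) for cls in classes if len(cls) > 1]


if __name__ == "__main__":
    words = ["invoice", "invoce", "invoices", "payroll", "payrol", "reconcile"]
    for line in to_synonym_lines(equivalence_classes(words)):
        print(line)
```

Words that share many character n-grams (for example `invoice` and `invoices`) end up with nearby vectors, which is what makes the class assignment tolerant of typos; the threshold trades recall of variants against the risk of merging unrelated words.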

Anthology ID:: W18-1208
Volume:: Proceedings of the Second Workshop on Subword/Character LEvel Models
Month:: June
Year:: 2018
Address:: New Orleans
Editors:: Manaal Faruqui, Hinrich Schütze, Isabel Trancoso, Yulia Tsvetkov, Yadollah Yaghoobzadeh
Venue:: SCLeM
Publisher:: Association for Computational Linguistics
Pages:: 61–65
URL:: https://aclanthology.org/W18-1208/
DOI:: 10.18653/v1/W18-1208
Cite (ACL):: Hrishikesh Ganu and Viswa Datha P. 2018. Fast Query Expansion on an Accounting Corpus using Sub-Word Embeddings. In Proceedings of the Second Workshop on Subword/Character LEvel Models, pages 61–65, New Orleans. Association for Computational Linguistics.
Cite (Informal):: Fast Query Expansion on an Accounting Corpus using Sub-Word Embeddings (Ganu & P., SCLeM 2018)
PDF:: https://aclanthology.org/W18-1208.pdf
