ABB-BERT: A BERT model for disambiguating abbreviations and contractions

Prateek Kacker, Andi Cupallari, Aswin Subramanian, Nimit Jain


Abstract
Abbreviations and contractions are commonly found in text across different domains. For example, doctors’ notes contain many contractions that can be personalized based on a doctor’s preferences. Existing spelling correction models are not suited to handling expansions, because so many characters are dropped from the original words. In this work, we propose ABB-BERT, a BERT-based model that handles ambiguous language containing abbreviations and contractions. ABB-BERT can rank candidate expansions from among thousands of options and is designed for scale. It is trained on Wikipedia text, and the algorithm allows it to be fine-tuned with little compute to improve performance for a specific domain or person. We are publicly releasing the training dataset of abbreviations and contractions derived from Wikipedia.
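
As a rough illustration only (this is not the paper's released implementation), the Python sketch below shows one way to rank candidate expansions of an abbreviation in context with a pretrained BERT masked language model, using pseudo-log-likelihood scoring from the Hugging Face transformers library. The model name, template sentence, and candidate list are illustrative assumptions, not details from the paper.

import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log-probabilities of each token when it is masked in turn."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip [CLS] (first position) and [SEP] (last position).
    for i in range(1, ids.size(0) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[ids[i]].item()
    return total

def rank_expansions(template: str, candidates: list[str]) -> list[str]:
    """Rank candidate expansions for the '{abbr}' slot, best first."""
    scored = [(pseudo_log_likelihood(template.format(abbr=c)), c) for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]

# Hypothetical example: disambiguating "pt" in a clinical-style note.
print(rank_expansions("The {abbr} was discharged after surgery.",
                      ["patient", "physical therapy", "part time"]))

This sketch scores every candidate in full context rather than learning a dedicated ranking model, so it is only a baseline-style approximation of the disambiguation task the paper addresses.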
Anthology ID:
2021.icon-main.35
Volume:
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2021
Address:
National Institute of Technology Silchar, Silchar, India
Editors:
Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
289–297
URL:
https://aclanthology.org/2021.icon-main.35
Cite (ACL):
Prateek Kacker, Andi Cupallari, Aswin Subramanian, and Nimit Jain. 2021. ABB-BERT: A BERT model for disambiguating abbreviations and contractions. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 289–297, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
Cite (Informal):
ABB-BERT: A BERT model for disambiguating abbreviations and contractions (Kacker et al., ICON 2021)
PDF:
https://aclanthology.org/2021.icon-main.35.pdf
Code
 prateek-kacker/ABB-BERT
Data
GLUE