An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts

Xue-Yong Fu, Cheng Chen, Md Tahmid Rahman Laskar, Shashi Bhushan Tn, Simon Corston-Oliver


Abstract
We present a simple yet effective method to train a named entity recognition (NER) model that operates on business telephone conversation transcripts, which contain noise due to the nature of spoken conversation and artifacts of automatic speech recognition. We first fine-tune LUKE, a state-of-the-art NER model, on a limited amount of transcripts, then use it as the teacher model to train a smaller DistilBERT-based student model on a large amount of weakly labeled data and a small amount of human-annotated data. The model achieves high accuracy while also satisfying the practical constraints for inclusion in a commercial telephony product: real-time performance when deployed on cost-effective CPUs rather than GPUs. In this paper, we introduce the fine-tune-then-distill method for entity recognition on real-world noisy data, which lets us deploy our NER model in a limited-budget production environment. By using a large teacher model, pre-trained on typed text and fine-tuned on noisy speech transcripts, to generate pseudo-labels for training a smaller student model, we make the student model 75x faster than the teacher while retaining 99.09% of the teacher's accuracy. These findings demonstrate that our proposed approach is very effective in limited-budget scenarios and reduces the need for human labeling of large amounts of noisy data.
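The data side of the fine-tune-then-distill recipe can be sketched as follows: the fine-tuned teacher pseudo-labels raw transcripts (the weakly labeled data), and that set is combined with the small human-annotated set to form the student's training data. This is a minimal sketch under stated assumptions, not the authors' implementation. The teacher is modeled as any callable returning one BIO tag and one confidence per token, `toy_teacher` is a hypothetical gazetteer stand-in for the fine-tuned LUKE model, and the `min_confidence` filter is a hypothetical option not described in the abstract (its default of 0.0 keeps every sentence).

```python
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]
Tags = List[str]
Example = Tuple[Tokens, Tags]
# Assumed teacher interface: tokens -> (one BIO tag per token, one confidence per token).
Teacher = Callable[[Sequence[str]], Tuple[Tags, List[float]]]


def pseudo_label(teacher: Teacher, transcripts: Sequence[Sequence[str]],
                 min_confidence: float = 0.0) -> List[Example]:
    """Tag raw transcripts with the teacher to produce weakly labeled examples.

    A sentence is kept only if its least confident token reaches
    min_confidence (hypothetical filter; 0.0 keeps everything).
    """
    labeled: List[Example] = []
    for tokens in transcripts:
        tags, confidences = teacher(tokens)
        if len(tags) != len(tokens):
            raise ValueError("teacher must return one tag per token")
        if min(confidences, default=1.0) >= min_confidence:
            labeled.append((list(tokens), list(tags)))
    return labeled


def build_student_data(teacher: Teacher, transcripts: Sequence[Sequence[str]],
                       gold: Sequence[Tuple[Sequence[str], Sequence[str]]],
                       min_confidence: float = 0.0) -> List[Example]:
    """Combine teacher pseudo-labels with the small human-annotated set."""
    weak = pseudo_label(teacher, transcripts, min_confidence)
    return weak + [(list(tokens), list(tags)) for tokens, tags in gold]


def toy_teacher(tokens: Sequence[str]) -> Tuple[Tags, List[float]]:
    """Hypothetical stand-in for the fine-tuned teacher: a name gazetteer."""
    people = {"john", "maria"}
    tags = ["B-PER" if t.lower() in people else "O" for t in tokens]
    confidences = [0.9 if tag != "O" else 0.99 for tag in tags]
    return tags, confidences


if __name__ == "__main__":
    unlabeled = [["hi", "this", "is", "john"], ["um", "call", "me", "back"]]
    gold = [(["maria", "speaking"], ["B-PER", "O"])]
    # With a 0.95 threshold the first transcript (min confidence 0.9) is dropped.
    for tokens, tags in build_student_data(toy_teacher, unlabeled, gold, 0.95):
        print(list(zip(tokens, tags)))
```

The student (a DistilBERT-based tagger in the paper) would then be trained on the returned examples as ordinary token-classification data; swapping `toy_teacher` for a real fine-tuned model only requires matching the assumed `Teacher` signature.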
Anthology ID: 2022.wnut-1.10
Volume: Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Venue: WNUT
Publisher: Association for Computational Linguistics
Pages: 96–100
URL: https://aclanthology.org/2022.wnut-1.10
Cite (ACL): Xue-Yong Fu, Cheng Chen, Md Tahmid Rahman Laskar, Shashi Bhushan Tn, and Simon Corston-Oliver. 2022. An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts. In Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022), pages 96–100, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal): An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts (Fu et al., WNUT 2022)
PDF: https://aclanthology.org/2022.wnut-1.10.pdf