KITLM: Domain-Specific Knowledge InTegration into Language Models for Question Answering

Ankush Agarwal, Sakharam Gawade, Amar Prakash Azad, Pushpak Bhattacharyya


Abstract
Large language models (LLMs) have demonstrated remarkable performance on a wide range of natural language tasks. However, as these models continue to grow in size, they face significant challenges in terms of computational cost. Additionally, LLMs often lack efficient domain-specific understanding, which is particularly crucial in specialized fields such as aviation and healthcare. To boost domain-specific understanding, we propose KITLM, a novel approach for integrating a knowledge base into a language model through relevant information infusion. By integrating pertinent knowledge, not only is the performance of the language model greatly enhanced, but the required model size is also significantly reduced while achieving comparable performance. Our proposed knowledge-infused model surpasses both GPT-3.5-turbo and the state-of-the-art knowledge infusion method, SKILL, achieving over a 1.5-fold improvement in exact match scores on MetaQA. KITLM shows a similar performance boost in the aviation domain on AeroQA. The drastic performance improvement of KITLM over existing methods can be attributed to the infusion of relevant knowledge while mitigating noise. In addition, we release two curated datasets to accelerate knowledge infusion research in specialized fields: a) AeroQA, a new benchmark dataset designed for multi-hop question answering within the aviation domain, and b) Aviation Corpus, a dataset constructed from unstructured text extracted from National Transportation Safety Board reports. Our research contributes to advancing the field of domain-specific language understanding and showcases the potential of knowledge infusion techniques in improving model performance.
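The core idea the abstract describes, scoring knowledge base triples against a question and infusing only the most relevant ones into the language model's input, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' released KITLM pipeline: the sentence-encoder relevance scorer, the t5-small reader, and the toy MetaQA-style movie triples below are all hypothetical stand-ins.

    # Minimal sketch of relevance-based knowledge infusion for QA.
    # Assumptions (not from the paper): a sentence-encoder relevance scorer,
    # a t5-small reader, and a toy movie knowledge base in the MetaQA style.
    from sentence_transformers import SentenceTransformer, util
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    # Hypothetical (subject, relation, object) triples.
    triples = [
        ("Inception", "directed_by", "Christopher Nolan"),
        ("Christopher Nolan", "directed", "Interstellar"),
        ("Titanic", "directed_by", "James Cameron"),
    ]
    question = "Which movies were directed by the director of Inception?"

    # Linearize each triple and score it against the question; keeping only
    # the top-k triples infuses pertinent knowledge while mitigating noise.
    texts = [f"{s} {r.replace('_', ' ')} {o}" for s, r, o in triples]
    scorer = SentenceTransformer("all-MiniLM-L6-v2")
    scores = util.cos_sim(scorer.encode(question), scorer.encode(texts))[0]
    top_k = [texts[int(i)] for i in scores.argsort(descending=True)[:2]]

    # Infuse the selected knowledge into the input of a seq2seq QA model.
    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")
    prompt = f"question: {question} context: {'. '.join(top_k)}"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=16)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

In the paper, the reader is fine-tuned on such knowledge-infused inputs; per the abstract, this is what lets a far smaller model outperform GPT-3.5-turbo and SKILL on MetaQA and AeroQA.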
Anthology ID:
2023.icon-1.20
Volume:
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2023
Address:
Goa University, Goa, India
Editors:
Jyoti D. Pawar, Sobha Lalitha Devi
Venue:
ICON
SIG:
SIGLEX
Publisher:
NLP Association of India (NLPAI)
Pages:
202–294
URL:
https://aclanthology.org/2023.icon-1.20
Cite (ACL):
Ankush Agarwal, Sakharam Gawade, Amar Prakash Azad, and Pushpak Bhattacharyya. 2023. KITLM: Domain-Specific Knowledge InTegration into Language Models for Question Answering. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 202–294, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal):
KITLM: Domain-Specific Knowledge InTegration into Language Models for Question Answering (Agarwal et al., ICON 2023)
PDF:
https://aclanthology.org/2023.icon-1.20.pdf