D2LLM: Decomposed and Distilled Large Language Models for Semantic Search

Zihan Liao, Hang Yu, Jianguo Li, Jun Wang, Wei Zhang


Abstract
The key challenge in semantic search is to create models that are both accurate and efficient in pinpointing relevant sentences for queries. While BERT-style bi-encoders excel in efficiency with pre-computed embeddings, they often miss subtle nuances in search tasks. Conversely, GPT-style LLMs with cross-encoder designs capture these nuances but are computationally intensive, hindering real-time applications. In this paper, we present D2LLM (Decomposed and Distilled LLMs for semantic search), which combines the best of both worlds. We decompose a cross-encoder into an efficient bi-encoder integrated with Pooling by Multihead Attention and an Interaction Emulation Module, achieving nuanced understanding and pre-computability. Knowledge from the LLM is distilled into this model using contrastive, rank, and feature imitation techniques. Our experiments show that D2LLM surpasses five leading baselines on all metrics across three tasks, particularly improving NLI task performance by at least 6.45%.
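The abstract names the student's components only at a high level. Below is a minimal, hypothetical PyTorch sketch of those pieces as I read them: a bi-encoder whose token embeddings are pooled by multi-head attention (PMA), an Interaction Emulation Module that scores pooled query and passage vectors, and a training objective combining contrastive, rank, and feature imitation terms. All class names, dimensions, and loss weightings are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the D2LLM student described in the abstract.
# Names, shapes, and loss weightings are assumptions, not the paper's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PMA(nn.Module):
    """Pooling by Multihead Attention: a learned seed query attends over token embeddings."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.seed = nn.Parameter(torch.randn(1, 1, dim))  # learnable pooling query
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, token_emb: torch.Tensor, key_padding_mask=None) -> torch.Tensor:
        # token_emb: (batch, seq_len, dim) -> pooled: (batch, dim)
        seed = self.seed.expand(token_emb.size(0), -1, -1)
        pooled, _ = self.attn(seed, token_emb, token_emb,
                              key_padding_mask=key_padding_mask)
        return pooled.squeeze(1)


class InteractionEmulationModule(nn.Module):
    """Small MLP that emulates query-passage cross-attention from pooled embeddings."""

    def __init__(self, dim: int, hidden: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.GELU(), nn.Linear(hidden, 1)
        )

    def forward(self, q_vec: torch.Tensor, p_vec: torch.Tensor) -> torch.Tensor:
        # Returns a relevance logit per (query, passage) pair.
        return self.mlp(torch.cat([q_vec, p_vec], dim=-1)).squeeze(-1)


def distillation_loss(student_logits, teacher_logits, student_feat, teacher_feat, labels):
    """Illustrative combination of the three objectives named in the abstract."""
    contrastive = F.cross_entropy(student_logits, labels)            # contrastive imitation
    rank = F.kl_div(F.log_softmax(student_logits, dim=-1),           # rank imitation
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    feature = F.mse_loss(student_feat, teacher_feat)                  # feature imitation
    return contrastive + rank + feature
```

In use, query vectors can be pooled and cached offline (the bi-encoder side), while the small IEM runs at query time, which is the pre-computability the abstract refers to.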
Anthology ID: 2024.acl-long.791
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 14798–14814
URL: https://aclanthology.org/2024.acl-long.791
DOI: 10.18653/v1/2024.acl-long.791
Cite (ACL): Zihan Liao, Hang Yu, Jianguo Li, Jun Wang, and Wei Zhang. 2024. D2LLM: Decomposed and Distilled Large Language Models for Semantic Search. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14798–14814, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): D2LLM: Decomposed and Distilled Large Language Models for Semantic Search (Liao et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-long.791.pdf