Answer is All You Need: Instruction-following Text Embedding via Answering the Question

Letian Peng, Yuwei Zhang, Zilong Wang, Jayanth Srinivasa, Gaowen Liu, Zihan Wang, Jingbo Shang


Abstract
This work aims to build a text embedder that can capture characteristics of texts specified by user instructions clarifying the similarity criterion. While previous methods improve general task awareness by injecting the instruction information into encoding, they fail to be sensitive to clearer criteria like “evaluate similarity based on emotion”. We instead propose a different viewpoint, which treats the instruction as a “question” about the input text and encodes the expected answers to obtain the representation accordingly. Intuitively, texts with the same (implicit) semantics would share similar answers following the instruction, thus leading to more similar representations. Specifically, we propose InBedder that instantiates this learning-to-answer idea by only fine-tuning language models via abstractive question answering tasks. Despite its simplicity, InBedder demonstrates significantly improved instruction-following capabilities according to our proposed instruction awareness tests and instruction robustness tests, when applied to both large language models (LLMs) (e.g., llama-2-7b) and smaller encoder-based LMs (e.g., roberta-large). Additionally, our qualitative analysis of clustering outcomes, achieved by applying diverse instructions to the same unlabeled corpus, demonstrates a high degree of interpretability in the clusters formed.
Anthology ID:
2024.acl-long.27
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
459–477
URL:
https://aclanthology.org/2024.acl-long.27/
DOI:
10.18653/v1/2024.acl-long.27
Cite (ACL):
Letian Peng, Yuwei Zhang, Zilong Wang, Jayanth Srinivasa, Gaowen Liu, Zihan Wang, and Jingbo Shang. 2024. Answer is All You Need: Instruction-following Text Embedding via Answering the Question. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 459–477, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Answer is All You Need: Instruction-following Text Embedding via Answering the Question (Peng et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-long.27.pdf