CoverICL: Selective Annotation for In-Context Learning via Active Graph Coverage

Costas Mavromatis, Balasubramaniam Srinivasan, Zhengyuan Shen, Jiani Zhang, Huzefa Rangwala, Christos Faloutsos, George Karypis


Abstract
In-context learning (ICL) adapts Large Language Models (LLMs) to new tasks, without requiring any parameter updates, but few annotated examples as input. In this work, we investigate selective annotation for ICL, where there is a limited budget for annotating examples, similar to low-budget active learning (AL). Although uncertainty-based selection is unreliable with few annotated data, we present CoverICL, an adaptive graph-based selection algorithm, that effectively incorporates uncertainty sampling into selective annotation for ICL. First, CoverICL builds a nearest-neighbor graph based on the semantic similarity between candidate ICL examples. Then, CoverICL employs uncertainty estimation by the LLM to identify hard examples for the task. Selective annotation is performed over the active graph of the hard examples, adapting the process to the particular LLM used and the task tackled. CoverICL selects the most representative examples by solving a Maximum Coverage problem, approximating diversity-based sampling. Extensive experiments on ten datasets and seven LLMs show that, by incorporating uncertainty via coverage on the active graph, CoverICL (1) outperforms existing AL methods for ICL by 2–4.6% accuracy points, (2) is up to 2x more budget-efficient than SOTA methods for low-budget AL, and (3) generalizes better across tasks compared to non-graph alternatives.
Anthology ID:
2024.emnlp-main.1185
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
21268–21286
Language:
URL:
https://aclanthology.org/2024.emnlp-main.1185
DOI:
10.18653/v1/2024.emnlp-main.1185
Bibkey:
Cite (ACL):
Costas Mavromatis, Balasubramaniam Srinivasan, Zhengyuan Shen, Jiani Zhang, Huzefa Rangwala, Christos Faloutsos, and George Karypis. 2024. CoverICL: Selective Annotation for In-Context Learning via Active Graph Coverage. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 21268–21286, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
CoverICL: Selective Annotation for In-Context Learning via Active Graph Coverage (Mavromatis et al., EMNLP 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.emnlp-main.1185.pdf
Software:
 2024.emnlp-main.1185.software.zip