%0 Conference Proceedings
%T Minimizing Annotation Effort via Max-Volume Spectral Sampling
%A Quattoni, Ariadna
%A Carreras, Xavier
%Y Moens, Marie-Francine
%Y Huang, Xuanjing
%Y Specia, Lucia
%Y Yih, Scott Wen-tau
%S Findings of the Association for Computational Linguistics: EMNLP 2021
%D 2021
%8 November
%I Association for Computational Linguistics
%C Punta Cana, Dominican Republic
%F quattoni-carreras-2021-minimizing-annotation
%X We address the annotation data bottleneck for sequence classification. Specifically, we ask: given a budget of N annotations, which samples should we select for annotation? The solution we propose looks for diversity in the selected sample by maximizing the amount of information that is useful for the learning algorithm, or equivalently by minimizing the redundancy of samples in the selection. This is formulated in the context of spectral learning of recurrent functions for sequence classification. Our method represents unlabeled data in the form of a Hankel matrix, and uses the notion of spectral max-volume to find a compact sub-block from which annotation samples are drawn. Experiments on sequence classification confirm that our spectral sampling strategy is in fact efficient and yields good models.
%R 10.18653/v1/2021.findings-emnlp.246
%U https://aclanthology.org/2021.findings-emnlp.246
%U https://doi.org/10.18653/v1/2021.findings-emnlp.246
%P 2890-2899
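
The abstract's two-step recipe (build an empirical Hankel matrix from unlabeled sequences, then select a max-volume row sub-block and annotate the corresponding samples) can be sketched as below. This is a minimal illustration under assumptions, not the authors' implementation: the helpers hankel_matrix and greedy_maxvol_rows are hypothetical names, the Hankel block is built from raw prefix+suffix counts, and max-volume is approximated with the standard greedy (row-pivoted Gram-Schmidt) heuristic rather than the exact, intractable objective.

    import numpy as np
    from collections import Counter

    def hankel_matrix(sequences, prefixes, suffixes):
        # Empirical Hankel block over unlabeled data (an assumption of this sketch):
        # H[i, j] = number of sequences equal to prefix_i concatenated with suffix_j.
        counts = Counter(tuple(s) for s in sequences)
        H = np.zeros((len(prefixes), len(suffixes)))
        for i, p in enumerate(prefixes):
            for j, s in enumerate(suffixes):
                H[i, j] = counts.get(tuple(p) + tuple(s), 0)
        return H

    def greedy_maxvol_rows(H, budget):
        # Greedy max-volume heuristic: repeatedly pick the row with the largest
        # residual norm after projecting out the span of the rows chosen so far.
        # Each pick greedily grows the volume of the selected sub-block, i.e.
        # favors rows that are least redundant given the current selection.
        R = H.astype(float).copy()
        chosen = []
        for _ in range(budget):
            norms = np.linalg.norm(R, axis=1)
            if chosen:
                norms[chosen] = -1.0          # never re-pick a row
            i = int(np.argmax(norms))
            if norms[i] <= 1e-12:             # remaining rows are numerically redundant
                break
            chosen.append(i)
            v = R[i] / norms[i]
            R -= np.outer(R @ v, v)           # remove the new direction from every row
        return chosen

The returned indices identify the least redundant rows of the Hankel block; under this sketch, one would then request labels for the unlabeled sequences associated with those rows (for example, sequences sharing the selected prefixes), spending the budget of N annotations on a maximally diverse subset.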