%0 Conference Proceedings
%T Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks
%A Thakur, Nandan
%A Reimers, Nils
%A Daxenberger, Johannes
%A Gurevych, Iryna
%Y Toutanova, Kristina
%Y Rumshisky, Anna
%Y Zettlemoyer, Luke
%Y Hakkani-Tur, Dilek
%Y Beltagy, Iz
%Y Bethard, Steven
%Y Cotterell, Ryan
%Y Chakraborty, Tanmoy
%Y Zhou, Yichao
%S Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
%D 2021
%8 June
%I Association for Computational Linguistics
%C Online
%F thakur-etal-2021-augmented
%X There are two approaches for pairwise sentence scoring: Cross-encoders, which perform full-attention over the input pair, and Bi-encoders, which map each input independently to a dense vector space. While cross-encoders often achieve higher performance, they are too slow for many practical use cases. Bi-encoders, on the other hand, require substantial training data and fine-tuning over the target task to achieve competitive performance. We present a simple yet efficient data augmentation strategy called Augmented SBERT, where we use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder. We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method. We evaluate our approach on multiple tasks (in-domain) as well as on a domain adaptation task. Augmented SBERT achieves an improvement of up to 6 points for in-domain and of up to 37 points for domain adaptation tasks compared to the original bi-encoder performance.
%R 10.18653/v1/2021.naacl-main.28
%U https://aclanthology.org/2021.naacl-main.28
%U https://doi.org/10.18653/v1/2021.naacl-main.28
%P 296-310
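
A minimal sketch of the idea summarized in the abstract above, not the paper's released code: a cross-encoder scores unlabeled sentence pairs to produce "silver" labels that augment the bi-encoder's training data. It assumes the sentence-transformers library; the checkpoint names and example pairs are placeholders.

    # Illustrative sketch of cross-encoder labeling for bi-encoder augmentation.
    # Checkpoint names and sentence pairs below are placeholders, not from the paper.
    from torch.utils.data import DataLoader
    from sentence_transformers import CrossEncoder, SentenceTransformer, InputExample, losses

    # 1. A cross-encoder (ideally fine-tuned on the small gold set) labels new pairs.
    cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")
    unlabeled_pairs = [("A man is eating food.", "A man eats something."),
                       ("A plane is taking off.", "A dog runs in the park.")]
    silver_scores = cross_encoder.predict(unlabeled_pairs)

    # 2. The silver-labeled pairs become extra training examples for the bi-encoder.
    silver_examples = [InputExample(texts=list(pair), label=float(score))
                       for pair, score in zip(unlabeled_pairs, silver_scores)]

    # 3. Train the bi-encoder (SBERT) on the augmented (gold + silver) data.
    bi_encoder = SentenceTransformer("nreimers/MiniLM-L6-H384-uncased")
    train_loader = DataLoader(silver_examples, shuffle=True, batch_size=16)
    train_loss = losses.CosineSimilarityLoss(bi_encoder)
    bi_encoder.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=10)

In practice the paper stresses that how the unlabeled pairs are selected (e.g., by sampling strategy) is crucial; the sketch above simply takes a fixed list for illustration.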