Dmitri Puzyrev


2021

Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates
Artem Shelmanov | Dmitri Puzyrev | Lyubov Kupriyanova | Denis Belyakov | Daniil Larionov | Nikita Khromov | Olga Kozlova | Ekaterina Artemova | Dmitry V. Dylov | Alexander Panchenko
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Annotating training data for sequence tagging of texts is usually very time-consuming. Recent advances in transfer learning for natural language processing, in conjunction with active learning, open the possibility of significantly reducing the necessary annotation budget. We are the first to thoroughly investigate this powerful combination for the sequence tagging task. We conduct an extensive empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework and find the best combinations for different types of models. We also demonstrate that, to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance and reduces obstacles to applying deep active learning in practice.

How Certain is Your Transformer?
Artem Shelmanov | Evgenii Tsymbalov | Dmitri Puzyrev | Kirill Fedyanin | Alexander Panchenko | Maxim Panov
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

In this work, we consider the problem of uncertainty estimation for Transformer-based models. We investigate the applicability of uncertainty estimates based on applying dropout at the inference stage (Monte Carlo dropout). A series of experiments on natural language understanding tasks shows that the resulting uncertainty estimates improve the detection of error-prone instances. Special attention is paid to the construction of computationally inexpensive estimates via Monte Carlo dropout and Determinantal Point Processes.