Anton Chernyavskiy


2024

pdf bib
ZenPropaganda: A Comprehensive Study on Identifying Propaganda Techniques in Russian Coronavirus-Related Media
Anton Chernyavskiy | Svetlana Shomova | Irina Dushakova | Ilya Kiriya | Dmitry Ilvovsky
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The topic of automatic detection of manipulation and propaganda in the media is not a novel issue; however, it remains an urgent concern that necessitates continuous research focus. The topic is studied within the framework of various papers, competitions and shared tasks, which provide different techniques definitions and include the analysis of text data, images, as well as multi-lingual sources. In this study, we propose a novel multi-level classification scheme for identifying propaganda techniques. We introduce a new Russian dataset ZenPropaganda consisting of coronavirus-related texts collected from Vkontakte and Yandex.Zen platforms, which have been expertly annotated with fine-grained labeling of manipulative spans. We further conduct a comprehensive analysis by comparing our dataset with existing related ones and evaluate the performance of state-of-the-art approaches that have been proposed for them. Furthermore, we provide a detailed discussion of our findings, which can serve as a valuable resource for future research in this field.

2022

pdf bib
Batch-Softmax Contrastive Loss for Pairwise Sentence Scoring Tasks
Anton Chernyavskiy | Dmitry Ilvovsky | Pavel Kalinin | Preslav Nakov
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The use of contrastive loss for representation learning has become prominent in computer vision, and it is now getting attention in Natural Language Processing (NLP).Here, we explore the idea of using a batch-softmax contrastive loss when fine-tuning large-scale pre-trained transformer models to learn better task-specific sentence embeddings for pairwise sentence scoring tasks. We introduce and study a number of variations in the calculation of the loss as well as in the overall training procedure; in particular, we find that a special data shuffling can be quite important. Our experimental results show sizable improvements on a number of datasets and pairwise sentence scoring tasks including classification, ranking, and regression. Finally, we offer detailed analysis and discussion, which should be useful for researchers aiming to explore the utility of contrastive loss in NLP.

pdf bib
CrowdChecked: Detecting Previously Fact-Checked Claims in Social Media
Momchil Hardalov | Anton Chernyavskiy | Ivan Koychev | Dmitry Ilvovsky | Preslav Nakov
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While there has been substantial progress in developing systems to automate fact-checking, they still lack credibility in the eyes of the users. Thus, an interesting approach has emerged: to perform automatic fact-checking by verifying whether an input claim has been previously fact-checked by professional fact-checkers and to return back an article that explains their decision. This is a sensible approach as people trust manual fact-checking, and as many claims are repeated multiple times. Yet, a major issue when building such systems is the small number of known tweet–verifying article pairs available for training. Here, we aim to bridge this gap by making use of crowd fact-checking, i.e., mining claims in social media for which users have responded with a link to a fact-checking article. In particular, we mine a large-scale collection of 330,000 tweets paired with a corresponding fact-checking article. We further propose an end-to-end framework to learn from this noisy data based on modified self-adaptive training, in a distant supervision scenario. Our experiments on the CLEF’21 CheckThat! test set show improvements over the state of the art by two points absolute. Our code and datasets are available at https://github.com/mhardalov/crowdchecked-claims

2020

pdf bib
Aschern at SemEval-2020 Task 11: It Takes Three to Tango: RoBERTa, CRF, and Transfer Learning
Anton Chernyavskiy | Dmitry Ilvovsky | Preslav Nakov
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We describe our system for SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles. We developed ensemble models using RoBERTa-based neural architectures, additional CRF layers, transfer learning between the two subtasks, and advanced post-processing to handle the multi-label nature of the task, the consistency between nested spans, repetitions, and labels from similar spans in training. We achieved sizable improvements over baseline fine-tuned RoBERTa models, and the official evaluation ranked our system 3rd (almost tied with the 2nd) out of 36 teams on the span identification subtask with an F1 score of 0.491, and 2nd (almost tied with the 1st) out of 31 teams on the technique classification subtask with an F1 score of 0.62.

2019

pdf bib
Extract and Aggregate: A Novel Domain-Independent Approach to Factual Data Verification
Anton Chernyavskiy | Dmitry Ilvovsky
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

Triggered by Internet development, a large amount of information is published in online sources. However, it is a well-known fact that publications are inundated with inaccurate data. That is why fact-checking has become a significant topic in the last 5 years. It is widely accepted that factual data verification is a challenge even for the experts. This paper presents a domain-independent fact checking system. It can solve the fact verification problem entirely or at the individual stages. The proposed model combines various advanced methods of text data analysis, such as BERT and Infersent. The theoretical and empirical study of the system features is carried out. Based on FEVER and Fact Checking Challenge test-collections, experimental results demonstrate that our model can achieve the score on a par with state-of-the-art models designed by the specificity of particular datasets.