Ivan P. Yamshchikov


2024

pdf bib
Vygotsky Distance: Measure for Benchmark Task Similarity
Maxim K. Surkov | Ivan P. Yamshchikov
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Evaluation plays a significant role in modern natural language processing. Most modern NLP benchmarks consist of arbitrary sets of tasks that neither guarantee any generalization potential for the model once applied outside the test set nor try to minimize the resource consumption needed for model evaluation. This paper presents a theoretical instrument and a practical algorithm to calculate similarity between benchmark tasks, we call this similarity measure “Vygotsky distance”. The core idea of this similarity measure is that it is based on relative performance of the “students” on a given task, rather that on the properties of the task itself. If two tasks are close to each other in terms of Vygotsky distance the models tend to have similar relative performance on them. Thus knowing Vygotsky distance between tasks one can significantly reduce the number of evaluation tasks while maintaining a high validation quality. Experiments on various benchmarks, including GLUE, SuperGLUE, CLUE, and RussianSuperGLUE, demonstrate that a vast majority of NLP benchmarks could be at least 40% smaller in terms of the tasks included. Most importantly, Vygotsky distance could also be used for the validation of new tasks thus increasing the generalization potential of the future NLP models.

pdf bib
Knowledge Graph Representation for Political Information Sources
Tinatin Osmonova | Alexey Tikhonov | Ivan P. Yamshchikov
Proceedings of the Second Workshop on Natural Language Processing for Political Sciences @ LREC-COLING 2024

With the rise of computational social science, many scholars utilize data analysis and natural language processing tools to analyze social media, news articles, and other accessible data sources for examining political and social discourse. Particularly, the study of the emergence of echo-chambers due to the dissemination of specific information has become a topic of interest in mixed methods research areas. In this paper, we analyze data collected from two news portals, Breitbart News (BN) and New York Times (NYT) to prove the hypothesis that the formation of echo-chambers can be partially explained on the level of an individual information consumption rather than a collective topology of individuals’ social networks. Our research findings are presented through knowledge graphs, utilizing a dataset spanning 11.5 years gathered from BN and NYT media portals. We demonstrate that the application of knowledge representation techniques to the aforementioned news streams highlights, contrary to common assumptions, shows relative “internal” neutrality of both sources and polarizing attitude towards a small fraction of entities. Additionally, we argue that such characteristics in information sources lead to fundamental disparities in audience worldviews, potentially acting as a catalyst for the formation of echo-chambers.

pdf bib
Echo-chambers and Idea Labs: Communication Styles on Twitter
Aleksandra Sorokovikova | Michael Becker | Ivan P. Yamshchikov
Proceedings of the Second Workshop on Natural Language Processing for Political Sciences @ LREC-COLING 2024

This paper investigates the communication styles and structures of Twitter (X) communities within the vaccination context. While mainstream research primarily focuses on the echo-chamber phenomenon, wherein certain ideas are reinforced and participants are isolated from opposing opinions, this study reveals the presence of diverse communication styles across various communities. In addition to the communities exhibiting echo-chamber behavior, this research uncovers communities with distinct communication patterns. By shedding light on the nuanced nature of communication within social networks, this study emphasizes the significance of understanding the diversity of perspectives within online communities.

2019

pdf bib
Dyr Bul Shchyl. Proxying Sound Symbolism With Word Embeddings
Ivan P. Yamshchikov | Viascheslav Shibaev | Alexey Tikhonov
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP

This paper explores modern word embeddings in the context of sound symbolism. Using basic properties of the representations space one can construct semantic axes. A method is proposed to measure if the presence of individual sounds in a given word shifts its semantics of that word along a specific axis. It is shown that, in accordance with several experimental and statistical results, word embeddings capture symbolism for certain sounds.

pdf bib
Style Transfer for Texts: Retrain, Report Errors, Compare with Rewrites
Alexey Tikhonov | Viacheslav Shibaev | Aleksander Nagaev | Aigul Nugmanova | Ivan P. Yamshchikov
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper shows that standard assessment methodology for style transfer has several significant problems. First, the standard metrics for style accuracy and semantics preservation vary significantly on different re-runs. Therefore one has to report error margins for the obtained results. Second, starting with certain values of bilingual evaluation understudy (BLEU) between input and output and accuracy of the sentiment transfer the optimization of these two standard metrics diverge from the intuitive goal of the style transfer task. Finally, due to the nature of the task itself, there is a specific dependence between these two metrics that could be easily manipulated. Under these circumstances, we suggest taking BLEU between input and human-written reformulations into consideration for benchmarks. We also propose three new architectures that outperform state of the art in terms of this metric.

pdf bib
Decomposing Textual Information For Style Transfer
Ivan P. Yamshchikov | Viacheslav Shibaev | Aleksander Nagaev | Jürgen Jost | Alexey Tikhonov
Proceedings of the 3rd Workshop on Neural Generation and Translation

This paper focuses on latent representations that could effectively decompose different aspects of textual information. Using a framework of style transfer for texts, we propose several empirical methods to assess information decomposition quality. We validate these methods with several state-of-the-art textual style transfer methods. Higher quality of information decomposition corresponds to higher performance in terms of bilingual evaluation understudy (BLEU) between output and human-written reformulations.

2018

pdf bib
Sounds Wilde. Phonetically Extended Embeddings for Author-Stylized Poetry Generation
Aleksey Tikhonov | Ivan P. Yamshchikov
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper addresses author-stylized text generation. Using a version of a language model with extended phonetic and semantic embeddings for poetry generation we show that phonetics has comparable contribution to the overall model performance as the information on the target author. Phonetic information is shown to be important for English and Russian language. Humans tend to attribute machine generated texts to the target author.