Sergei Bagdasarov


2024

pdf bib
Applying Information-theoretic Notions to Measure Effects of the Plain English Movement on English Law Reports and Scientific Articles
Sergei Bagdasarov | Stefania Degaetano-Ortlieb
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)

We investigate the impact of the Plain English Movement (PEM) on the complexity of legal language in UK law reports from the 1950s-2010s, contrasting it with the evolution of scientific language. The PEM, emerging in the late 20th century, advocated for clear and understandable legal language. We define complexity through the concept of surprisal - an information-theoretic measure correlating with cognitive processing difficulty. Our research contrasts surprisal with traditional readability measures, which often overlook content. We hypothesize that, if the PEM has influenced legal language, there would be a reduction in complexity over time and a shift from a nominal to a more verbal style. We analyze text complexity and lexico-grammatical changes in line with PEM recommendations. Results indicate minimal impact of the PEM on both legal and scientific domains. This finding suggests future research should consider processing effort when advocating for linguistic norms to enhance accessibility.

2023

pdf bib
Linguistically Motivated Evaluation of the 2023 State-of-the-art Machine Translation: Can ChatGPT Outperform NMT?
Shushen Manakhimova | Eleftherios Avramidis | Vivien Macketanz | Ekaterina Lapshinova-Koltunski | Sergei Bagdasarov | Sebastian Möller
Proceedings of the Eighth Conference on Machine Translation

This paper offers a fine-grained analysis of the machine translation outputs in the context of the Shared Task at the 8th Conference of Machine Translation (WMT23). Building on the foundation of previous test suite efforts, our analysis includes Large Language Models and an updated test set featuring new linguistic phenomena. To our knowledge, this is the first fine-grained linguistic analysis for the GPT-4 translation outputs. Our evaluation spans German-English, English-German, and English-Russian language directions. Some of the phenomena with the lowest accuracies for German-English are idioms and resultative predicates. For English-German, these include mediopassive voice, and noun formation(er). As for English-Russian, these included idioms and semantic roles. GPT-4 performs equally or comparably to the best systems in German-English and English-German but falls in the second significance cluster for English-Russian.

2022

pdf bib
Linguistically Motivated Evaluation of the 2022 State-of-the-art Machine Translation Systems for Three Language Directions
Vivien Macketanz | Shushen Manakhimova | Eleftherios Avramidis | Ekaterina Lapshinova-koltunski | Sergei Bagdasarov | Sebastian Möller
Proceedings of the Seventh Conference on Machine Translation (WMT)

This document describes a fine-grained linguistically motivated analysis of 29 machine translation systems submitted at the Shared Task of the 7th Conference of Machine Translation (WMT22). This submission expands the test suite work of previous years by adding the language direction of English–Russian. As a result, evaluation takes place for the language directions of German–English, English–German, and English–Russian. We find that the German–English systems suffer in translating idioms, some tenses of modal verbs, and resultative predicates, the English–German ones in idioms, transitive-past progressive, and middle voice, whereas the English–Russian ones in pseudogapping and idioms.