Ruslan Mitkov

Also published as: R. Mitkov

2025

Recently, language models (LMs) have produced excellent results in many natural language processing (NLP) tasks. However, their effectiveness is highly dependent on available pre-training resources, which is particularly challenging for low-resource languages such as Sinhala. Furthermore, the scarcity of benchmarks to evaluate LMs is also a major concern for low-resource languages. In this paper, we address these two challenges for Sinhala by (i) collecting the largest monolingual corpus for Sinhala, (ii) training multiple LMs on this corpus and (iii) compiling the first Sinhala NLP benchmark (Sinhala-GLUE) and evaluating LMs on it. We show that the Sinhala LMs trained in this paper outperform popular multilingual LMs, such as XLM-R, and existing Sinhala LMs in downstream NLP tasks. All the trained LMs are publicly available. We also release Sinhala-GLUE as a public leaderboard, and we hope that it will enable further advancements in developing and evaluating LMs for Sinhala.

MUSTS: MUltilingual Semantic Textual Similarity Benchmark
Tharindu Ranasinghe | Hansi Hettiarachchi | Constantin Orasan | Ruslan Mitkov
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Predicting semantic textual similarity (STS) is a complex and ongoing challenge in natural language processing (NLP). Over the years, researchers have developed a variety of supervised and unsupervised approaches to calculate STS automatically. Additionally, various benchmarks, which include STS datasets, have been established to consistently evaluate and compare these STS methods. However, these benchmarks largely focus on high-resource languages and mix in datasets that were annotated for relatedness rather than similarity or that contain automatically translated instances. Therefore, no dedicated benchmark for multilingual STS exists. To fill this gap, we introduce the Multilingual Semantic Textual Similarity Benchmark (MUSTS), which spans 13 languages, including low-resource languages. By evaluating more than 25 models on MUSTS, we establish the most comprehensive benchmark of multilingual STS methods. Our findings confirm that STS remains a challenging task, particularly for low-resource languages.

Machine Translation in the AI Era: Comparing previous methods of machine translation with large language models
William Jock Boyd | Ruslan Mitkov
Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts

The aim of this paper is to compare the efficacy of several methods of machine translation in the French-English language pair. There is a particular focus on Large Language Models, given that they are an emerging technology that could have a profound effect on the field of machine translation. This study used the European Parliament’s parallel French-English corpus, testing each method on the same section of data, with several Neural Translation, Large Language Model and Rule-Based solutions being used. The translations were then evaluated using BLEU and METEOR scores to gain an accurate understanding of both the precision and the semantic accuracy of translation. Statistical analysis was then performed to ensure the results’ validity and statistical significance. This study found that Neural Translation was the best translation technology overall, with Large Language Models coming second and Rule-Based translation coming last by a significant margin. It was also found that, among Large Language Model implementations, specifically trained translation capabilities outperformed emergent translation capabilities.

XAutoLM: Efficient Fine-Tuning of Language Models via Meta-Learning and AutoML
Ernesto Luis Estevanell Valladares | Suilan Estevez-Velarde | Yoan Gutierrez | Andrés Montoyo | Ruslan Mitkov
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Experts in machine learning leverage domain knowledge to navigate decisions in model selection, hyperparameter optimization, and resource allocation. This is particularly critical for fine-tuning language models (LMs), where repeated trials incur substantial computational overhead and environmental impact. However, no existing automated framework simultaneously tackles the entire model selection and hyperparameter optimization (HPO) task for resource-efficient LM fine-tuning. We introduce XAutoLM, a meta-learning-augmented AutoML framework that reuses past experiences to optimize discriminative and generative LM fine-tuning pipelines efficiently. XAutoLM learns from stored successes and failures by extracting task- and system-level meta-features to bias its sampling toward valuable configurations and away from costly dead ends. On four text classification and two question-answering benchmarks, XAutoLM surpasses the zero-shot optimizer’s peak F1 on five of six tasks, cuts the mean evaluation time of pipelines by up to 4.5x, reduces search error ratios by up to sevenfold, and uncovers up to 50% more pipelines above the zero-shot Pareto front. In contrast, simpler memory-based baselines suffer negative transfer. We release XAutoLM and our experience store to catalyze resource-efficient, Green AI fine-tuning in the NLP community.

Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities
Isuri Nanomi Arachchige | Francesca Frontini | Ruslan Mitkov | Paul Rayson
Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities

The first Workshop on Language Models for Low-Resource Languages (LoResLM 2025) was held in conjunction with the 31st International Conference on Computational Linguistics (COLING 2025) in Abu Dhabi, United Arab Emirates. This workshop mainly aimed to provide a forum for researchers to share and discuss their ongoing work on language models (LMs) focusing on low-resource languages, following the recent advancements in neural language models and their linguistic biases towards high-resource languages. LoResLM 2025 attracted notable interest from the natural language processing (NLP) community, resulting in 35 accepted papers from 52 submissions. These contributions cover a broad range of low-resource languages from eight language families and 13 diverse research areas, paving the way for future possibilities and promoting linguistic inclusivity in NLP.

ADOR: Dataset for Arabic Dialects in Hotel Reviews: A Human Benchmark for Sentiment Analysis
Maram I. Alharbi | Saad Ezzini | Hansi Hettiarachchi | Tharindu Ranasinghe | Ruslan Mitkov
Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages

Arabic machine translation remains a fundamentally challenging task, primarily due to the lack of comprehensive annotated resources. This study evaluates the performance of Meta’s NLLB-200 model in translating Modern Standard Arabic (MSA) into three regional dialects (Saudi, Maghribi, and Egyptian Arabic) using a manually curated dataset of hotel reviews. We applied a multi-criteria human annotation framework to assess translation correctness, dialect accuracy, and sentiment and aspect preservation. Our analysis reveals significant variation in translation quality across dialects. While sentiment and aspect preservation were generally high, dialect accuracy and overall translation fidelity were inconsistent. For Saudi Arabic, over 95% of translations required human correction, highlighting systemic issues. Maghribi outputs demonstrated better dialectal authenticity, while Egyptian translations achieved the highest reliability with the lowest correction rate and fewest multi-criteria failures. These results underscore the limitations of current multilingual models in handling informal Arabic varieties and highlight the importance of dialect-sensitive evaluation.

Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models
Alicia Picazo-Izquierdo | Ernesto Luis Estevanell-Valladares | Ruslan Mitkov | Rafael Muñoz Guillena | Raúl García Cerdá
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models

A Comparative Study of Vision Transformers and Multimodal Language Models for Violence Detection in Videos
Tomas Ditchfield-Ogle | Ruslan Mitkov
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models

This project compares methods for detecting violent videos, which are crucial for ensuring real-time safety in surveillance and digital moderation. It evaluates four approaches: a random forest classifier, a transformer model, and two multimodal vision-language models. The process involves preprocessing datasets, training models, and assessing accuracy, interpretability, scalability, and real-time suitability. Results show that traditional methods are simple but less effective. The transformer model achieved high accuracy, and the multimodal models offered high violence recall with descriptive justifications. The study highlights trade-offs and provides practical insights for the deployment of automated violence detection.

Detection of AI-generated Content in Scientific Abstracts
Ernesto Luis Estevanell-Valladares | Alicia Picazo-Izquierdo | Ruslan Mitkov
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models

The growing use of generative AI in academic writing raises urgent questions about authorship and the integrity of scientific communication. This study addresses the detection of AI-generated scientific abstracts by constructing a temporally anchored dataset of paired abstracts, each pair consisting of a human-written abstract of a work published before 2021 and a synthetic version generated using GPT-4.1. We evaluate three approaches to authorship classification: zero-shot large language models (LLMs), fine-tuned encoder-based transformers, and traditional machine learning classifiers. Results show that LLMs perform near chance level, while a LoRA-fine-tuned DistilBERT and a PassiveAggressive classifier achieve near-perfect performance. These findings suggest that shallow lexical or stylistic patterns still differentiate human and AI writing, and that supervised learning is key to capturing these signals.

A Comparative Study of Hyperbole Detection Methods: From Rule-Based Approaches through Deep Learning Models to Large Language Models
Silvia Gargova | Nevena Grigorova | Ruslan Mitkov
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models

We address hyperbole detection as a binary classification task, comparing rule-based methods, fine-tuned transformers (BERT, RoBERTa), and large language models (LLMs) in zero-shot and few-shot prompting (Gemini, LLaMA). Fine-tuned transformers achieved the best overall performance, with RoBERTa attaining an F1-score of 0.82. Rule-based methods scored lower (F1 = 0.58) but remain effective in constrained linguistic contexts. LLMs showed mixed results: zero-shot performance was variable, while few-shot prompting notably improved outcomes, reaching F1-scores up to 0.79 without task-specific training data. We discuss the trade-offs between interpretability, computational cost, and data requirements across methods. Our results highlight the promise of LLMs in low-resource scenarios and suggest future work on hybrid models and broader figurative language tasks.

Evaluating the Performance of Transformers in Translating Low-Resource Languages through Akkadian
Daniel A. Jones | Ruslan Mitkov
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models

In this paper, we evaluate the performance of various fine-tuned, transformer-based models in translating Akkadian into English. Using annotated Akkadian data, we seek to establish potential considerations when developing models for other low-resource languages, which do not yet have comparably robust data. The results of this study show the potency, but also the cost inefficiency, of Large Language Models compared to smaller Neural Machine Translation models. Significant evidence was also found demonstrating the importance of fine-tuning machine translation models from related languages.

Does Anaphora Resolution Improve LLM Fine-Tuning for Summarisation?
Yi Chun Lo | Ruslan Mitkov
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models

This study investigates whether adding anaphora resolution as a preprocessing step before fine-tuning LLMs for text summarisation can improve the quality of the summary output. Two sets of training runs were conducted with the T5-base and BART-large models on the SAMSum dataset: one uses the original text and the other uses the text processed by a simplified version of MARS (Mitkov’s Anaphora Resolution System). The experiment reveals that when the T5-base model is fine-tuned on the anaphora-resolved inputs, the ROUGE metrics improve. In contrast, the BART-large model shows only a slight improvement after fine-tuning under the same conditions, which is not statistically significant. Further analysis of the generated summaries indicates that anaphora resolution is helpful in semantic alignment.

From Zero to Hero: Building Serbian NER from Rules to LLMs
Milica Ikonić Nešić | Sasa Petalinkar | Ranka Stanković | Ruslan Mitkov
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models

Named Entity Recognition (NER) presents specific challenges in Serbian, a morphologically rich language. To address these challenges, a comparative evaluation of distinct model paradigms across diverse text genres was conducted. A rule-based system (SrpNER), a traditional deep learning model (Convolutional Neural Network – CNN), fine-tuned transformer architectures (Jerteh and Tesla), and Large Language Models (LLMs), specifically ChatGPT 4.0 Nano and 4.1 Mini, were evaluated and compared. For the LLMs, a one-shot prompt engineering approach was employed, using prompt instructions aligned with the entity type definitions used in the manual annotation guidelines. Evaluation was performed on three Serbian datasets representing varied domains: newspaper articles, history textbook excerpts, and a sample of literary texts from the srpELTeC collection. The highest performance was consistently achieved by the fine-tuned transformer models, with F1 scores ranging from 0.78 on newspaper articles to 0.96 on the primary school history textbook sample.

Evaluating the LLM and NMT Models in Translating Low-Resourced Languages
Julita JP Pucinskaite | Ruslan Mitkov
Proceedings of the First Workshop on Comparative Performance Evaluation: From Rules to Language Models

Machine translation has advanced significantly due to the development of the transformer architecture, which is utilised by many modern deep-learning models. However, low-resource languages, such as Lithuanian, still face challenges stemming from the limited availability of training data and resource constraints. This study examines the translation capabilities of Neural Machine Translation (NMT) models and Large Language Models (LLMs), comparing their performance in low-resource translation tasks. Furthermore, it assesses the impact of parameter scaling and fine-tuning on model performance. The evaluation showed that while LLMs demonstrated proficiency in low-resource translation, their results were lower than those of NMT models, which remained consistent across smaller variants. However, as model size increased, the lead was not as prominent, correlating with automatic and human evaluations. Fine-tuning proved to be an effective strategy for enhancing translation accuracy, demonstrating improvements in vocabulary expansion and structural coherence in both architectures. These findings highlight the importance of diverse datasets, comprehensive model design, and fine-tuning techniques in addressing the challenges of low-resourced language translation. This project, one of the first studies to focus on the low-resourced Lithuanian language, aims to contribute to the broader discourse and ongoing efforts to enhance accessibility and inclusivity in Natural Language Processing.

Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Galia Angelova | Maria Kunilovskaya | Marie Escribe | Ruslan Mitkov
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Evaluating Large Language Models on Sentiment Analysis in Arabic Dialects
Maram I. Alharbi | Saad Ezzini | Hansi Hettiarachchi | Tharindu Ranasinghe | Ruslan Mitkov
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Despite recent progress in large language models (LLMs), their performance on Arabic dialects remains underexplored, particularly in the context of sentiment analysis. This study presents a comparative evaluation of three LLMs, DeepSeek-R1, Qwen2.5, and LLaMA-3, on sentiment classification across Modern Standard Arabic (MSA), Saudi dialect and Darija. We construct a balanced sentiment dataset by translating and validating MSA hotel reviews into Saudi dialect and Darija. Using parameter-efficient fine-tuning (LoRA) and dialect-specific prompts, we assess each model under matched and mismatched prompting conditions. Evaluation results show that Qwen2.5 achieves the highest macro F1 score of 79% on Darija input using MSA prompts, while DeepSeek performs best when prompted in the input dialect, reaching 71% on Saudi dialect. LLaMA-3 exhibits stable performance across prompt variations, with 75% macro F1 on Darija input under MSA prompting. Dialect-aware prompting consistently improves classification accuracy, particularly for neutral and negative sentiment classes.

Toponym Resolution: Will Prompt Engineering Change Expectations?
Isuri Anuradha | Deshan Koshala Sumanathilaka | Ruslan Mitkov | Paul Rayson
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Large Language Models (LLMs) have revolutionised the field of artificial intelligence and have been successfully employed in many disciplines, capturing widespread attention and enthusiasm. Many previous studies have established that domain-specific Deep Learning models perform competitively with general-purpose LLMs (Maatouk et al., 2024; Lu et al., 2024). However, a suitable prompt which provides direct instructions and background information is expected to yield improved results (Kamruzzaman and Kim, 2024). The present study focuses on utilising LLMs for the Toponym Resolution task by incorporating Retrieval-Augmented Generation (RAG) and prompting techniques to surpass the results of the traditional Deep Learning models. Moreover, this study demonstrates that promising results can be achieved without relying on large amounts of labelled, domain-specific data. In a descriptive comparison between open-source and proprietary LLMs across different prompt engineering techniques, the GPT-4o model performs best among the LLMs evaluated for the Toponym Resolution task.

HoloBERT: Pre-Trained Transformer Model for Historical Narratives
Isuri Anuradha | Le An Ha | Ruslan Mitkov
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Oral texts often contain spontaneous, unstructured language with features like disfluencies, colloquialisms, and non-standard syntax. In this paper, we investigate how further pretraining language models with specialised learning objectives for oral and transcribed texts can enhance Named Entity Recognition (NER) performance in Holocaust-related discourse. To evaluate our models, we compare the extracted named entities (NEs) against those from other pretrained models on historical texts and generative AI models such as GPT. Furthermore, we demonstrate practical applications of the recognised NEs by linking them to a knowledge base as structured metadata and representing them in a graph format. With these contributions, our work illustrates how the further-pretrain-and-fine-tune paradigm in Natural Language Processing advances research in Digital Humanities.

LLM-based Embedders for Prior Case Retrieval
Damith Premasiri | Tharindu Ranasinghe | Ruslan Mitkov
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

In common law systems, legal professionals such as lawyers and judges rely on precedents to build their arguments. As the volume of cases has grown massively over time, effectively retrieving prior cases has become essential. Prior case retrieval (PCR) is an information retrieval (IR) task that aims to automatically identify the most relevant court cases for a specific query from a large pool of potential candidates. While IR methods have seen several paradigm shifts over the last few years, the vast majority of PCR methods continue to rely on traditional IR methods, such as BM25. The state-of-the-art deep learning IR methods have not been successful in PCR due to two key challenges: (i) the length of legal text: powerful BERT-based transformer models limit the input text length, which inevitably requires shortening the input via truncation or division, with a loss of legal context information; and (ii) the lack of legal training data: due to data privacy concerns, available PCR datasets are often limited in size, making it difficult to train deep learning-based models effectively. In this research, we address these challenges by leveraging LLM-based text embedders in PCR. LLM-based embedders support longer input lengths, and since we use them in an unsupervised manner, they do not require training data, addressing both challenges simultaneously. In this paper, we evaluate state-of-the-art LLM-based text embedders on four PCR benchmark datasets and show that they outperform BM25 and supervised transformer-based models.

The hospitality industry in the Arab world increasingly relies on customer feedback to shape services, driving the need for advanced Arabic sentiment analysis tools. To address this challenge, the Sentiment Analysis on Arabic Dialects in the Hospitality Domain shared task focuses on Sentiment Detection in Arabic Dialects. This task leverages a multi-dialect, manually curated dataset derived from hotel reviews originally written in Modern Standard Arabic (MSA) and translated into Saudi and Moroccan (Darija) dialects. The dataset consists of 538 sentiment-balanced reviews spanning positive, neutral, and negative categories. Translations were validated by native speakers to ensure dialectal accuracy and sentiment preservation. This resource supports the development of dialect-aware NLP systems for real-world applications in customer experience analysis. More than 40 teams have registered for the shared task, with 12 submitting systems during the evaluation phase. The top-performing system achieved an F1 score of 0.81, demonstrating the feasibility and ongoing challenges of sentiment analysis across Arabic dialects.

2024

DARES: Dataset for Arabic Readability Estimation of School Materials
Mo El-Haj | Sultan Almujaiwel | Damith Premasiri | Tharindu Ranasinghe | Ruslan Mitkov
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024

This research introduces DARES, a dataset for assessing the readability of Arabic text in Saudi school materials. DARES comprises 13335 instances from textbooks used in 2021 and contains two subtasks: (a) coarse-grained readability assessment, where the text is classified into different educational levels such as primary and secondary; and (b) fine-grained readability assessment, where the text is classified into individual grades. We fine-tuned five transformer models that support Arabic and found that CAMeLBERTmix performed the best in all input settings. Evaluation results showed high performance for the coarse-grained readability assessment task, achieving a weighted F1 score of 0.91 and a macro F1 score of 0.79. The fine-grained task achieved a weighted F1 score of 0.68 and a macro F1 score of 0.55. These findings demonstrate the potential of our approach for advancing Arabic text readability assessment in education, with implications for future innovations in the field.

DORE: A Dataset for Portuguese Definition Generation
Anna Beatriz Dimas Furtado | Tharindu Ranasinghe | Frederic Blain | Ruslan Mitkov
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Definition modelling (DM) is the task of automatically generating a dictionary definition of a specific word. Computational systems that are capable of DM can have numerous applications benefiting a wide range of audiences. As DM is considered a supervised natural language generation problem, these systems require large annotated datasets to train the machine learning (ML) models. Several DM datasets have been released for English and other high-resource languages. While Portuguese is considered a mid/high-resource language in most natural language processing tasks and is spoken by more than 200 million native speakers, there is no DM dataset available for Portuguese. In this research, we fill this gap by introducing DORE, the first dataset for Definition MOdelling for PoRtuguEse, containing more than 100,000 definitions. We also evaluate several deep learning based DM models on DORE and report the results. The dataset and the findings of this paper will facilitate research and study of Portuguese in wider contexts.

Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Ruslan Mitkov | Saad Ezzini | Tharindu Ranasinghe | Ignatius Ezeani | Nouran Khallaf | Cengiz Acarturk | Matthew Bradbury | Mo El-Haj | Paul Rayson
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security

2023

Authorship Attribution of Late 19th Century Novels using GAN-BERT
Kanishka Silva | Burcu Can | Frédéric Blain | Raheem Sarwar | Laura Ugolini | Ruslan Mitkov
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Authorship attribution aims to identify the author of an anonymous text. The task becomes even more worthwhile when it comes to literary works. For example, pen names were commonly used by female authors in the 19th century, resulting in some literary works being incorrectly attributed or claimed. With this motivation, we collated a dataset of late 19th century novels in English. Due to the imbalance in the dataset and the unavailability of enough data per author, we employed the GAN-BERT model along with data sampling strategies to fine-tune a transformer-based model for authorship attribution. Unlike earlier studies on the GAN-BERT model, we conducted transfer learning on comparatively smaller author subsets to train more focused author-specific models, yielding accuracy and F1 scores over 0.88. Furthermore, we observed that increasing the sample size has a negative impact on the model’s performance. Our research mainly contributes to the ongoing authorship attribution research using the GAN-BERT architecture, especially in attributing disputed novelists in the late 19th century.

Proceedings of the Workshop on Computational Terminology in NLP and Translation Studies (ConTeNTS) Incorporating the 16th Workshop on Building and Using Comparable Corpora (BUCC)
Amal Haddad Haddad | Ayla Rigouts Terryn | Ruslan Mitkov | Reinhard Rapp | Pierre Zweigenbaum | Serge Sharoff
Proceedings of the Workshop on Computational Terminology in NLP and Translation Studies (ConTeNTS) Incorporating the 16th Workshop on Building and Using Comparable Corpora (BUCC)

Proceedings of the First Workshop on NLP Tools and Resources for Translation and Interpreting Applications
Raquel Lázaro Gutiérrez | Antonio Pareja | Ruslan Mitkov
Proceedings of the First Workshop on NLP Tools and Resources for Translation and Interpreting Applications

Machine Translation of literary texts: genres, times and systems
Ana Isabel Cespedosa Vázquez | Ruslan Mitkov
Proceedings of the First Workshop on NLP Tools and Resources for Translation and Interpreting Applications

Machine Translation (MT) has taken off dramatically in recent years due to the advent of Deep Learning methods, and Neural Machine Translation (NMT) has enhanced the quality of automatic translation significantly. While most work has covered the automatic translation of technical, legal and medical texts, the application of MT to literary texts and the human role in this process have been underexplored. In an effort to bridge the gap in this under-researched area, this paper presents the results of a study which seeks to evaluate the performance of three MT systems applied to two different literary genres: two novels (1984 by George Orwell and Pride and Prejudice by Jane Austen) and two poems (I Felt a Funeral in my Brain by Emily Dickinson and Siren Song by Margaret Atwood), representing different literary periods and timelines. The evaluation was conducted by way of the automatic evaluation metric BLEU to objectively assess the performance that the MT systems show on each genre. The limitations of this study are also outlined.

Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Ruslan Mitkov | Galia Angelova
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Evaluating of Large Language Models in Relationship Extraction from Unstructured Data: Empirical Study from Holocaust Testimonies
Isuri Anuradha | Le An Ha | Ruslan Mitkov | Vinita Nahar
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Relationship extraction from unstructured data remains one of the most challenging tasks in the field of Natural Language Processing (NLP). The complexity of relationship extraction arises from the need to comprehend the underlying semantics, syntactic structures, and contextual dependencies within the text. Unstructured data poses challenges with diverse linguistic patterns, implicit relationships, and contextual nuances, complicating accurate relationship identification and extraction. The emergence of Large Language Models (LLMs), such as GPT (Generative Pre-trained Transformer), has marked a significant advancement in the field of NLP. In this work, we assess and evaluate the effectiveness of LLMs in relationship extraction from Holocaust testimonies within the historical domain. By delving into this domain-specific context, we aim to gain deeper insights into the performance and capabilities of LLMs in accurately capturing and extracting relationships within the Holocaust domain by developing a novel knowledge graph to visualise the relationships of the Holocaust. To the best of our knowledge, there is no existing study which discusses relationship extraction in Holocaust testimonies. The majority of current approaches for Information Extraction (IE) in historic documents are either manual or OCR based. Moreover, in this study, we found that the Subject-Object-Verb extraction using GPT3-based relations produced more meaningful results compared to the Semantic Role labeling-based triple extraction.

pdf bib abs

Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study
Damith Premasiri | Tharindu Ranasinghe | Ruslan Mitkov
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Text classification is an area of research which has been studied over the years in Natural Language Processing (NLP). Adapting NLP to multiple domains has introduced many new challenges for text classification and one of them is long document classification. While state-of-the-art transformer models provide excellent results in text classification, most of them have limitations in the maximum length of the input sequence. The majority of the transformer models are limited to 512 tokens, and therefore, they struggle with long document classification problems. In this research, we explore employing Model Fusing for long document classification while comparing the results with the well-known BERT and Longformer architectures.

pdf bib abs

Deep Learning Methods for Identification of Multiword Flower and Plant Names
Damith Premasiri | Amal Haddad Haddad | Tharindu Ranasinghe | Ruslan Mitkov
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Multiword Terms (MWTs) are domain-specific Multiword Expressions (MWEs) where two or more lexemes converge to form a new unit of meaning. The task of processing MWTs is crucial in many Natural Language Processing (NLP) applications, including Machine Translation (MT) and terminology extraction. However, the automatic detection of those terms is a difficult task and more research is still required to give more insightful and useful results in this field. In this study, we seek to fill this gap using state-of-the-art transformer models. We evaluate both BERT-like discriminative transformer models and generative pre-trained transformer (GPT) models on this task, and we show that discriminative models perform better than current GPT models in the multiword term identification task for flower and plant names in English and Spanish. The best discriminative models achieve F1 scores of 94.3127% and 82.1733% on English and Spanish data, respectively, while ChatGPT achieves only 63.3183% and 47.7925%, respectively.

pdf bib abs

Cross-lingual Mediation: Readability Effects
Maria Kunilovskaya | Ruslan Mitkov | Eveline Wandl-Vogt
Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability

This paper explores the readability of translated and interpreted texts compared to the original source texts and target language texts in the same domain. It has been shown in the literature that translated and interpreted texts could exhibit lexical and syntactic properties that make them simpler, and hence, easier to process than their sources or comparable non-translations. In translation, this effect is attributed to the tendency to simplify and disambiguate the message. In interpreting, it can be enhanced by the temporal and cognitive constraints. We use readability annotations from the Newsela corpus to formulate a number of classification and regression tasks and fine-tune a multilingual pre-trained model on these tasks, obtaining models that can differentiate between complex and simple sentences. Then, the models are applied to predict the readability of sources, targets, and comparable target language originals in a zero-shot manner. Our test data – parallel and comparable – come from English-German bidirectional interpreting and translation subsets from the Europarl corpus. The results confirm the difference in readability between translated/interpreted targets and sentences in standard originally-authored source and target languages. Besides, we find consistent differences between the translation directions in the English-German language pair.

2022

pdf bib abs

DTW at Qur’an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain
Damith Premasiri | Tharindu Ranasinghe | Wajdi Zaghouani | Ruslan Mitkov
Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

The task of machine reading comprehension (MRC) is a useful benchmark to evaluate the natural language understanding of machines. It has gained popularity in the natural language processing (NLP) field mainly due to the large number of datasets released for many languages. However, MRC has been understudied in several domains, including religious texts. The goal of the Qur’an QA 2022 shared task is to fill this gap by producing state-of-the-art question answering and reading comprehension research on the Qur’an. This paper describes the DTW entry to the Qur’an QA 2022 shared task. Our methodology uses transfer learning to take advantage of available Arabic MRC data. We further improve the results using various ensemble learning strategies. Our approach provided a partial Reciprocal Rank (pRR) score of 0.49 on the test set, proving its strong performance on the task.

2021

pdf bib abs

An Exploratory Analysis of Multilingual Word-Level Quality Estimation with Cross-Lingual Transformers
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Most studies on word-level Quality Estimation (QE) of machine translation focus on language-specific models. The obvious disadvantages of these approaches are the need for labelled data for each language pair and the high cost required to maintain several language-specific models. To overcome these problems, we explore different approaches to multilingual, word-level QE. We show that multilingual QE models perform on par with the current language-specific models. In the cases of zero-shot and few-shot QE, we demonstrate that it is possible to accurately predict word-level quality for any given new language pair from models trained on other language pairs. Our findings suggest that the word-level QE models based on powerful pre-trained transformers that we propose in this paper generalise well across languages, making them more useful in real-world scenarios.

pdf bib abs

Translationese in Russian Literary Texts
Maria Kunilovskaya | Ekaterina Lapshinova-Koltunski | Ruslan Mitkov
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

The paper reports the results of a translationese study of literary texts based on translated and non-translated Russian. We aim to find out if translations deviate from non-translated literary texts, and if the established differences can be attributed to typological relations between source and target languages. We expect that literary translations from typologically distant languages should exhibit more translationese, and that the fingerprints of individual source languages (and their families) are traceable in translations. We explore linguistic properties that distinguish non-translated Russian literature from translations into Russian. Our results show that non-translated fiction is different from translations to the degree that these two language varieties can be automatically classified. As expected, language typology is reflected in translations of literary texts. We identified features that point to linguistic specificity of Russian non-translated literature and to shining-through effects. Some translationese features cut across all language pairs, while others are characteristic of literary translations from languages belonging to specific language families.

pdf bib

Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Ruslan Mitkov | Galia Angelova
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

pdf bib abs

Fiction in Russian Translation: A Translationese Study
Maria Kunilovskaya | Ekaterina Lapshinova-Koltunski | Ruslan Mitkov
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

This paper presents a translationese study based on the parallel data from the Russian National Corpus (RNC). We explored differences between literary texts originally authored in Russian and fiction translated into Russian from 11 languages. The texts are represented with frequency-based features that capture structural and lexical properties of language. Binary classification results indicate that literary translations can be distinguished from non-translations with an accuracy ranging from 82 to 92% depending on the source language and feature set. Multiclass classification confirms that translations from distant languages are more distinct from non-translations than translations from languages that are typologically close to Russian. It also demonstrates that translations from same-family source languages share translationese properties. Structural features return more consistent results than features relying on external resources and capturing lexical properties of texts in both translationese detection and source language identification tasks.

pdf bib abs

Paragraph Similarity Matches for Generating Multiple-choice Test Items
Halyna Maslak | Ruslan Mitkov
Proceedings of the Student Research Workshop Associated with RANLP 2021

Multiple-choice questions (MCQs) are widely used in knowledge assessment in educational institutions, during work interviews, in entertainment quizzes and games. Although the research on the automatic or semi-automatic generation of multiple-choice test items has been conducted since the beginning of this millennium, most approaches focus on generating questions from a single sentence. In this research, a state-of-the-art method of creating questions based on multiple sentences is introduced. It was inspired by semantic similarity matches used in the translation memory component of translation management systems. The performance of two deep learning algorithms, doc2vec and SBERT, is compared for the paragraph similarity task. The experiments are performed on the ad-hoc corpus within the EU domain. For the automatic evaluation, a smaller corpus of manually selected matching paragraphs has been compiled. The results prove the good performance of Sentence Embeddings for the given task.

pdf bib abs

Towards New Generation Translation Memory Systems
Nikola Spasovski | Ruslan Mitkov
Proceedings of the Student Research Workshop Associated with RANLP 2021

Despite the enormous popularity of Translation Memory systems and the active research in the field, their language processing features still suffer from certain limitations. While many recent papers focus on semantic matching capabilities of TMs, this planned study will address how these tools perform when dealing with longer segments and whether this could be a cause of lower match scores. An experiment will be carried out on corpora from two different (repetitive) domains. Following the results, recommendations for future developments of new TMs will be made.

pdf bib

Proceedings of the Translation and Interpreting Technology Online Conference
Ruslan Mitkov | Vilelmini Sosoni | Julie Christine Giguère | Elena Murgolo | Elizabeth Deysel
Proceedings of the Translation and Interpreting Technology Online Conference

pdf bib abs

A Comparison between Named Entity Recognition Models in the Biomedical Domain
Maria Carmela Cariello | Alessandro Lenci | Ruslan Mitkov
Proceedings of the Translation and Interpreting Technology Online Conference

The domain-specialised application of Named Entity Recognition (NER) is known as Biomedical NER (BioNER), which aims to identify and classify biomedical concepts that are of interest to researchers, such as genes, proteins, chemical compounds, drugs, mutations, diseases, and so on. The BioNER task is very similar to general NER but recognising Biomedical Named Entities (BNEs) is more challenging than recognising proper names from newspapers due to the characteristics of biomedical nomenclature. In order to address the challenges posed by BioNER, seven machine learning models were implemented comparing a transfer learning approach based on fine-tuned BERT with Bi-LSTM based neural models and a CRF model used as baseline. Precision, Recall and F1-score were used as performance scores evaluating the models on two well-known biomedical corpora: JNLPBA and BIOCREATIVE IV (BC-IV). Strict and partial matching were considered as evaluation criteria. The reported results show that a transfer learning approach based on fine-tuned BERT outperforms all other methods, achieving the highest scores for all metrics on both corpora.

pdf bib abs

Cross-Lingual Named Entity Recognition via FastAlign: a Case Study
Ali Hatami | Ruslan Mitkov | Gloria Corpas Pastor
Proceedings of the Translation and Interpreting Technology Online Conference

Named Entity Recognition is an essential task in natural language processing to detect entities and classify them into predetermined categories. An entity is a meaningful word or phrase that refers to proper nouns. Named Entities play an important role in different NLP tasks such as Information Extraction, Question Answering and Machine Translation. In Machine Translation, named entities often cause translation failures regardless of local context, affecting the output quality of translation. Annotating named entities is a time-consuming and expensive process especially for low-resource languages. One solution for this problem is to use word alignment methods in bilingual parallel corpora in which just one side has been annotated. The goal is to extract named entities in the target language by using the annotated corpus of the source language. In this paper, we compare the performance of two alignment methods, Grow-diag-final-and and Intersect Symmetrisation heuristics, to exploit the annotation projection of an English-Brazilian Portuguese bilingual corpus to detect named entities in Brazilian Portuguese. A NER model that is trained on annotated data extracted from the alignment methods is used to evaluate the performance of aligners. Experimental results show that the Intersect Symmetrisation is able to achieve superior performance scores compared to the Grow-diag-final-and heuristic in Brazilian Portuguese.

pdf bib abs

Fake News Detection for Portuguese with Deep Learning
Lígia Venturott | Ruslan Mitkov
Proceedings of the Translation and Interpreting Technology Online Conference

The exponential growth of the internet and social media in the past decade gave way to the increase in dissemination of false or misleading information. Since the 2016 US presidential election, the term “fake news” became increasingly popular and this phenomenon has received more attention. In the past years several fact-checking agencies were created, but due to the great number of daily posts on social media, manual checking is insufficient. Currently, there is a pressing need for automatic fake news detection tools, either to assist manual fact-checkers or to operate as standalone tools. There are several projects underway on this topic, but most of them focus on English. This research-in-progress paper discusses the employment of deep learning methods, and the development of a tool, for detecting false news in Portuguese. As a first step we shall compare well-established architectures that were tested in other languages and analyse their performance on our Portuguese data. Based on the preliminary results of these classifiers, we shall choose a deep learning model or combine several deep learning models which hold promise to enhance the performance of our fake news detection system.

pdf bib abs

Interactive Models for Post-Editing
Marie Escribe | Ruslan Mitkov
Proceedings of the Translation and Interpreting Technology Online Conference

Despite the increasingly good quality of Machine Translation (MT) systems, MT outputs require corrections. Automatic Post-Editing (APE) models have been introduced to perform these corrections without human intervention. However, no system has been able to fully automate the Post-Editing (PE) process. Moreover, while numerous translation tools, such as Translation Memories (TMs), largely benefit from translators’ input, Human-Computer Interaction (HCI) remains limited when it comes to PE. This research-in-progress paper discusses APE models and suggests that they could be improved in more interactive scenarios, as previously done in MT with the creation of Interactive MT (IMT) systems. Based on the hypothesis that PE would benefit from HCI, two methodologies are proposed. Both suggest that traditional batch learning settings are not optimal for PE. Instead, online techniques are recommended to train and update PE models on the fly, via either real or simulated interactions with the translator.

pdf bib abs

Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis
Martha Maria Papadopoulou | Anna Zaretskaya | Ruslan Mitkov
Proceedings of the Translation and Interpreting Technology Online Conference

This paper offers a comparative evaluation of four commercial ASR systems which are evaluated according to the post-editing effort required to reach “publishable” quality and according to the number of errors they produce. For the error annotation task, an original error typology for transcription errors is proposed. This study also seeks to examine whether there is a difference in the performance of these systems between native and non-native English speakers. The experimental results suggest that among the four systems, Trint obtains the best scores. It is also observed that most systems perform noticeably better with native speakers and that all systems are most prone to fluency errors.

2020

pdf bib abs

TransQuest: Translation Quality Estimation with Cross-lingual Transformers
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the 28th International Conference on Computational Linguistics

Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures. However, the majority of these methods work only on the language pair they are trained on and need retraining for new language pairs. This process can prove difficult from a technical point of view and is usually computationally expensive. In this paper we propose a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. Our evaluation shows that the proposed methods achieve state-of-the-art results outperforming current open-source quality estimation frameworks when trained on datasets from WMT. In addition, the framework proves very useful in transfer learning settings, especially when dealing with low-resourced languages, allowing us to obtain very competitive results.

pdf bib abs

Intelligent Translation Memory Matching and Retrieval with Sentence Encoders
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

Matching and retrieving previously translated segments from the Translation Memory is a key functionality in Translation Memory systems. However, this matching and retrieving process is still limited to algorithms based on edit distance, which we have identified as a major drawback in Translation Memory systems. In this paper, we introduce sentence encoders to improve the matching and retrieving process in Translation Memory systems - an effective and efficient solution to replace edit distance-based algorithms.

pdf bib abs

RGCL at SemEval-2020 Task 6: Neural Approaches to Definition Extraction
Tharindu Ranasinghe | Alistair Plum | Constantin Orasan | Ruslan Mitkov
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents the RGCL team submission to SemEval 2020 Task 6: DeftEval, subtasks 1 and 2. The system classifies definitions at the sentence and token levels. It utilises state-of-the-art neural network architectures, which have some task-specific adaptations, including an automatically extended training set. Overall, the approach achieves acceptable evaluation scores, while maintaining flexibility in architecture selection.

pdf bib abs

TransQuest at WMT2020: Sentence-Level Direct Assessment
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the Fifth Conference on Machine Translation

This paper presents the team TransQuest’s participation in the Sentence-Level Direct Assessment shared task in WMT 2020. We introduce a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. The proposed methods achieve state-of-the-art results surpassing the results obtained by OpenKiwi, the baseline used in the shared task. We further fine-tune the QE framework by performing ensemble and data augmentation. Our approach is the winning solution in all of the language pairs according to the WMT 2020 official results.

2019

pdf bib abs

Bridging the Gap: Attending to Discontinuity in Identification of Multiword Expressions
Omid Rohanian | Shiva Taslimipoor | Samaneh Kouchaki | Le An Ha | Ruslan Mitkov
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce a new method to tag Multiword Expressions (MWEs) using a linguistically interpretable language-independent deep learning architecture. We specifically target discontinuity, an under-explored aspect that poses a significant challenge to computational treatment of MWEs. Two neural architectures are explored: Graph Convolutional Network (GCN) and multi-head self-attention. GCN leverages dependency parse information, and self-attention attends to long-range relations. We finally propose a combined model that integrates complementary information from both, through a gating mechanism. The experiments on a standard multilingual dataset for verbal MWEs show that our model outperforms the baselines not only in the case of discontinuous MWEs but also in overall F-score.

pdf bib

Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Ruslan Mitkov | Galia Angelova
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

pdf bib abs

Enhancing Unsupervised Sentence Similarity Methods with Deep Contextualised Word Representations
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Calculating Semantic Textual Similarity (STS) plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. All modern state-of-the-art STS methods rely on word embeddings one way or another. The recently introduced contextualised word embeddings have proved more effective than standard word embeddings in many natural language processing tasks. This paper evaluates the impact of several contextualised word embeddings on unsupervised STS methods and compares them with the existing supervised/unsupervised STS methods for different datasets in different languages and different domains.

pdf bib abs

Semantic Textual Similarity with Siamese Neural Networks
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Calculating the Semantic Textual Similarity (STS) is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. This paper evaluates Siamese recurrent architectures, a special type of neural network, which are used here to measure STS. Several variants of the architecture are compared with existing methods.

pdf bib abs

RGCL-WLV at SemEval-2019 Task 12: Toponym Detection
Alistair Plum | Tharindu Ranasinghe | Pablo Calleja | Constantin Orăsan | Ruslan Mitkov
Proceedings of the 13th International Workshop on Semantic Evaluation

This article describes the system submitted by the RGCL-WLV team to the SemEval 2019 Task 12: Toponym resolution in scientific papers. The system detects toponyms using a bootstrapped machine learning (ML) approach which classifies names identified using gazetteers extracted from the GeoNames geographical database. The paper evaluates the performance of several ML classifiers, as well as how the gazetteers influence the accuracy of the system. Several runs were submitted. The highest precision achieved for one of the submissions was 89%, albeit at a relatively low recall of 49%.

pdf bib abs

What Influences the Features of Post-editese? A Preliminary Study
Sheila Castilho | Natália Resende | Ruslan Mitkov
Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)

While a number of studies have shown evidence of translationese phenomena, that is, statistical differences between original texts and translated texts (Gellerstam, 1986), studies searching for translationese features in postedited texts (what has been called “posteditese” (Daems et al., 2017)) have presented mixed results. This paper reports a preliminary study aimed at identifying the presence of post-editese features in machine-translated post-edited texts and at understanding how they differ from translationese features. We test the influence of factors such as post-editing (PE) levels (full vs. light), translation proficiency (professionals vs. students) and text domain (news vs. literary). Results show evidence of post-editese features, especially in light PE texts and in certain domains.

2018

pdf bib abs

With a little help from NLP: My Language Technology applications with impact on society
Ruslan Mitkov
Proceedings of the Third International Conference on Computational Linguistics in Bulgaria (CLIB 2018)

The keynote speech presents the speaker’s vision that research should lead to the development of applications which benefit society. To support this, the speaker will present three original methodologies proposed by him which underpin applications jointly implemented with colleagues from across his research group. These Language Technology tools already have a substantial societal impact in the following areas: learning and assessment, translation and care for people with language disabilities.

pdf bib abs

Classifying Referential and Non-referential It Using Gaze
Victoria Yaneva | Le An Ha | Richard Evans | Ruslan Mitkov
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

When processing a text, humans and machines must disambiguate between different uses of the pronoun it, including non-referential, nominal anaphoric or clause anaphoric ones. In this paper we use eye-tracking data to learn how humans perform this disambiguation and use this knowledge to improve the automatic classification of it. We show that by using gaze data and a POS-tagger we are able to significantly outperform a common baseline and classify between three categories of it with an accuracy comparable to that of linguistic-based approaches. In addition, the discriminatory power of specific gaze features informs the way humans process the pronoun, which, to the best of our knowledge, has not been explored using data from a natural reading task.

pdf bib abs

WLV at SemEval-2018 Task 3: Dissecting Tweets in Search of Irony
Omid Rohanian | Shiva Taslimipoor | Richard Evans | Ruslan Mitkov
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the systems submitted to SemEval 2018 Task 3 “Irony detection in English tweets” for both subtasks A and B. The first system, leveraging a combination of sentiment, distributional semantic, and text surface features, is ranked third among 44 teams according to the official leaderboard of subtask A. The second system, with a slightly different representation of the features, ranked ninth in subtask B. We present a method that entails decomposing tweets into separate parts. Searching for contrast within the constituents of a tweet is an integral part of our system. We embrace an extensive definition of contrast which leads to a vast coverage in detecting ironic content.

pdf bib abs

Wolves at SemEval-2018 Task 10: Semantic Discrimination based on Knowledge and Association
Shiva Taslimipoor | Omid Rohanian | Le An Ha | Gloria Corpas Pastor | Ruslan Mitkov
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the system submitted to SemEval 2018 shared task 10 ‘Capturing Discriminative Attributes’. We use a combination of knowledge-based and co-occurrence features to capture the semantic difference between two words in relation to an attribute. We define scores based on association measures, ngram counts, word similarity, and ConceptNet relations. The system is ranked 4th (joint) on the official leaderboard of the task.

2017

bib

Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Ruslan Mitkov | Galia Angelova
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

pdf bib abs

Investigating the Opacity of Verb-Noun Multiword Expression Usages in Context
Shiva Taslimipoor | Omid Rohanian | Ruslan Mitkov | Afsaneh Fazly
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

This study investigates the supervised token-based identification of Multiword Expressions (MWEs). This is ongoing research to exploit the information contained in the contexts in which different instances of an expression could occur. This information is used to investigate the question of whether an expression is literal or an MWE. Lexical and syntactic context features derived from vector representations are shown to be more effective than traditional statistical measures in identifying tokens of MWEs.

pdf bib abs

Effects of Lexical Properties on Viewing Time per Word in Autistic and Neurotypical Readers
Sanja Štajner | Victoria Yaneva | Ruslan Mitkov | Simone Paolo Ponzetto
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Eye tracking studies from the past few decades have shaped the way we think of word complexity and cognitive load: words that are long, rare and ambiguous are more difficult to read. However, online processing techniques have been scarcely applied to investigating the reading difficulties of people with autism and what vocabulary is challenging for them. We present parallel gaze data obtained from adult readers with autism and a control group of neurotypical readers and show that the former required higher cognitive effort to comprehend the texts as evidenced by three gaze-based measures. We divide all words into four classes based on their viewing times for both groups and investigate the relationship between longer viewing times and word length, word frequency, and four cognitively-based measures (word concreteness, familiarity, age of acquisition and imageability).

pdf bib abs

Translation Memory Systems Have a Long Way to Go
Andrea Silvestre Baquero | Ruslan Mitkov
Proceedings of the Workshop Human-Informed Translation and Interpreting Technology

Translation Memory (TM) systems have changed the work of translators, and translators who do not benefit from these tools are now a tiny minority. These tools operate mostly on fuzzy (surface) matching and cannot benefit from already translated texts which are synonymous to (or paraphrased versions of) the text to be translated. The match score is mostly based on character-string similarity, calculated through Levenshtein distance. The TM tools have difficulties with detecting similarities even in sentences which represent a minor revision of sentences already available in the translation memory. This shortcoming of the current TM systems was the subject of the present study and was empirically proven in the experiments we conducted. To this end, we compiled a small translation memory (English-Spanish) and applied several lexical and syntactic transformation rules to the source sentences with both English and Spanish being the source language. The results of this study show that current TM systems have a long way to go and highlight the need for TM systems equipped with NLP capabilities which will spare the translator from having to translate a sentence again if an almost identical sentence has already been translated.
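
As an illustration of the character-string matching described in this abstract, the following is a minimal sketch of a Levenshtein-based fuzzy match score. It is not the scoring used by any particular TM tool or by the authors: the function names `levenshtein` and `fuzzy_match_score`, and the normalisation by the longer segment's length, are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_match_score(segment: str, tm_source: str) -> float:
    """Hypothetical match score in [0, 1]: 1 minus the normalised edit distance."""
    longest = max(len(segment), len(tm_source))
    if longest == 0:
        return 1.0  # two empty segments are treated as identical
    return 1.0 - levenshtein(segment, tm_source) / longest


if __name__ == "__main__":
    stored = "The committee approved the report."
    # A one-character revision and a passive-voice paraphrase of the stored segment.
    print(fuzzy_match_score("The committee approved the reports.", stored))
    print(fuzzy_match_score("The report was approved by the committee.", stored))
```

Under this kind of surface score, the paraphrase receives a much lower match value than the one-character revision even though its meaning is the same as the stored segment, which is the type of limitation the abstract points to.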

2016

pdf bib abs

Evaluating the Readability of Text Simplification Output for Readers with Cognitive Disabilities
Victoria Yaneva | Irina Temnikova | Ruslan Mitkov
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents an approach for automatic evaluation of the readability of text simplification output for readers with cognitive disabilities. First, we present our work towards the development of the EasyRead corpus, which contains easy-to-read documents created especially for people with cognitive disabilities. We then compare the EasyRead corpus to the simplified output contained in the LocalNews corpus (Feng, 2009), the accessibility of which has been evaluated through reading comprehension experiments including 20 adults with mild intellectual disability. This comparison is made on the basis of 13 disability-specific linguistic features. The comparison reveals that there are no major differences between the two corpora, which shows that the EasyRead corpus is at a similar reading level to the user-evaluated texts. We also discuss the role of Simple Wikipedia (Zhu et al., 2010) as a widely-used accessibility benchmark, in light of our finding that it is significantly more complex than both the EasyRead and the LocalNews corpora.

pdf bib abs

A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults
Victoria Yaneva | Irina Temnikova | Ruslan Mitkov
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The paper presents a corpus of text data and its corresponding gaze fixations obtained from autistic and non-autistic readers. The data was elicited through reading comprehension testing combined with eye-tracking recording. The corpus consists of 1034 content words tagged with their POS, syntactic role and three gaze-based measures corresponding to the autistic and control participants. The reading skills of the participants were measured through multiple-choice questions and, based on the answers given, they were divided into groups of skillful and less-skillful readers. This division of the groups informs researchers on whether particular fixations were elicited from skillful or less-skillful readers and allows a fair between-group comparison for two levels of reading ability. In addition to describing the process of data collection and corpus development, we present a study on the effect that word length has on reading in autism. The corpus is intended as a resource for investigating the particular linguistic constructions which pose reading difficulties for people with autism and hopefully, as a way to inform future text simplification research intended for this population.

pdf bib

WOLVESAAR at SemEval-2016 Task 1: Replicating the Success of Monolingual Word Alignment and Neural Embeddings for Semantic Textual Similarity
Hannah Bechara | Rohit Gupta | Liling Tan | Constantin Orăsan | Ruslan Mitkov | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

A syntactically complex text may represent a problem for both comprehension by humans and various NLP tasks. A large number of studies in text simplification are concerned with this problem, and their aim is to transform the given text into a simplified form in order to make it accessible to a wider audience. In this study, we investigated the natural tendency of texts in 20th-century English. Are they becoming syntactically more complex over the years, requiring a higher literacy level and greater effort from the readers, or are they becoming simpler and easier to read? We examined several factors of text complexity (average sentence length, Automated Readability Index, sentence complexity and passive voice) in the 20th century for two main varieties of English, British and American, using the 'Brown family' of corpora. In British English, we compared the complexity of texts published in 1931, 1961 and 1991, while in American English we compared the complexity of texts published in 1961 and 1992. Furthermore, we demonstrated how state-of-the-art NLP tools can be used for automatic extraction of some complex features from the raw text version of the corpora.

pdf bib abs

A review corpus annotated for negation, speculation and their scope
Natalia Konstantinova | Sheila C.M. de Sousa | Noa P. Cruz | Manuel J. Maña | Maite Taboada | Ruslan Mitkov
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a freely available resource for research on handling negation and speculation in review texts. The SFU Review Corpus, consisting of 400 documents of movie, book, and consumer product reviews, was annotated at the token level with negative and speculative keywords and at the sentence level with their linguistic scope. We report statistics on corpus size and the consistency of annotations. The annotated corpus will be useful in many applications, such as document mining and sentiment analysis.

pdf bib abs

CLCM - A Linguistic Resource for Effective Simplification of Instructions in the Crisis Management Domain and its Evaluations
Irina Temnikova | Constantin Orasan | Ruslan Mitkov
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Due to the increasing number of emergency situations which can have substantial consequences, both in financial terms and in loss of life, the Crisis Management (CM) domain is developing at an exponential speed. The efficient management of emergency situations relies on clear communication between all of the participants in a crisis situation. For these reasons, the Text Complexity (TC) of the CM domain needed to be investigated, and the investigation showed that CM domain texts exhibit high TC levels. This article presents a new linguistic resource in the form of Controlled Language (CL) guidelines for manual text simplification in the CM domain, which aims to address high TC in the CM domain and produce clear messages to be used in crisis situations. The effectiveness of the resource has been tested via evaluation from several different perspectives important for the domain. The overall results show that the CLCM simplification has a positive impact on TC, reading comprehension, manual translation and machine translation. Additionally, an investigation of the cognitive difficulty in applying manual simplification operations led to interesting discoveries. This article provides details of the evaluation methods, the conducted experiments, their results and indications about future work.

2011

pdf bib

Proceedings of the International Conference Recent Advances in Natural Language Processing 2011
Ruslan Mitkov | Galia Angelova
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib

Diachronic Stylistic Changes in British and American Varieties of 20th Century Written English Language
Sanja Štajner | Ruslan Mitkov
Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage

2010

pdf bib abs

Resources for Controlled Languages for Alert Messages and Protocols in the European Perspective
Sylviane Cardey | Krzysztof Bogacki | Xavier Blanco | Ruslan Mitkov
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper is concerned with resources for controlled languages for alert messages and protocols in the European perspective. These resources have been produced as the outcome of a project (Alert Messages and Protocols: MESSAGE) which has been funded with the support of the European Commission - Directorate-General Justice, Freedom and Security, and with the specific objective of 'promoting and supporting the development of security standards, and an exchange of know-how and experience on protection of people'. The MESSAGE project involved the development and transfer of a methodology for writing safe and safely translatable alert messages and protocols created by Centre Tesnière in collaboration with the aircraft industry, the health profession, and emergency services by means of a consortium of four partners to their four European member states in their languages (ES, FR (Coordinator), GB, PL). The paper describes alert messages and protocols, controlled languages for safety and security, the target groups involved, controlled language evaluation, dissemination, the resources that are available, both Freely available and From Owner, together with illustrations of the resources, and the potential transferability to other sectors and users.

2009

pdf bib

Proceedings of the International Conference RANLP-2009
Galia Angelova | Ruslan Mitkov
Proceedings of the International Conference RANLP-2009

pdf bib

Semantic Similarity of Distractors in Multiple-Choice Tests: Extrinsic Evaluation
Ruslan Mitkov | Le An Ha | Andrea Varga | Luz Rello
Proceedings of the Workshop on Geometrical Models of Natural Language Semantics

2008

pdf bib abs

Translation universals: do they exist? A corpus-based NLP study of convergence and simplification
Gloria Corpas Pastor | Ruslan Mitkov | Naveed Afzal | Viktor Pekar
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

Convergence and simplification are two of the so-called universals in translation studies. The first one postulates that translated texts tend to be more similar to one another than non-translated texts. The second one postulates that translated texts are simpler and easier to understand than non-translated ones. This paper discusses the results of a project which applies NLP techniques over comparable corpora of translated and non-translated texts in Spanish, seeking to establish whether these two universals hold (Corpas Pastor, 2008).

pdf bib abs

Mutual Bilingual Terminology Extraction
Le An Ha | Gabriela Fernandez | Ruslan Mitkov | Gloria Corpas
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes a novel methodology to perform bilingual terminology extraction, in which automatic alignment is used to improve the performance of terminology extraction for each language. The strengths of monolingual terminology extraction for each language are exploited to improve the performance of terminology extraction in the other language, thanks to the availability of a sentence-level aligned bilingual corpus, and an automatic noun phrase alignment mechanism. The experiment indicates that weaknesses in monolingual terminology extraction due to the limitation of resources in certain languages can be overcome by using another language which has no such limitation.

pdf bib abs

Anaphora Resolution Exercise: an Overview
Constantin Orăsan | Dan Cristea | Ruslan Mitkov | António Branco
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Evaluation campaigns have become an established way to evaluate automatic systems which tackle the same task. This paper presents the first edition of the Anaphora Resolution Exercise (ARE) and the lessons learnt from it. This first edition focused only on English pronominal anaphora and NP coreference, and was organised as an exploratory exercise where various issues were investigated. ARE proposed four different tasks: pronominal anaphora resolution and NP coreference resolution on a predefined set of entities, pronominal anaphora resolution and NP coreference resolution on raw texts. For each of these tasks different inputs and evaluation metrics were prepared. This paper presents the four tasks, their input data and evaluation metrics used. Even though a large number of researchers in the field expressed their interest to participate, only three institutions took part in the formal evaluation. The paper briefly presents their results, but does not try to interpret them because in this edition of ARE our aim was not about finding why certain methods are better, but to prepare the ground for a fully-fledged edition.

pdf bib abs

Smarty - Extendable Framework for Bilingual and Multilingual Comprehension Assistants
Todor Arnaudov | Ruslan Mitkov
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper discusses a framework for development of bilingual and multilingual comprehension assistants and presents a prototype implementation of an English-Bulgarian comprehension assistant. The framework is based on the application of advanced graphical user interface techniques, WordNet and compatible lexical databases as well as a series of NLP preprocessing tasks, including POS-tagging, lemmatisation, multiword expressions recognition and word sense disambiguation. The aim of this framework is to speed up the process of dictionary look-up, to offer enhanced look-up functionalities and to perform a context-sensitive narrowing-down of the set of translation alternatives proposed to the user.

2006

pdf bib abs

If “it” were “then”, then when was “it”? Establishing the anaphoric role of “then”
Georgiana Puşcaşu | Ruslan Mitkov
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The adverb "then" is among the most frequent English temporal adverbs, being also capable of filling a variety of semantic roles. The identification of anaphoric usages of "then" is important for temporal expression resolution, while the temporal relationship usage is important for event ordering. Given that previous work has not tackled the identification and temporal resolution of anaphoric "then", this paper presents a machine learning approach for setting apart anaphoric usages and a rule-based normaliser that resolves it with respect to an antecedent. The performance of the two modules is evaluated. The present paper also describes the construction of an annotated corpus and the subsequent derivation of training data required by the machine learning module.

pdf bib

Generating Multiple-Choice Test Items from Medical Text: A Pilot Study
Nikiforos Karamanis | Le An Ha | Ruslan Mitkov
Proceedings of the Fourth International Natural Language Generation Conference

2005

pdf bib

Building a WSD module within an MT system to enable interactive resolution in the user’s source language
Constantin Orasan | Ted Marshall | Robert Clark | Le An Ha | Ruslan Mitkov
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

pdf bib

Annotation of Anaphoric Expressions in an Aligned Bilingual Corpus
Agnès Tutin | Meriam Haddara | Ruslan Mitkov | Constantin Orasan
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib

Categorizing Web Pages as a Preprocessing Step for Information Extraction
Viktor Pekar | Richard Evans | Ruslan Mitkov
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib

CAST: A computer-aided summarisation tool
Constantin Orasan | Ruslan Mitkov | Laura Hasler
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib

Computer-Aided Generation of Multiple-Choice Tests
Ruslan Mitkov | Le An Ha
Proceedings of the HLT-NAACL 03 Workshop on Building Educational Applications Using Natural Language Processing

2002

pdf bib

Shallow Language Processing Architecture for Bulgarian
Hristo Tanev | Ruslan Mitkov
COLING 2002: The 19th International Conference on Computational Linguistics

pdf bib

A corpus based investigation of morphological disagreement in anaphoric relations
Cătălina Barbu | Richard Evans | Ruslan Mitkov
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

pdf bib

Introduction to the Special Issue on Computational Anaphora Resolution
Ruslan Mitkov | Branimir Boguraev | Shalom Lappin
Computational Linguistics, Volume 27, Number 4, December 2001

pdf bib

Evaluation Tool for Rule-based Anaphora Resolution Methods
Catalina Barbu | Ruslan Mitkov
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

pdf bib

Evaluation environment for anaphora resolution
Catalina Barbu | Ruslan Mitkov
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000

pdf bib

LINGUA: a robust architecture for text processing and anaphora resolution in Bulgarian
Hristo Tanev | Ruslan Mitkov
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000

pdf bib

Towards More Comprehensive Evaluation in Anaphora Resolution
Ruslan Mitkov
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1999

pdf bib

Book Reviews: Centering Theory in Discourse
Ruslan Mitkov
Computational Linguistics, Volume 25, Number 4, December 1999

1998

pdf bib

Robust pronoun resolution with limited knowledge
Ruslan Mitkov
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

pdf bib

Robust Pronoun Resolution with Limited Knowledge
Ruslan Mitkov
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf bib

Multilingual Robust Anaphora Resolution
Ruslan Mitkov | Lamia Belguith | Malgorzata Stys
Proceedings of the Third Conference on Empirical Methods for Natural Language Processing

1997

pdf bib

Factors in anaphora resolution: they are not the only things that matter. A case study based on two different approaches
Ruslan Mitkov
Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts

pdf bib

How far are we from (semi-)automatic annotation of anaphoric links in corpora?
Ruslan Mitkov
Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts

1996

pdf bib

Towards a more efficient use of PC-based MT in education
Ruslan Mitkov
Proceedings of Translating and the Computer 18

1995

pdf bib

Anaphora Resolution in Machine Translation
Ruslan Mitkov | Sung-Kwon Choi | Randall Sharp
Proceedings of the Sixth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

1994

pdf bib abs

Machine translation, ten years on: Discourse has yet to make a breakthrough
Ruslan Mitkov | Johann Haller
Proceedings of the Second International Conference on Machine Translation: Ten years on

Progress in Machine Translation (MT) during the last ten years has been observed at different levels, but discourse has yet to make a breakthrough. MT research and development has concentrated so far mostly on sentence translation (discourse analysis being a very complicated task) and the successful operation of most of the working MT systems does not usually go beyond the sentence level. To start with, the paper will refer to the MT research and development in the last ten years at the IAI in Saarbrücken. Next, the MT discourse issues will be discussed both from the point of view of source language analysis and target text generation, and on the basis of the preliminary results of an ongoing "discourse-oriented MT" project. Probably the most important aspect in successfully analysing multisentential source texts is the capacity to establish the anaphoric references to preceding discourse entities. The paper will discuss the problem of anaphora resolution from the perspective of MT. A new integrated model for anaphora resolution, developed for the needs of MT, will also be outlined. As already mentioned, most machine translation systems perform translation sentence by sentence. But even in the case of paragraph translation, the discourse structure of the target text tends to be identical to that of the source text. However, the sublanguage discourse structures may differ across the different languages, and thus a translated text which assumes the same discourse structure as the source text may sound unnatural and perhaps disguise the true intent of the writer. Finally, the paper will outline a new approach for generating discourse structures appropriate to the target sublanguage and will discuss some of the complicated problems encountered.

pdf bib

An Integrated Model for Anaphora Resolution
Ruslan Mitkov
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

pdf bib

Book Reviews: Expressibility and the Problem of Efficient Text Planning
Ruslan Mitkov
Computational Linguistics, Volume 20, Number 1, March 1994