Ruslan Mitkov

Also published as: R. Mitkov


2021

pdf bib
An Exploratory Analysis of Multilingual Word-Level Quality Estimation with Cross-Lingual Transformers
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Most studies on word-level Quality Estimation (QE) of machine translation focus on language-specific models. The obvious disadvantages of these approaches are the need for labelled data for each language pair and the high cost required to maintain several language-specific models. To overcome these problems, we explore different approaches to multilingual, word-level QE. We show that multilingual QE models perform on par with the current language-specific models. In the cases of zero-shot and few-shot QE, we demonstrate that it is possible to accurately predict word-level quality for any given new language pair from models trained on other language pairs. Our findings suggest that the word-level QE models based on powerful pre-trained transformers that we propose in this paper generalise well across languages, making them more useful in real-world scenarios.

2020

pdf bib
TransQuest: Translation Quality Estimation with Cross-lingual Transformers
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the 28th International Conference on Computational Linguistics

Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures. However, the majority of these methods work only on the language pair they are trained on and need retraining for new language pairs. This process can prove difficult from a technical point of view and is usually computationally expensive. In this paper we propose a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. Our evaluation shows that the proposed methods achieve state-of-the-art results outperforming current open-source quality estimation frameworks when trained on datasets from WMT. In addition, the framework proves very useful in transfer learning settings, especially when dealing with low-resourced languages, allowing us to obtain very competitive results.

pdf bib
Intelligent Translation Memory Matching and Retrieval with Sentence Encoders
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

Matching and retrieving previously translated segments from the Translation Memory is a key functionality in Translation Memory systems. However, this matching and retrieval process is still limited to algorithms based on edit distance, which we have identified as a major drawback in Translation Memory systems. In this paper, we introduce sentence encoders to improve the matching and retrieval process in Translation Memory systems, an effective and efficient solution to replace edit distance-based algorithms.

pdf bib
TransQuest at WMT2020: Sentence-Level Direct Assessment
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the Fifth Conference on Machine Translation

This paper presents the team TransQuest’s participation in the Sentence-Level Direct Assessment shared task in WMT 2020. We introduce a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. The proposed methods achieve state-of-the-art results surpassing the results obtained by OpenKiwi, the baseline used in the shared task. We further fine-tune the QE framework by performing ensemble and data augmentation. Our approach is the winning solution in all of the language pairs according to the WMT 2020 official results.

pdf bib
RGCL at SemEval-2020 Task 6: Neural Approaches to Definition Extraction
Tharindu Ranasinghe | Alistair Plum | Constantin Orasan | Ruslan Mitkov
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents the RGCL team submission to SemEval 2020 Task 6: DeftEval, subtasks 1 and 2. The system classifies definitions at the sentence and token levels. It utilises state-of-the-art neural network architectures, which have some task-specific adaptations, including an automatically extended training set. Overall, the approach achieves acceptable evaluation scores, while maintaining flexibility in architecture selection.

2019

pdf bib
Bridging the Gap: Attending to Discontinuity in Identification of Multiword Expressions
Omid Rohanian | Shiva Taslimipoor | Samaneh Kouchaki | Le An Ha | Ruslan Mitkov
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce a new method to tag Multiword Expressions (MWEs) using a linguistically interpretable language-independent deep learning architecture. We specifically target discontinuity, an under-explored aspect that poses a significant challenge to computational treatment of MWEs. Two neural architectures are explored: Graph Convolutional Network (GCN) and multi-head self-attention. GCN leverages dependency parse information, and self-attention attends to long-range relations. We finally propose a combined model that integrates complementary information from both, through a gating mechanism. The experiments on a standard multilingual dataset for verbal MWEs show that our model outperforms the baselines not only in the case of discontinuous MWEs but also in overall F-score.

pdf bib
What Influences the Features of Post-editese? A Preliminary Study
Sheila Castilho | Natália Resende | Ruslan Mitkov
Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)

While a number of studies have shown evidence of translationese phenomena, that is, statistical differences between original texts and translated texts (Gellerstam, 1986), studies searching for translationese features in post-edited texts (what has been called “post-editese” (Daems et al., 2017)) have presented mixed results. This paper reports a preliminary study aimed at identifying the presence of post-editese features in machine-translated post-edited texts and at understanding how they differ from translationese features. We test the influence of factors such as post-editing (PE) levels (full vs. light), translation proficiency (professionals vs. students) and text domain (news vs. literary). Results show evidence of post-editese features, especially in light PE texts and in certain domains.

pdf bib
RGCL-WLV at SemEval-2019 Task 12: Toponym Detection
Alistair Plum | Tharindu Ranasinghe | Pablo Calleja | Constantin Orăsan | Ruslan Mitkov
Proceedings of the 13th International Workshop on Semantic Evaluation

This article describes the system submitted by the RGCL-WLV team to the SemEval 2019 Task 12: Toponym resolution in scientific papers. The system detects toponyms using a bootstrapped machine learning (ML) approach which classifies names identified using gazetteers extracted from the GeoNames geographical database. The paper evaluates the performance of several ML classifiers, as well as how the gazetteers influence the accuracy of the system. Several runs were submitted. The highest precision achieved for one of the submissions was 89%, albeit at a relatively low recall of 49%.

pdf bib
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Ruslan Mitkov | Galia Angelova
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

pdf bib
Enhancing Unsupervised Sentence Similarity Methods with Deep Contextualised Word Representations
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Calculating Semantic Textual Similarity (STS) plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. All modern state-of-the-art STS methods rely on word embeddings in one way or another. The recently introduced contextualised word embeddings have proved more effective than standard word embeddings in many natural language processing tasks. This paper evaluates the impact of several contextualised word embeddings on unsupervised STS methods and compares them with the existing supervised/unsupervised STS methods for different datasets in different languages and different domains.

pdf bib
Semantic Textual Similarity with Siamese Neural Networks
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Calculating the Semantic Textual Similarity (STS) is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. This paper evaluates Siamese recurrent architectures, a special type of neural network, which are used here to measure STS. Several variants of the architecture are compared with existing methods.

2018

pdf bib
Classifying Referential and Non-referential It Using Gaze
Victoria Yaneva | Le An Ha | Richard Evans | Ruslan Mitkov
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

When processing a text, humans and machines must disambiguate between different uses of the pronoun it, including non-referential, nominal anaphoric or clause anaphoric ones. In this paper we use eye-tracking data to learn how humans perform this disambiguation and use this knowledge to improve the automatic classification of it. We show that by using gaze data and a POS-tagger we are able to significantly outperform a common baseline and classify between three categories of it with an accuracy comparable to that of linguistic-based approaches. In addition, the discriminatory power of specific gaze features informs the way humans process the pronoun, which, to the best of our knowledge, has not been explored using data from a natural reading task.

pdf bib
WLV at SemEval-2018 Task 3: Dissecting Tweets in Search of Irony
Omid Rohanian | Shiva Taslimipoor | Richard Evans | Ruslan Mitkov
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes the systems submitted to SemEval 2018 Task 3 “Irony detection in English tweets” for both subtasks A and B. The first system, leveraging a combination of sentiment, distributional semantic, and text surface features, is ranked third among 44 teams according to the official leaderboard of subtask A. The second system, with a slightly different representation of the features, ranked ninth in subtask B. We present a method that entails decomposing tweets into separate parts. Searching for contrast within the constituents of a tweet is an integral part of our system. We embrace an extensive definition of contrast which leads to a vast coverage in detecting ironic content.

pdf bib
Wolves at SemEval-2018 Task 10: Semantic Discrimination based on Knowledge and Association
Shiva Taslimipoor | Omid Rohanian | Le An Ha | Gloria Corpas Pastor | Ruslan Mitkov
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes the system submitted to SemEval 2018 shared task 10 ‘Capturing Discriminative Attributes’. We use a combination of knowledge-based and co-occurrence features to capture the semantic difference between two words in relation to an attribute. We define scores based on association measures, ngram counts, word similarity, and ConceptNet relations. The system is ranked 4th (joint) on the official leaderboard of the task.

2017

pdf bib
Investigating the Opacity of Verb-Noun Multiword Expression Usages in Context
Shiva Taslimipoor | Omid Rohanian | Ruslan Mitkov | Afsaneh Fazly
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

This study investigates the supervised token-based identification of Multiword Expressions (MWEs). This is ongoing research to exploit the information contained in the contexts in which different instances of an expression could occur. This information is used to investigate the question of whether an expression is literal or an MWE. Lexical and syntactic context features derived from vector representations are shown to be more effective than traditional statistical measures in identifying tokens of MWEs.

pdf bib
Effects of Lexical Properties on Viewing Time per Word in Autistic and Neurotypical Readers
Sanja Štajner | Victoria Yaneva | Ruslan Mitkov | Simone Paolo Ponzetto
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Eye tracking studies from the past few decades have shaped the way we think of word complexity and cognitive load: words that are long, rare and ambiguous are more difficult to read. However, online processing techniques have been scarcely applied to investigating the reading difficulties of people with autism and what vocabulary is challenging for them. We present parallel gaze data obtained from adult readers with autism and a control group of neurotypical readers and show that the former required higher cognitive effort to comprehend the texts as evidenced by three gaze-based measures. We divide all words into four classes based on their viewing times for both groups and investigate the relationship between longer viewing times and word length, word frequency, and four cognitively-based measures (word concreteness, familiarity, age of acquisition and imageability).

pdf bib
Translation Memory Systems Have a Long Way to Go
Andrea Silvestre Baquero | Ruslan Mitkov
Proceedings of the Workshop Human-Informed Translation and Interpreting Technology

Translation memory (TM) systems have changed the work of translators, and translators who do not benefit from these tools are now a tiny minority. These tools operate mostly on fuzzy (surface) matching and cannot benefit from already translated texts which are synonymous to (or paraphrased versions of) the text to be translated. The match score is mostly based on character-string similarity, calculated through Levenshtein distance. The TM tools have difficulties with detecting similarities even in sentences which represent a minor revision of sentences already available in the translation memory. This shortcoming of the current TM systems was the subject of the present study and was empirically proven in the experiments we conducted. To this end, we compiled a small translation memory (English-Spanish) and applied several lexical and syntactic transformation rules to the source sentences with both English and Spanish being the source language. The results of this study show that current TM systems have a long way to go and highlight the need for TM systems equipped with NLP capabilities which will offer the translator the advantage of not having to translate a sentence again if an almost identical sentence has already been translated.
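
As an illustration of the kind of character-level match score the abstract above refers to, the following is a minimal sketch only. The function names and the normalisation (one minus the edit distance divided by the longer segment's length) are assumptions made for this example; the abstract does not give a formula, and actual TM tools may compute their scores differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via the standard dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_match_score(segment: str, tm_segment: str) -> float:
    """Hypothetical match score in [0, 1]; this normalisation is an assumption."""
    longest = max(len(segment), len(tm_segment))
    if longest == 0:
        return 1.0  # two empty segments are treated as identical
    return 1.0 - levenshtein(segment, tm_segment) / longest


if __name__ == "__main__":
    stored = "The report was written by the committee."
    paraphrase = "The committee wrote the report."
    # A paraphrase shares meaning with the stored segment but few characters
    # in the same positions, so a surface score of this kind penalises it.
    print(fuzzy_match_score(paraphrase, stored))
```

A score of this kind only rewards overlapping character sequences, which is consistent with the abstract's point that surface matching cannot exploit paraphrased or synonymous segments.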

bib
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Ruslan Mitkov | Galia Angelova
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

2016

pdf bib
Evaluating the Readability of Text Simplification Output for Readers with Cognitive Disabilities
Victoria Yaneva | Irina Temnikova | Ruslan Mitkov
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents an approach for automatic evaluation of the readability of text simplification output for readers with cognitive disabilities. First, we present our work towards the development of the EasyRead corpus, which contains easy-to-read documents created especially for people with cognitive disabilities. We then compare the EasyRead corpus to the simplified output contained in the LocalNews corpus (Feng, 2009), the accessibility of which has been evaluated through reading comprehension experiments including 20 adults with mild intellectual disability. This comparison is made on the basis of 13 disability-specific linguistic features. The comparison reveals that there are no major differences between the two corpora, which shows that the EasyRead corpus is at a similar reading level to the user-evaluated texts. We also discuss the role of Simple Wikipedia (Zhu et al., 2010) as a widely-used accessibility benchmark, in light of our finding that it is significantly more complex than both the EasyRead and the LocalNews corpora.

pdf bib
A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults
Victoria Yaneva | Irina Temnikova | Ruslan Mitkov
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The paper presents a corpus of text data and its corresponding gaze fixations obtained from autistic and non-autistic readers. The data was elicited through reading comprehension testing combined with eye-tracking recording. The corpus consists of 1034 content words tagged with their POS, syntactic role and three gaze-based measures corresponding to the autistic and control participants. The reading skills of the participants were measured through multiple-choice questions and, based on the answers given, they were divided into groups of skillful and less-skillful readers. This division of the groups informs researchers on whether particular fixations were elicited from skillful or less-skillful readers and allows a fair between-group comparison for two levels of reading ability. In addition to describing the process of data collection and corpus development, we present a study on the effect that word length has on reading in autism. The corpus is intended as a resource for investigating the particular linguistic constructions which pose reading difficulties for people with autism and hopefully, as a way to inform future text simplification research intended for this population.

pdf bib
WOLVESAAR at SemEval-2016 Task 1: Replicating the Success of Monolingual Word Alignment and Neural Embeddings for Semantic Textual Similarity
Hannah Bechara | Rohit Gupta | Liling Tan | Constantin Orăsan | Ruslan Mitkov | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Improving Translation Memory Matching through Clause Splitting
Katerina Raisa Timonera | Ruslan Mitkov
Proceedings of the Workshop Natural Language Processing for Translation Memories

pdf bib
MiniExperts: An SVM Approach for Measuring Semantic Textual Similarity
Hanna Béchara | Hernani Costa | Shiva Taslimipoor | Rohit Gupta | Constantin Orasan | Gloria Corpas Pastor | Ruslan Mitkov
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
Proceedings of the International Conference Recent Advances in Natural Language Processing
Ruslan Mitkov | Galia Angelova | Kalina Bontcheva
Proceedings of the International Conference Recent Advances in Natural Language Processing

2014

pdf bib
One Step Closer to Automatic Evaluation of Text Simplification Systems
Sanja Štajner | Ruslan Mitkov | Horacio Saggion
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

pdf bib
The Fewer, the Better? A Contrastive Study about Ways to Simplify
Ruslan Mitkov | Sanja Štajner
Proceedings of the Workshop on Automatic Text Simplification - Methods and Applications in the Multilingual Society (ATS-MA 2014)

2013

pdf bib
Proceedings of the Sixth International Joint Conference on Natural Language Processing
Ruslan Mitkov | Jong C. Park
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013
Ruslan Mitkov | Galia Angelova | Kalina Bontcheva
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Pangeanic in the EXPERT Project: Exploiting Empirical appRoaches to Translation
Manuel Herranz | Alex Helle | Elia Yuste | Ruslan Mitkov | Lucia Specia
Proceedings of Machine Translation Summit XIV: European projects

pdf bib
A flexible framework for collocation retrieval and translation from parallel and comparable corpora
Oscar Mendoza Rivera | Ruslan Mitkov | Gloria Corpas Pastor
Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technologies

2012

pdf bib
Diachronic Changes in Text Complexity in 20th Century English Language: An NLP Approach
Sanja Štajner | Ruslan Mitkov
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

A syntactically complex text may represent a problem for both comprehension by humans and various NLP tasks. A large number of studies in text simplification are concerned with this problem and their aim is to transform the given text into a simplified form in order to make it accessible to a wider audience. In this study, we investigated what the natural tendency of texts is in 20th century English language. Are they becoming syntactically more complex over the years, requiring a higher literacy level and greater effort from the readers, or are they becoming simpler and easier to read? We examined several factors of text complexity (average sentence length, Automated Readability Index, sentence complexity and passive voice) in the 20th century for two main English language varieties - British and American, using the ‘Brown family’ of corpora. In British English, we compared the complexity of texts published in 1931, 1961 and 1991, while in American English we compared the complexity of texts published in 1961 and 1992. Furthermore, we demonstrated how state-of-the-art NLP tools can be used for automatic extraction of some complex features from the raw text version of the corpora.

pdf bib
A review corpus annotated for negation, speculation and their scope
Natalia Konstantinova | Sheila C.M. de Sousa | Noa P. Cruz | Manuel J. Maña | Maite Taboada | Ruslan Mitkov
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a freely available resource for research on handling negation and speculation in review texts. The SFU Review Corpus, consisting of 400 documents of movie, book, and consumer product reviews, was annotated at the token level with negative and speculative keywords and at the sentence level with their linguistic scope. We report statistics on corpus size and the consistency of annotations. The annotated corpus will be useful in many applications, such as document mining and sentiment analysis.

pdf bib
CLCM - A Linguistic Resource for Effective Simplification of Instructions in the Crisis Management Domain and its Evaluations
Irina Temnikova | Constantin Orasan | Ruslan Mitkov
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Due to the increasing number of emergency situations which can have substantial consequences, both financial and fatal, the Crisis Management (CM) domain is developing at an exponential speed. The efficient management of emergency situations relies on clear communication between all of the participants in a crisis situation. For these reasons, the Text Complexity (TC) of the CM domain needed to be investigated, and the investigation showed that CM domain texts exhibit high TC levels. This article presents a new linguistic resource in the form of Controlled Language (CL) guidelines for manual text simplification in the CM domain which aims to address high TC in the CM domain and produce clear messages to be used in crisis situations. The effectiveness of the resource has been tested via evaluation from several different perspectives important for the domain. The overall results show that the CLCM simplification has a positive impact on TC, reading comprehension, manual translation and machine translation. Additionally, an investigation of the cognitive difficulty in applying manual simplification operations led to interesting discoveries. This article provides details of the evaluation methods, the conducted experiments, their results and indications about future work.

pdf bib
Automatic Question Generation in Multimedia-Based Learning
Yvonne Skalban | Le An Ha | Lucia Specia | Ruslan Mitkov
Proceedings of COLING 2012: Posters

pdf bib
Elliphant: Improved Automatic Detection of Zero Subjects and Impersonal Constructions in Spanish
Luz Rello | Ricardo Baeza-Yates | Ruslan Mitkov
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

pdf bib
Diachronic Stylistic Changes in British and American Varieties of 20th Century Written English Language
Sanja Štajner | Ruslan Mitkov
Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage

pdf bib
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011
Ruslan Mitkov | Galia Angelova
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

pdf bib
Resources for Controlled Languages for Alert Messages and Protocols in the European Perspective
Sylviane Cardey | Krzysztof Bogacki | Xavier Blanco | Ruslan Mitkov
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper is concerned with resources for controlled languages for alert messages and protocols in the European perspective. These resources have been produced as the outcome of a project (Alert Messages and Protocols: MESSAGE) which has been funded with the support of the European Commission - Directorate-General Justice, Freedom and Security, and with the specific objective of 'promoting and supporting the development of security standards, and an exchange of know-how and experience on protection of people'. The MESSAGE project involved the development and transfer of a methodology for writing safe and safely translatable alert messages and protocols created by Centre Tesnière in collaboration with the aircraft industry, the health profession, and emergency services by means of a consortium of four partners to their four European member states in their languages (ES, FR (Coordinator), GB, PL). The paper describes alert messages and protocols, controlled languages for safety and security, the target groups involved, controlled language evaluation, dissemination, the resources that are available, both “Freely available” and “From Owner”, together with illustrations of the resources, and the potential transferability to other sectors and users.

2009

pdf bib
Semantic Similarity of Distractors in Multiple-Choice Tests: Extrinsic Evaluation
Ruslan Mitkov | Le An Ha | Andrea Varga | Luz Rello
Proceedings of the Workshop on Geometrical Models of Natural Language Semantics

pdf bib
Proceedings of the International Conference RANLP-2009
Galia Angelova | Ruslan Mitkov
Proceedings of the International Conference RANLP-2009

2008

pdf bib
Mutual Bilingual Terminology Extraction
Le An Ha | Gabriela Fernandez | Ruslan Mitkov | Gloria Corpas
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes a novel methodology to perform bilingual terminology extraction, in which automatic alignment is used to improve the performance of terminology extraction for each language. The strengths of monolingual terminology extraction for each language are exploited to improve the performance of terminology extraction in the other language, thanks to the availability of a sentence-level aligned bilingual corpus, and an automatic noun phrase alignment mechanism. The experiment indicates that weaknesses in monolingual terminology extraction due to the limitation of resources in certain languages can be overcome by using another language which has no such limitation.

pdf bib
Anaphora Resolution Exercise: an Overview
Constantin Orăsan | Dan Cristea | Ruslan Mitkov | António Branco
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Evaluation campaigns have become an established way to evaluate automatic systems which tackle the same task. This paper presents the first edition of the Anaphora Resolution Exercise (ARE) and the lessons learnt from it. This first edition focused only on English pronominal anaphora and NP coreference, and was organised as an exploratory exercise where various issues were investigated. ARE proposed four different tasks: pronominal anaphora resolution and NP coreference resolution on a predefined set of entities, pronominal anaphora resolution and NP coreference resolution on raw texts. For each of these tasks different inputs and evaluation metrics were prepared. This paper presents the four tasks, their input data and the evaluation metrics used. Even though a large number of researchers in the field expressed their interest in participating, only three institutions took part in the formal evaluation. The paper briefly presents their results, but does not try to interpret them because in this edition of ARE our aim was not to find out why certain methods are better, but to prepare the ground for a fully-fledged edition.

pdf bib
Smarty - Extendable Framework for Bilingual and Multilingual Comprehension Assistants
Todor Arnaudov | Ruslan Mitkov
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper discusses a framework for development of bilingual and multilingual comprehension assistants and presents a prototype implementation of an English-Bulgarian comprehension assistant. The framework is based on the application of advanced graphical user interface techniques, WordNet and compatible lexical databases as well as a series of NLP preprocessing tasks, including POS-tagging, lemmatisation, multiword expressions recognition and word sense disambiguation. The aim of this framework is to speed up the process of dictionary look-up, to offer enhanced look-up functionalities and to perform a context-sensitive narrowing-down of the set of translation alternatives proposed to the user.

pdf bib
Translation universals: do they exist? A corpus-based NLP study of convergence and simplification
Gloria Corpas Pastor | Ruslan Mitkov | Naveed Afzal | Viktor Pekar
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

Convergence and simplification are two of the so-called universals in translation studies. The first one postulates that translated texts tend to be more similar than non-translated texts. The second one postulates that translated texts are simpler and easier to understand than non-translated ones. This paper discusses the results of a project which applies NLP techniques over comparable corpora of translated and non-translated texts in Spanish, seeking to establish whether these two universals hold (Corpas Pastor, 2008).

2006

pdf bib
Generating Multiple-Choice Test Items from Medical Text: A Pilot Study
Nikiforos Karamanis | Le An Ha | Ruslan Mitkov
Proceedings of the Fourth International Natural Language Generation Conference

pdf bib
If “it” were “then”, then when was “it”? Establishing the anaphoric role of “then”
Georgiana Puşcaşu | Ruslan Mitkov
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The adverb "then" is among the most frequent English temporal adverbs, being also capable of filling a variety of semantic roles. The identification of anaphoric usages of "then" is important for temporal expression resolution, while the temporal relationship usage is important for event ordering. Given that previous work has not tackled the identification and temporal resolution of anaphoric "then", this paper presents a machine learning approach for setting apart anaphoric usages and a rule-based normaliser that resolves it with respect to an antecedent. The performance of the two modules is evaluated. The present paper also describes the construction of an annotated corpus and the subsequent derivation of training data required by the machine learning module.

2005

pdf bib
Building a WSD module within an MT system to enable interactive resolution in the user’s source language
Constantin Orasan | Ted Marshall | Robert Clark | Le An Ha | Ruslan Mitkov
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

pdf bib
Annotation of Anaphoric Expressions in an Aligned Bilingual Corpus
Agnès Tutin | Meriam Haddara | Ruslan Mitkov | Constantin Orasan
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Categorizing Web Pages as a Preprocessing Step for Information Extraction
Viktor Pekar | Richard Evans | Ruslan Mitkov
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib
Computer-Aided Generation of Multiple-Choice Tests
Ruslan Mitkov | Le An Ha
Proceedings of the HLT-NAACL 03 Workshop on Building Educational Applications Using Natural Language Processing

pdf bib
CAST: A computer-aided summarisation tool
Constantin Orasan | Ruslan Mitkov | Laura Hasler
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

pdf bib
Bilingual alignment of anaphoric expressions
R. Muñoz | R. Mitkov | M. Palomar | J. Peral | R. Evans | L. Moreno | C. Orasan | M. Saiz-Noeda | A. Ferrández | C. Barbu | P. Martínez-Barco | A. Suárez
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
A corpus based investigation of morphological disagreement in anaphoric relations
Cătălina Barbu | Richard Evans | Ruslan Mitkov
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Shallow Language Processing Architecture for Bulgarian
Hristo Tanev | Ruslan Mitkov
COLING 2002: The 19th International Conference on Computational Linguistics

2001

pdf bib
Evaluation Tool for Rule-based Anaphora Resolution Methods
Catalina Barbu | Ruslan Mitkov
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

pdf bib
Introduction to the Special Issue on Computational Anaphora Resolution
Ruslan Mitkov | Branimir Boguraev | Shalom Lappin
Computational Linguistics, Volume 27, Number 4, December 2001

2000

pdf bib
Towards More Comprehensive Evaluation in Anaphora Resolution
Ruslan Mitkov
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Evaluation environment for anaphora resolution
Catalina Barbu | Ruslan Mitkov
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000

pdf bib
LINGUA: a robust architecture for text processing and anaphora resolution in Bulgarian
Hristo Tanev | Ruslan Mitkov
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000

1999

pdf bib
Book Reviews: Centering Theory in Discourse
Ruslan Mitkov
Computational Linguistics, Volume 25, Number 4, December 1999

1998

pdf bib
Robust pronoun resolution with limited knowledge
Ruslan Mitkov
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

pdf bib
Robust Pronoun Resolution with Limited Knowledge
Ruslan Mitkov
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf bib
Multilingual Robust Anaphora Resolution
Ruslan Mitkov | Lamia Belguith | Malgorzata Stys
Proceedings of the Third Conference on Empirical Methods for Natural Language Processing

1997

pdf bib
Factors in anaphora resolution: they are not the only things that matter. A case study based on two different approaches
Ruslan Mitkov
Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts

pdf bib
How far are we from (semi-)automatic annotation of anaphoric links in corpora?
Ruslan Mitkov
Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts

1996

pdf bib
Towards a more efficient use of PC-based MT in education
Ruslan Mitkov
Proceedings of Translating and the Computer 18

1995

pdf bib
Anaphora Resolution in Machine Translation
Ruslan Mitkov | Sung-Kwon Choi | Randall Sharp
Proceedings of the Sixth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

1994

pdf bib
Machine translation, ten years on: Discourse has yet to make a breakthrough
Ruslan Mitkov | Johann Haller
Proceedings of the Second International Conference on Machine Translation: Ten years on

pdf bib
An Integrated Model for Anaphora Resolution
Ruslan Mitkov
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

pdf bib
Book Reviews: Expressibility and the Problem of Efficient Text Planning
Ruslan Mitkov
Computational Linguistics, Volume 20, Number 1, March 1994

1993

pdf bib
How Could Rhetorical Relations Be Used in Machine Translation?
Ruslan Mitkov
Intentionality and Structure in Discourse Relations