Mihaela Vela


English to Manipuri and Mizo Post-Editing Effort and its Impact on Low Resource Machine Translation
Loitongbam Sanayai Meetei | Thoudam Doren Singh | Sivaji Bandyopadhyay | Mihaela Vela | Josef van Genabith
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

We present the first study on the post-editing (PE) effort required to build a parallel dataset for English-Manipuri and English-Mizo, in the context of a project on creating data for machine translation (MT). English source text from a local daily newspaper is machine translated into Manipuri and Mizo using PBSMT systems built in-house. A Computer Assisted Translation (CAT) tool is used to record the time, keystrokes and other indicators to measure PE effort in terms of temporal and technical effort. A positive correlation between the technical effort and the number of function words is seen for English-Manipuri and English-Mizo, but a negative correlation between the technical effort and the number of nouns is seen for English-Mizo. However, the average time spent per token in post-editing English-Mizo text is negatively correlated with the temporal effort. The main reasons for these results are (i) that English and Mizo use the same script, while Manipuri uses a different script, and (ii) the agglutinative nature of Manipuri. Further, we check the impact of training an MT system in an incremental approach, by including the post-edited dataset as additional training data. The result shows an increase in HBLEU of up to 4.6 for English-Manipuri.


Improving CAT Tools in the Translation Workflow: New Approaches and Evaluation
Mihaela Vela | Santanu Pal | Marcos Zampieri | Sudip Naskar | Josef van Genabith
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks


Predicting the Law Area and Decisions of French Supreme Court Cases
Octavia-Maria Şulea | Marcos Zampieri | Mihaela Vela | Josef van Genabith
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In this paper, we investigate the application of text classification methods to predict the law area and the decision of cases judged by the French Supreme Court. We also investigate the influence of the time period in which a ruling was made over the textual form of the case description and the extent to which it is necessary to mask the judge’s motivation for a ruling to emulate a real-world test scenario. We report results of 96% F1 score in predicting a case ruling, 90% F1 score in predicting the law area of a case, and 75.9% F1 score in estimating the time span when a ruling was issued, using a linear Support Vector Machine (SVM) classifier trained on lexical features.

Neural Automatic Post-Editing Using Prior Alignment and Reranking
Santanu Pal | Sudip Kumar Naskar | Mihaela Vela | Qun Liu | Josef van Genabith
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We present a second-stage machine translation (MT) system based on a neural machine translation (NMT) approach to automatic post-editing (APE) that improves the translation quality provided by a first-stage MT system. Our APE system (APE_Sym) is an extended version of an attention-based NMT model with bilingual symmetry employing bidirectional models, mt–pe and pe–mt. APE translations produced by our system show statistically significant improvements over the first-stage MT, phrase-based APE and the best reported score on the WMT 2016 APE dataset by a previous neural APE system. Re-ranking (APE_Rerank) of the n-best translations from the phrase-based APE and APE_Sym systems provides further substantial improvements over the symmetric neural APE model. Human evaluation confirms that the PE translations generated by APE_Rerank improve on the previous best neural APE system at WMT 2016.


CATaLog Online: Porting a Post-editing Tool to the Web
Santanu Pal | Marcos Zampieri | Sudip Kumar Naskar | Tapas Nayak | Mihaela Vela | Josef van Genabith
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents CATaLog online, a new web-based MT and TM post-editing tool. CATaLog online is freeware that can be used through a web browser and requires only a simple registration. The tool features a number of editing and log functions similar to the desktop version of CATaLog, enhanced with several new features that we describe in detail in this paper. CATaLog online is designed to allow users to post-edit both translation memory segments and machine translation output. The tool provides a complete set of log information currently not available in most commercial CAT tools. Log information can be used both for project management purposes and for the study of the translation process and translator’s productivity.

SubCo: A Learner Translation Corpus of Human and Machine Subtitles
José Manuel Martínez Martínez | Mihaela Vela
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we present a freely available corpus of human and automatic translations of subtitles. The corpus comprises the original English subtitles (SRC), both human (HT) and machine translations (MT) into German, as well as post-editions (PE) of the MT output. HT and MT are annotated with errors. Moreover, human evaluation is included in HT, MT, and PE. Such a corpus is a valuable resource for both human and machine translation communities, enabling the direct comparison – in terms of errors and evaluation – between human and machine translations and post-edited machine translations.

A Neural Network based Approach to Automatic Post-Editing
Santanu Pal | Sudip Kumar Naskar | Mihaela Vela | Josef van Genabith
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)


Can Translation Memories afford not to use paraphrasing?
Rohit Gupta | Constantin Orasan | Marcos Zampieri | Mihaela Vela | Josef van Genabith
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Searching for Context: a Study on Document-Level Labels for Translation Quality Estimation
Carolina Scarton | Marcos Zampieri | Mihaela Vela | Josef van Genabith | Lucia Specia
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Re-assessing the WMT2013 Human Evaluation with Professional Translators Trainees
Mihaela Vela | Josef van Genabith
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Register-based machine translation evaluation with text classification techniques
Mihaela Vela | Ekaterina Lapshinova-Koltunski
Proceedings of Machine Translation Summit XV: Papers

Measuring ‘Registerness’ in Human and Machine Translation: A Text Classification Approach
Ekaterina Lapshinova-Koltunski | Mihaela Vela
Proceedings of the Second Workshop on Discourse in Machine Translation

USAAR-SAPE: An English–Spanish Statistical Automatic Post-Editing System
Santanu Pal | Mihaela Vela | Sudip Kumar Naskar | Josef van Genabith
Proceedings of the Tenth Workshop on Statistical Machine Translation

Predicting Machine Translation Adequacy with Document Embeddings
Mihaela Vela | Liling Tan
Proceedings of the Tenth Workshop on Statistical Machine Translation

CATaLog: New Approaches to TM and Post Editing Interfaces
Tapas Nayek | Sudip Kumar Naskar | Santanu Pal | Marcos Zampieri | Mihaela Vela | Josef van Genabith
Proceedings of the Workshop Natural Language Processing for Translation Memories


Beyond Linguistic Equivalence. An Empirical Study of Translation Evaluation in a Translation Learner Corpus
Mihaela Vela | Anne-Kathrin Schumann | Andrea Wurm
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation

Quantifying the Influence of MT Output in the Translators’ Performance: A Case Study in Technical Translation
Marcos Zampieri | Mihaela Vela
Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Translation


Concept and Relation Extraction in the Finance Domain
Mihaela Vela | Thierry Declerck
Proceedings of the Eight International Conference on Computational Semantics


Generic NLP Tools for Supporting Shallow Ontology Building
Thierry Declerck | Mihaela Vela
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we present on-going investigations on how complex syntactic annotation, combined with linguistic semantics, can possibly help in supporting the semi-automatic building of (shallow) ontologies from text by proposing an automated extraction of (possibly underspecified) semantic relations from linguistically annotated text.

The Use of Multilevel Annotation and Alignment for the Translator
Mihaela Vela
Proceedings of Translating and the Computer 28

Multi-dimensional Annotation and Alignment in an English-German Translation Corpus
Silvia Hansen-Schirra | Stella Neumann | Mihaela Vela
Proceedings of the 5th Workshop on NLP and XML (NLPXML-2006): Multi-Dimensional Markup in Natural Language Processing