Karin Sim Smith



2017

On Integrating Discourse in Machine Translation
Karin Sim Smith
Proceedings of the Third Workshop on Discourse in Machine Translation

As the quality of Machine Translation (MT) improves, research on improving discourse in automatic translations becomes more viable. This has resulted in an increase in the amount of work on discourse in MT. However, many of the existing models and metrics have yet to integrate these insights. Part of this is due to the evaluation methodology, based as it is largely on matching to a single reference. At a time when MT is increasingly being used in a pipeline for other tasks, the semantic element of the translation process needs to be properly integrated into the task. Moreover, in order to take MT to another level, it will need to judge output not based on a single reference translation, but based on notions of fluency and of adequacy – ideally with reference to the source text.

2016

Word embeddings and discourse information for Quality Estimation
Carolina Scarton | Daniel Beck | Kashif Shah | Karin Sim Smith | Lucia Specia
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

The Trouble with Machine Translation Coherence
Karin Sim Smith | Wilker Aziz | Lucia Specia
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

Cohere: A Toolkit for Local Coherence
Karin Sim Smith | Wilker Aziz | Lucia Specia
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe COHERE, our coherence toolkit, which incorporates various complementary models for capturing and measuring different aspects of text coherence. In addition to the traditional entity grid model (Lapata, 2005) and graph-based metric (Guinaudeau and Strube, 2013), we provide an implementation of a state-of-the-art syntax-based model (Louis and Nenkova, 2012), as well as an adaptation of this model which shows significant performance improvements in our experiments. We benchmark these models using the standard setting for text coherence: original documents and versions of the document with sentences in shuffled order.

2015

A Proposal for a Coherence Corpus in Machine Translation
Karin Sim Smith | Wilker Aziz | Lucia Specia
Proceedings of the Second Workshop on Discourse in Machine Translation

Sheffield Systems for the Finnish-English WMT Translation Task
David Steele | Karin Sim Smith | Lucia Specia
Proceedings of the Tenth Workshop on Statistical Machine Translation