Achieving Domain Specificity in SMT without Overt Siloing

William D. Lewis, Chris Wendt, David Bullock


Abstract
We examine data pooling as a method for improving Statistical Machine Translation (SMT) quality in narrowly defined domains, such as the content of a particular company or public entity. By pooling all available data, building large SMT engines, and using domain-specific target language models, we see gains in translation quality and achieve the generalizability and resilience of a larger SMT engine together with the precision of a domain-specific one.
Anthology ID:
L10-1545
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/791_Paper.pdf