Achieving Domain Specificity in SMT without Overt Siloing
William D. Lewis | Chris Wendt | David Bullock
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
We examine data pooling as a method for improving Statistical Machine Translation (SMT) quality in narrowly defined domains, such as the data of a particular company or public entity. By pooling all available data, building large SMT engines, and using domain-specific target language models, we see gains in translation quality and achieve the generalizability and resilience of a larger SMT engine together with the precision of a domain-specific one.
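
To make the idea concrete, the following is a minimal sketch of how a domain-specific target language model might steer the output of an engine trained on pooled data. It is not the system described in the paper: the add-one smoothed bigram model, the `rerank` helper, the weights, and the candidate list with its translation-model scores are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Train a toy add-one smoothed bigram LM on target-language sentences.

    Returns a function mapping a sentence to its log-probability.
    """
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])               # history counts
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    v = len(vocab) + 1                             # +1 slot for unseen words

    def logprob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
            for a, b in zip(tokens[:-1], tokens[1:])
        )

    return logprob


def rerank(candidates, lm_logprob, tm_weight=1.0, lm_weight=1.0):
    """Pick the candidate with the best weighted sum of TM and LM scores.

    candidates: list of (translation, tm_logprob) pairs, assumed to come
    from a translation model trained on the pooled data.
    """
    return max(
        candidates,
        key=lambda c: tm_weight * c[1] + lm_weight * lm_logprob(c[0]),
    )[0]


# Hypothetical in-domain target-language text (e.g. one company's UI strings).
domain_lm = train_bigram_lm([
    "click the start menu",
    "open the start menu",
    "click the file menu",
])

# Hypothetical candidates and scores from the pooled translation model.
candidates = [
    ("click the beginning menu", -2.0),
    ("click the start menu", -2.1),
]

best_with_lm = rerank(candidates, domain_lm)
best_without_lm = rerank(candidates, domain_lm, lm_weight=0.0)
```

In this toy setup the pooled translation model slightly prefers the generic wording, while the domain language model shifts the choice to the in-domain terminology; setting `lm_weight=0.0` shows the choice without the domain model.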