%0 Conference Proceedings
%T Monolingual Data Optimisation for Bootstrapping SMT Engines
%A Jiang, Jie
%A Way, Andy
%A Ng, Nelson
%A Haque, Rejwanul
%A Dillinger, Mike
%A Lu, Jun
%Y Okita, Tsuyoshi
%Y Sokolov, Artem
%Y Watanabe, Taro
%S Workshop on Monolingual Machine Translation
%D 2012
%8 oct 28 nov 1
%I Association for Machine Translation in the Americas
%C San Diego, California, USA
%F jiang-etal-2012-monolingual
%X Content localisation via machine translation (MT) is a sine qua non, especially for international online business. While most applications utilise rule-based solutions due to the lack of suitable in-domain parallel corpora for statistical MT (SMT) training, in this paper we investigate the possibility of applying SMT where huge amounts of monolingual content only are available. We describe a case study where an analysis of a very large amount of monolingual online trading data from eBay is conducted by ALS with a view to reducing this corpus to the most representative sample in order to ensure the widest possible coverage of the total data set. Furthermore, minimal yet optimal sets of sentences/words/terms are selected for generation of initial translation units for future SMT system-building.
%U https://aclanthology.org/2012.amta-monomt.2