Machine Translation Archive

Index of data, corpora and resources

Publications since 2010

Bilingual corpora [see also Comparable corpora, Example-based methods, Multilingual corpora]

(2015) Vishwajeet Kumar, Ashish Kulkarni, Pankaj Singh, Ganesh Ramakrishnan, & Ganesh Arnaal: A machine-assisted human translation system for technical documents. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: vol.2: MT Users’ Track; pp.259-272. [PDF, 1,275KB]

(2015) Chaochao Wang, Deyi Xiong, Min Zhang, & Chunyu Kit: Learning bilingual distributed phrase representations for statistical machine translation. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: vol.1: MT Researchers’ Track; pp.32-43. [PDF, 668KB]

(2015) Alex Yanishevsky: How much cake is enough: the case for domain-specific engines. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: vol.2: MT Users’ Track; pp.224-247. [PDF, 1,682KB]

(2015) Dong Zhan & Hiromi Nakaiwa: Automatic detection of antecedents of Japanese zero pronouns using a Japanese-English bilingual corpus. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: vol.1: MT Researchers’ Track; pp.66-79. [PDF, 863KB]

(2014) Burak Aydın & Arzucan Özgür: Expanding machine translation training data with an out-of-domain corpus using language modeling based vocabulary saturation. AMTA 2014: proceedings of the eleventh conference of the Association for Machine Translation in the Americas, Vancouver, BC, October 22-26; pp.180-192. [PDF, 523KB]

(2014) Fabrizio Gotti, Philippe Langlais, & Atefeh Farzindar: Hashtag occurrences, layout and translation: a corpus-driven analysis of tweets published by the Canadian government. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014, Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.2254-2261. [PDF, 486KB]

(2014) Adam Kilgarriff: Terminology finding in the Sketch Engine: an evaluation. Translating and the Computer 36: proceedings. Asling: International Society for Advancement in Language Technology, 27-28 November 2014; pp.130-132. [PDF, 286KB]

(2014) Shachar Mirkin & Laurent Besacier: Data selection for compact adapted SMT models. AMTA 2014: proceedings of the eleventh conference of the Association for Machine Translation in the Americas, Vancouver, BC, October 22-26; pp.301-314. [PDF, 610KB]

(2014) Xingyi Song, Lucia Specia, & Trevor Cohn: Data selection for discriminative training in statistical machine translation. Proceedings of the 17th annual conference of the European Association for Machine Translation, EAMT 2014, Dubrovnik, Croatia, 16th-18th June 2014; pp.45-52. [PDF, 415KB]

(2013) Wanxiang Che, Mengqiu Wang, Christopher D.Manning, & Ting Liu: Named entity recognition with bilingual constraints. [NAACL-HLT 2013] The 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 9-14 June 2013, Atlanta, Georgia; pp.52-62. [PDF, 289KB]

(2013) Lei Cui, Dongdong Zhang, Shujie Liu, Mu Li, & Ming Zhou: Bilingual data cleaning for SMT using graph-based random walk.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.340-345. [PDF, 259KB]

(2013) Manaal Faruqui & Chris Dyer: An information theoretic approach to bilingual word clustering.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.777-783. [PDF, 263KB]

(2013) Francisco Guzman, Hassan Sajjad, Stephan Vogel, & Ahmed Abdelali: The AMARA corpus: building resources for translating the web’s educational content. [IWSLT 2013] Proceedings of the 10th International Workshop on Spoken Language Translation, Heidelberg, Germany, Dec.5-6, 2013; 8pp. [PDF, 189KB]

(2013) Ann Irvine & Chris Callison-Burch: Supervised bilingual lexicon induction with multiple monolingual signals. [NAACL-HLT 2013] The 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 9-14 June 2013, Atlanta, Georgia; pp.518-523. [PDF, 511KB]

(2013) Adam Kilgarriff: Terminology finding, parallel corpora and bilingual word sketches in the Sketch Engine. [Aslib 2013] Translating and the Computer 35, 28-29 November 2013, etc.venues, Paddington, London, UK; 6pp. [PDF, 1018KB]; presentation, 23 slides [PDF of PPT, 957KB]

(2013) Anoop Kunchukuttan, Rajen Chatterjee, Shourya Roy, Abhijit Mishra, & Pushpak Bhattacharyya: TransDoop: a map-reduce based crowdsourced translation for complex domains. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, System demonstrations, Sofia, Bulgaria, August 4-9 2013; pp.175-180. [PDF, 1061KB]

(2013) Oscar Mendoza Rivera, Ruslan Mitkov, & Gloria Corpas Pastor: A flexible framework for collocation retrieval and translation from parallel and comparable corpora. [MT Summit XIV] Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technology, Nice, September 3, 2013; pp.18-25. [PDF, 356KB]

(2013) Vassilis Papavassiliou, Prokopis Prokopidis, & Gregor Thurmair: A modular open-source focused crawler for mining monolingual and bilingual corpora from the web.  Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.43-51. [PDF, 663KB]

(2013) Karl Pichotta & John DeNero: Identifying phrasal verbs using many bilingual corpora. [EMNLP 2013] Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013; pp.636-646. [PDF, 267KB]

(2013) Jason R.Smith, Herve Saint-Amand, Magdalena Plamada, Philipp Koehn, Chris Callison-Burch, & Adam Lopez: Dirt cheap web-scale parallel text from the Common Crawl.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.1374-1383. [PDF, 179KB]

(2013) Ivan Vulić & Marie-Francine Moens: A study on bootstrapping bilingual vector spaces from non-parallel data (and nothing else). [EMNLP 2013] Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013; pp.1044-1054. [PDF, 261KB]

(2013) Lingxiao Wang & Christian Boitet: Online production of HQ parallel corpora and permanent task-based evaluation of multiple MT systems: both can be obtained through iMAGs with no added cost. Proceedings of MT Summit XIV Workshop on Post-editing Technology and Practice (WPTP-2), Nice, France, 2 September 2013; pp. 103-110. [PDF, 2109KB]

(2012) Walid Aransa, Holger Schwenk, & Loic Barrault: Semi-supervised transliteration mining from parallel and comparable corpora. IWSLT-2012: 9th International Workshop on Spoken Language Translation, Hong Kong, December 6th-7th, 2012; pp. 185-192. [PDF, 650KB]

(2012) Mihael Arcan, Paul Buitelaar, & Christian Federmann: Using domain-specific and collaborative resources for term translation. SSST-6, Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation, Jeju, Republic of Korea, 12 July 2012; pp.86-94. [PDF, 122KB]

(2012) Núria Bel, Vassilis Papavassiliou, Prokopis Prokopidis, Antonio Toral, & Victoria Arranz: Mining and exploiting domain-specific corpora in the PANACEA platform. [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”, LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.24-26. [PDF, 469KB]

(2012) Sergey Block, Michael Bloodgood, Petra Bradley, Ryan Corbett, Michael Maxwell, Erica Michael, Peter Osthus, Paul Rodrigues, & Benjamin Strauss: Evaluating parallel corpora: assessing utility for use with translation memory systems in government settings [abstract]. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 1p. [PDF, 12KB]; presentation, 39 slides [PDF of PPT, 1589KB]

(2012) Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel, & Aleš Tamchyna: The joy of parallelism with CzEng 1.0.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3921-3928. [PDF, 460KB]

(2012) Houda Bouamor, Aurélien Max, & Anne Vilnat: Validation of sub-sentential paraphrases acquired from parallel monolingual corpora. [EACL 2012] Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp. 716-725. [PDF, 173KB]

(2012) Mauro Cettolo, Christian Girardi, & Marcello Federico: WIT3: web inventory of transcribed and translated talks. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.261-268. [PDF, 197KB]

(2012) Sherri Condon, Luis Hernandez, Dan Parvaz, Mohammad S.Khan, & Hazrat Jahed: Producing data for under-resourced languages: a Dari-English parallel corpus of multi-genre text. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10p. [PDF, 602KB]; abstract, 1p. [PDF, 12KB]

(2012) Ângela Costa, Tiago Luís, Joana Ribeiro, Ana Cristina Mendes, & Luísa Coheur: An English-Portuguese parallel corpus of questions: translation guidelines and application in statistical machine translation.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2172-2176. [PDF, 388KB]

(2012) Mark Fishel, Yota Georgakopoulou, Sergio Penkale, Volha Petukhova, Matej Rojc, Martin Volk, & Andy Way: From subtitles to parallel corpora. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.3-6. [PDF, 165KB]

(2012) Guillem Gascó, Martha-Alicia Rocha, Germán Sanchis-Trilles, Jesús Andrés-Ferrer, & Francisco Casacuberta: Does more data always yield better translations? [EACL 2012] Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp.152-161. [PDF, 162KB]

(2012) Monica Gavrila, Walther v.Hahn, & Cristina Vertan: Same domain different discourse style: a case study on language resources for data-driven machine translation.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3441-3446. [PDF, 344KB]

(2012) Cyril Goutte, Marine Carpuat, & George Foster: The impact of sentence alignment errors on phrase-based machine translation performance. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10pp. [PDF, 147KB]

(2012) Stephen Grimes, Katherine Peterson, & Xuansong Li: Automatic word alignment tools to scale production of manually aligned parallel texts.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2194-2198. [PDF, 259KB]

(2012) Eva Hajičová & Petr Sgall: Formal models and practice of annotation [abstract]. In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; pp.17-18. [PDF]

(2012) Petter Haugereid & Francis Bond: Extracting semantic transfer rules from parallel corpora with SMT phrase aligners. SSST-6, Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation, Jeju, Republic of Korea, 12 July 2012; pp.67-75. [PDF, 194KB]

(2012) Quoc Hung-Ngo & Werner Winiwarter: A visualizing annotation tool for semi-automatically building a bilingual corpus.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.67-74. [PDF, 820KB]

(2012) Georgi Iliev & Angel Genov: Expanding parallel resources for medium-density languages for free.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3937-3943. [PDF, 321KB]

(2012) Fattaneh Jabbari, Somayeh Bakhshaei, Seyed Mohammad Mohammadzadeh Ziabary, & Shahram Khadivi: Developing an open-domain English-Farsi translation system using AFEC, Amirkabir bilingual Farsi-English corpus. AMTA-2012: Fourth workshop on computational approaches to Arabic script-based languages. Proceedings, San Diego, November 1, 2012; pp.17-23. [PDF, 683KB]

(2012) Weimin Jiang: Engine-specific Chinese-English user parallel corpora. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 9pp. [PDF, 326KB]; presentation, 25 slides [PDF of PPT, 540KB]

(2012) J.Howard Johnson: Conditional significance pruning: discarding more of huge phrase tables. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10pp. [PDF, 241KB]

(2012) Adam Kilgarriff & George Tambouratzis: The PRESEMT project. [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.27-28. [PDF, 355KB]

(2012) Gerhard Kremer, Matthias Hartung, Sebastian Padó & Stefan Riezler: Statistical machine translation support improves human adjective translation. Translation: Computation, Corpora, Cognition 2 (1), July 2012; pp.103-126. [PDF, 266KB]

(2012) Cvetana Krstev & Duško Vitas: Construction and exploitation of X-Serbian bitexts [abstract]. In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; pp.25-26. [PDF]

(2012) Septina Dian Larasati: IDENTIC corpus: morphologically enriched Indonesian-English parallel corpus.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.902-906. [PDF, 719KB]

(2012) Marianna J.Martindale: Can statistical post-editing with a small parallel corpus save a weak MT engine?  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2138-2142. [PDF, 353KB]

(2012) Mohammed Mediani, Jan Niehues, & Alex Waibel: Parallel phrase scoring for extra-large corpora. Prague Bulletin of Mathematical Linguistics 98, October 2012; pp.87-98. [PDF, 142KB]

(2012) Robert Munro & Christopher D.Manning: Accurate unsupervised joint named-entity extraction from unaligned parallel text. [ACL 2012] Proceedings of NEWS 2012 Named Entities Workshop, July 12, 2012, Jeju, Republic of Korea; pp.21-29. [PDF, 160KB]

(2012) Preslav Nakov & Hwee Tou Ng: Improving statistical machine translation for a resource-poor language using related resource-rich languages. Journal of Artificial Intelligence Research 44 (2012); pp.179-222. [PDF, 421KB]

(2012) Carla Parra Escartin: Design and compilation of a specialized Spanish-German parallel corpus.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2199-2206. [PDF, 1572KB]

(2012) Matt Post, Chris Callison-Burch, & Miles Osborne: Constructing parallel corpora for six Indian languages via crowdsourcing. WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.401-409. [PDF, 388KB]

(2012) Bruno Pouliquen, Christophe Mazenc, Cecilia Elizalde, & Jose Garcia-Verdugo: Statistical machine translation prototype using UN parallel documents. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.12-19. [PDF, 251KB]

(2012) Felipe Sánchez-Martínez, Rafael C.Carrasco, Miguel A.Martínez-Prieto, & Joaquín Adiego: Generalized biwords for bitext compression and translation spotting. Journal of Artificial Intelligence Research 43 (2012); pp.389-418. [PDF, 418KB]

(2012) Rico Sennrich: Perplexity minimization for translation model domain adaptation in statistical machine translation.  [EACL 2012] Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp. 539-549. [PDF, 160KB]

(2012) Iñaki San Vicente & Iker Manterola: PaCo2: a fully automated tool for gathering parallel corpora from the Web. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1-6. [PDF, 415KB]

(2012) Martina Katalin Szabó, Veronika Vincze, & István Nagy T.: HunOr: a Hungarian-Russian parallel corpus.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2453-2458. [PDF, 396KB]

(2012) George Tambouratzis, Marina Vassiliou, & Sokratis Sofianopoulos: PRESEMT: pattern recognition-based statistically enhanced MT. EACL Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra): Proceedings of the workshop, 23-24 April 2012, Avignon, France; pp.65-68. [PDF, 170KB]

(2012) George Tambouratzis, Michalis Troullinos, Sokratis Sofianopoulos, & Marina Vassiliou: Accurate phrase alignment in a bilingual corpus for EBMT systems.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.104-111. [PDF, 294KB]

(2012) Aleš Tamchyna, Petra Galuščáková, Amir Kamran, Miloš Stanojević, & Ondřej Bojar: Selecting data for English-to-Czech machine translation. WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.374-381. [PDF, 110KB]

(2012) Veronika Vincze: Light verb constructions in the SzegedParallelFX English-Hungarian parallel corpus. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2381-2388. [PDF, 349KB]

(2012) Pidong Wang, Preslav Nakov, & Hwee Tou Ng: Source language adaptation for resource-poor machine translation. EMNLP-CoNLL 2012: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the conference, July 12-14, Jeju Island, Korea; pp.286-296. [PDF, 171KB]

(2012) Dominikus Wetzel & Francis Bond: Enriching parallel corpora for statistical machine translation with semantic negation rephrasing. SSST-6, Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation, Jeju, Republic of Korea, 12 July 2012; pp.20-29. [PDF, 163KB]

(2012) Qian Yu, Aurélien Max, & François Yvon: Aligning bilingual literary works: a pilot study. NAACL-HLT Workshop on Computational Linguistics for Literature, Montréal, Canada, June 8, 2012; pp.36-44. [PDF, 232KB]

(2012) Daniel Zeman: Data issues of the multilingual translation matrix. WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.395-400. [PDF, 225KB]

(2011) Takeshi Abekawa & Kyo Kageura: Using seed terms for crawling bilingual terminology lists on the Web. Translating and the Computer 33, 17-18 November 2011, London; 12pp. [PDF, 68KB]

(2011) Marilisa Amoia, Kerstin Kunz, & Ekaterina Lapshinova-Koltunski: Discontinuous constituents: a problematic case for parallel corpora annotation and querying. AEPC 2011: proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora, associated with the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), 15th September 2011, Hissar, Bulgaria; pp.2-10. [PDF, 184KB]

(2011) Alexandra Antonova & Alexey Misyurev: Building a web-based parallel corpus and filtering out machine-translated text. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.136-144. [PDF, 217KB]

(2011) Victoria Arranz, Olivier Hamon, Karim Boudahmane, & Martine Garnier-Rizet: Protocol and lessons learnt from the production of parallel corpora for the evaluation of speech translation systems. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.129-135. [PDF, 115KB]

(2011) Elizabeth Baran & Nianwen Xue: Singular or plural? Exploiting parallel corpora for Chinese number prediction. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.207-214. [PDF, 187KB]

(2011) Luciano Barbosa, Srinivas Bangalore, & Vivek Kumar Sridhar Rangarajan: Crawling back and forth: using back and out links to locate bilingual sites. [IJCNLP 2011] Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011; pp.429-437. [PDF, 192KB]

(2011) Caroline Barrière & Pierre Isabelle: Searching parallel corpora for contextually equivalent terms. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.105-112. [PDF, 149KB]

(2011) Shane Bergsma, David Yarowsky, & Kenneth Church: Using large monolingual and bilingual corpora to improve coordination disambiguation. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19-24, 2011; pp.1346-1355. [PDF, 157KB]

(2011) Wenliang Chen, Jun’ichi Kazama, Min Zhang, Yoshimasa Tsuruoka, Yujie Zhang, Yiou Wang, Kentaro Torisawa, & Haizhou Li: SMT helps bitext dependency parsing. [EMNLP 2011] Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, July 27-31, 2011; pp.73-83. [PDF, 583KB]

(2011) Oliver Čulo, Silvia Hansen-Schirra, Karin Maksymski, & Stella Neumann: Empty links and crossing lines: querying multi-layer annotation and alignment in parallel corpora. Translation: Computation, Corpora, Cognition 1 (1), December 2011; pp.75-104. [PDF, 923KB]

(2011) Guy De Pauw, Peter Waiganjo Wagacha, & Gilles-Maurice de Schryver: Towards English-Swahili machine translation. Machine Translation and Morphologically-rich Languages: Research Workshop of the Israel Science Foundation, University of Haifa, Israel, 23-27 January, 2011; 2pp. [PDF, 77KB]

(2011) Ali El-Kahky, Kareem Darwish, Ahmed Saad Aldein, Mohamed Abd El-Wahab, Ahmed Hefny, & Waleed Ammar: Improved transliteration mining using graph reinforcement. [EMNLP 2011] Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, July 27-31, 2011; pp.1384-1393. [PDF, 939KB]

(2011) Ruiji Fu, Bing Qin, & Ting Liu: Generating Chinese named entity data from a parallel corpus. [IJCNLP 2011] Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011; pp.264-272. [PDF, 450KB]

(2011) Souhir Gahbiche-Braham, Hélène Bonneau-Maynard, & François Yvon: Two ways to use a noisy parallel news corpus for improving statistical machine translation. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.44-51. [PDF, 176KB]

(2011) Juri Ganitkevitch, Chris Callison-Burch, Courtney Napoles, & Benjamin Van Durme: Learning sentential paraphrases from bilingual parallel corpora for text-to-text generation. [EMNLP 2011] Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, July 27-31, 2011; pp.1168-1179. [PDF, 311KB]

(2011) Qin Gao & Stephan Vogel: Corpus expansion for statistical machine translation with semantic role label substitution rules. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short papers, Portland, Oregon, June 19-24, 2011; pp.294-298. [PDF, 151KB]

(2011) Petter Haugereid & Francis Bond: Extracting transfer rules for multiword expressions from parallel corpora. Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World (MWE 2011), Portland, Oregon, USA, 23 June 2011; pp.92-100. [PDF, 228KB]

(2011) Carlos A.Henríquez Q., José B.Mariño, & Rafael E.Banchs: Deriving translation units using small additional corpora. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.121-128. [PDF, 438KB]

(2011) Masamichi Ideue, Kazuhide Yamamoto, Masao Utiyama, & Eiichiro Sumita: A comparison of unsupervised bilingual term extraction methods using phrase tables. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.346-351. [PDF, 160KB]

(2011) Kriste Krstovski & David A.Smith: A minimally supervised approach for detecting and ranking document translation pairs.  [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.207-216. [PDF, 389KB]

(2011) Sergey Kulikov: What is web-based machine translation up to? Tralogy, Paris, 3-4 March 2011; 11pp. [PDF, 131KB]

(2011) Emeline Lecuit, Denis Maurel, & Duško Vitas: A tagged and aligned corpus for the study of proper names in translation. AEPC 2011: proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora, associated with the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), 15th September 2011, Hissar, Bulgaria; pp.11-18. [PDF, 244KB]

(2011) Els Lefever, Véronique Hoste, & Martine De Cock: ParaSense or how to use parallel corpora for word sense disambiguation. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short papers, Portland, Oregon, June 19-24, 2011; pp.317-322. [PDF, 119KB]

(2011) Feifan Liu, Fei Liu, & Yang Liu: Learning from Chinese-English parallel data for Chinese tense prediction. [IJCNLP 2011] Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011; pp.1116-1124. [PDF, 556KB]

(2011) Yashar Mehdad, Matteo Negri, & Marcello Federico: Using bilingual parallel corpora for cross-lingual textual entailment. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19-24, 2011; pp.1336-1345. [PDF, 236KB]

(2011) Sara Morrissey: Body at work: using corpora in sign language machine translation. International Workshop on Sign Language Translation and Avatar Technology (SLTAT), 10-11 January 2011, Federal Ministry of Labour and Social Affairs, Berlin, Germany; 20 slides [PDF of PPT, 2584KB]

(2011) Preslav Nakov: Reusing parallel corpora between related languages [abstract]. AEPC 2011: proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora, associated with the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), 15th September 2011, Hissar, Bulgaria; 1p. [PDF, 184KB]

(2011) Alexandre Patry & Philippe Langlais: Identifying parallel documents from a large bilingual collection of texts: application to parallel article extraction in Wikipedia. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.87-95. [PDF, 147KB]

(2011) Marion Potet, Raphaël Rubino, Benjamin Lecouteux, Stéphane Huet, Hervé Blanchon, Laurent Besacier, & Fabrice Lefèvre: The LIGA (LIG/LIA) machine translation system for WMT 2011. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.440-446. [PDF, 92KB]

(2011) Spencer Rarrick, Chris Quirk, & Will Lewis: MT detection in web-scraped parallel corpora. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.422-429. [PDF, 323KB]

(2011) Mohammed Rushdi-Saleh, M.Teresa Martín-Valdivia, L.Alfonso Ureña-López, & José M.Perea-Ortega: Bilingual experiments with an Arabic-English corpus for opinion mining. [RANLP 2011] Proceedings of Recent Advances in Natural Language Processing, Hissar, Bulgaria, 12-14 September 2011; pp.740-745. [PDF, 390KB]

(2011) Markus Saers & Dekai Wu: Principled induction of phrasal bilexica. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.313-320. [PDF, 373KB]

(2011) Hassan Sajjad, Nadir Durrani, Helmut Schmid, & Alexander Fraser: Comparing two techniques for learning transliteration models using a parallel corpus. [IJCNLP 2011] Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011; pp.129-137. [PDF, 605KB]

(2011) Maria Stambolieva: Parallel corpora in aspectual studies of non-aspect languages. AEPC 2011: proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora, associated with the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), 15th September 2011, Hissar, Bulgaria; pp.31-42. [PDF, 233KB]

(2011) Kaveh Taghipour, Shahram Khadivi, & Jia Xu: Parallel corpus refinement as an outlier detection algorithm. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.414-421. [PDF, 213KB]

(2011) Mara Tsoumari & Georgios Petasis: A new annotation tool for aligned bilingual corpora. AEPC 2011: proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora, associated with the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), 15th September 2011, Hissar, Bulgaria; pp.43-52. [PDF, 184KB]

(2011) Cristina Vertan & Monica Gavrila: Using manual and parallel aligned corpora for machine translation services within an on-line content management system. AEPC 2011: proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora, associated with the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), 15th September 2011, Hissar, Bulgaria; pp.53-58. [PDF, 361KB]

(2011) Špela Vintar & Darja Fišer: Enriching Slovene WordNet with domain-specific terms. Translation: Computation, Corpora, Cognition 1 (1), December 2011; pp.29-44. [PDF, 631KB]

(2011) Jia Xu & Weiwei Sun: Generating virtual parallel corpus: a compatibility centric method. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.406-413. [PDF, 859KB]

(2011) Cesare Zanca: Developing translation strategies and cultural awareness using corpora and the  web. Tralogy, Paris, 3-4 March 2011; 14pp. [PDF, 160KB]

(2010) Takeshi Abekawa, Masao Utiyama, Eiichiro Sumita, & Kyo Kageura: Community-based construction of draft and final translation corpus through a translation hosting site Minna no Hon’yaku (MNH).  LREC 2010: proceedings of the seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.3662-3669. [PDF, 1695KB]

(2010) Lars Ahrenberg: Alignment-based profiling of Europarl data in an English-Swedish parallel corpus. LREC 2010: proceedings of the seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.3398-3404. [PDF, 349KB]

(2010) José João Almeida & Alberto Simões: Automatic parallel corpora and bilingual terminology extraction from parallel websites. [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.50-55. [PDF, 257KB]

(2010) Vamshi Ambati, Stephan Vogel, & Jaime Carbonell: Active learning and crowd-sourcing for machine translation. LREC 2010: proceedings of the seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.2169-2174. [PDF, 436KB]

(2010) Vamshi Ambati & Stephan Vogel: Can crowds build parallel corpora for machine translation systems? Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, Los Angeles, CA, June 2010; pp.62-65. [PDF, 89KB]

(2010) Marianna Apidianaki & Yifan He: An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting. Proceedings of the 7th International Workshop on Spoken Language Translation, 2-3 December 2010, Paris, France; pp.219-226. [PDF, 474KB]

(2010) Ondřej Bojar, Adam Liška, & Zdeněk Žabokrtský: Evaluating utility of data sources in a large parallel Czech-English corpus CzEng 0.9. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.447-452. [PDF, 359KB]

(2010) Ondřej Bojar, Pavel Straňák, & Daniel Zeman: Data issues in English-to-Hindi machine translation. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1771-1777. [PDF, 557KB]

(2010) Fabienne Braune & Alexander Fraser: Improved unsupervised sentence alignment for symmetrical and asymmetrical parallel corpora. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.81-89. [PDF, 228KB]

(2010) David Burkett, Slav Petrov, John Blitzer, & Dan Klein: Learning better monolingual models with unannotated bilingual text. CoNLL-2010: Fourteenth Conference on Computational Natural Language Learning, Proceedings of the conference, 15-16 July 2010, Uppsala University, Uppsala, Sweden; pp.46-54. [PDF, 431KB]

(2010) Chen Yuncong & Pascale Fung: Unsupervised synthesis of multilingual Wikipedia articles. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.197-205. [PDF, 1040KB]

(2010) Hercules Dalianis, Hao-chun Xing, & Xin Zhang: Creating a reusable English-Chinese parallel corpus for bilingual dictionary construction. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1700-1705. [PDF, 409KB]

(2010) Yanhui Feng, Yu Hong, Zhenxiang Yan, Jianmin Yao, & Qiaoming Zhu: A novel method for bilingual web page acquisition from search engine web records. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.294-302. [PDF, 184KB]

(2010) Mark Fishel & Heiki-Jaan Kaalep: CorporAl: a method and tool for handling overlapping parallel corpora. Fifth Machine Translation Marathon, 13-18 September, University of Le Mans; Prague Bulletin of Mathematical Linguistics, no.94, September 2010; pp.67-76. [PDF, 149KB]; presentation [PDF, 1037KB]

(2010) J.González-Rubio, J.Civera, A.Juan, & F.Casacuberta: Saturnalia: a Latin-Catalan parallel corpus for statistical MT. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.3405-3408. [PDF, 261KB]

(2010) Philipp Koehn & Jean Senellart: Fast approximate string matching with suffix arrays and A* parsing. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; 9pp. [PDF, 178KB]

(2010) Audrey Laroche & Philippe Langlais: Revisiting context-based projection methods for term-translation spotting in comparable corpora. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.617-625. [PDF, 224KB]

(2010) Sara Morrissey, Harold Somers, Robert Smith, Shane Gilchrist & Sandipan Dandapat: Building a sign language corpus for use in machine translation. [LREC 2010] 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language  Technologies, Malta, May 2010; pp.172-177. [PDF, 588KB]

(2010) Smruthi Mukund, Debanjan Ghosh, & Rohini K.Srihari: Using cross-lingual projections to generate semantic role labeled corpus for Urdu – a resource poor language. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.797-805. [PDF, 300KB]

(2010) Masaki Murata, Tomohiro Ohno, Shigeki Matsubara, & Yasuyoshi Inagaki: Construction of chunk-aligned bilingual lecture corpus for simultaneous machine translation. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1765-1770. [PDF, 398KB]

(2010) Matteo Negri & Yashar Mehdad: Creating bi-lingual entailment corpus through translations with Mechanical Turk: $100 for a 10-day rush. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, Los Angeles, CA, June 2010; pp.212-216. [PDF, 101KB]

(2010) Mike O’Malley: The challenges of distributed parallel corpora. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; abstract

(2010) Paula Paiva: Corpus representativeness in the selection of medical terms to be used in translation memory tools [abstract]. UCCTS 2010: Using Corpora in Contrastive and Translation Studies, Edge Hill University, UK, 27-29 July 2010; p.15. [PDF, 91KB]

(2010) Jocelyn Phillips, Carol Van Ess-Dykema, Timothy Allison & Laurie Gerber: Parallel corpus development at NVTC. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; 7pp. [PDF, 173KB]; abstract

(2010) John C.Platt, Kristina Toutanova, & Wen-tau Yih: Translingual document representations from discriminative projections. [EMNLP 2010] Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, MIT, Massachusetts, USA, 9-11 October 2010; pp.251-261. [PDF, 319KB]

(2010) Reinhard Rapp & Michael Zock: Utilizing citations of foreign words in corpus-based dictionary generation. [Coling 2010] Proceedings  of the Second Workshop on NLP Challenges in the Information Explosion Era, Beijing, China, 28 August 2010; pp.50-59. [PDF, 188KB]

(2010) Gudrun Rawoens: Multilingual corpora in cross-linguistic research: focus on the compilation of a Dutch-Swedish parallel corpus. JADT 2010: 10th International Conference on Statistical Analysis of Textual Data, 9-11 June 2010, Rome, Italy; pp.1287-1294. [PDF, 417KB]

(2010) Gábor Recski, András Rung, Attila Zséder, & András Kornai: NP alignment in bilingual corpora. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.3379-3382. [PDF, 305KB]

(2010) Yulia Tsvetkov & Shuly Wintner: Automatic acquisition of parallel corpora from websites with dynamic content.  LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.3389-3392. [PDF, 447KB]

(2010) Yulia Tsvetkov & Shuly Wintner: Extraction of multi-word expressions from small parallel corpora. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.1256-1264. [PDF, 257KB]

(2010) Jakob Uszkoreit, Jay M.Ponte, Ashok C.Popat, & Moshe Dubiner: Large scale parallel document mining for machine translation. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.1101-1109. [PDF, 241KB]

(2010) Tom Vanallemeersch: Belgisch Staatsblad corpus: retrieving French-Dutch sentences from official documents. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.3413-3416. [PDF, 273KB]

(2010) Sina Zarrieß, Aoife Cahill, Jonas Kuhn, & Christian Rohrer: Cross-lingual induction of deep broad-coverage syntax: a case study on German participles. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.1426-1434. [PDF, 106KB]

Bi-text see Bilingual corpora

Cleaning and filtering

(2014) Michel Simard: Clean data for training statistical MT: the case of MT contamination. AMTA 2014: proceedings of the eleventh conference of the Association for Machine Translation in the Americas, Vancouver, BC, October 22-26; pp.69-82. [PDF, 533KB]

(2014) Raivis Skadiņš, Jörg Tiedemann, Roberts Rozis & Daiga Deksne: Billions of parallel words for free: building and using the EU Bookshop corpus.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1850-1855. [PDF, 521KB]

(2013) Alexey Borisov, Jacob Dlougach & Irina Galinskaya: Yandex School of Data Analysis machine translation systems for WMT13. WMT 2013: 8th Workshop on Statistical Machine Translation, Proceedings of the Workshop, August 8-9, 2013, Sofia, Bulgaria; pp.99-103. [PDF, 125KB]

(2013) Lei Cui, Dongdong Zhang, Shujie Liu, Mu Li, & Ming Zhou: Bilingual data cleaning for SMT using graph-based random walk.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.340-345. [PDF, 259KB]

(2013) Lluís Formiga, Marta R. Costa-jussà, José B. Mariño, José A. R. Fonollosa, Alberto Barrón-Cedeño & Lluis Marquez: The TALP-UPC phrase-based translation systems for WMT13: system combination with morphology generation, domain adaptation and corpus filtering. WMT 2013: 8th Workshop on Statistical Machine Translation, Proceedings of the Workshop, August 8-9, 2013, Sofia, Bulgaria; pp.134-140. [PDF, 159KB]

(2013) Manuel Herranz, Alex Helle, Elia Yuste, Ruslan Mitkov, & Lucia Specia: Pangeanic in the EXPERT project: EXPloiting Empirical approaches to Translation. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.419. [PDF, 432KB]

(2013) William Lewis & Sauleh Eetemadi: Dramatically reducing training data size through vocabulary saturation.  WMT 2013: 8th Workshop on Statistical Machine Translation, Proceedings of the Workshop, August 8-9, 2013, Sofia, Bulgaria; pp.281-291. [PDF, 3943KB]

(2013) Sara Stymne, Christian Hardmeier, Jörg Tiedemann & Joakim Nivre: Tunable distortion limits and corpus cleaning for SMT. WMT 2013: 8th Workshop on Statistical Machine Translation, Proceedings of the Workshop, August 8-9, 2013, Sofia, Bulgaria; pp.225-231. [PDF, 146KB]

(2013) Samira Tofighi Zahabi, Somayeh Bakhshaei, & Shahram Khadivi: Using context vectors in improving a machine translation system with bridge language.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.318-322. [PDF, 193KB]

(2012) Colin Cherry: Decoding. Machine Translation Marathon 2012 September 3-8, Edinburgh, UK; 83 slides [PDF of PPT, 413KB]

(2012) Jie Jiang, Andy Way, & Rejwanul Haque: Translating user-generated content in the social networking space. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 9pp. [PDF, 130KB]

(2012) J.Howard Johnson: Conditional significance pruning: discarding more of huge phrase tables. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10pp. [PDF, 241KB]

(2012) Saab Mansour & Hermann Ney: A simple and effective weighted phrase extraction for machine translation adaptation. IWSLT-2012: 9th International Workshop on Spoken Language Translation, Hong Kong, December 6th-7th, 2012; pp.193-200. [PDF, 641KB]

(2012) Juan Pino, Aurelien Waite, & William Byrne: Simple and efficient model filtering in statistical machine translation. Prague Bulletin of Mathematical Linguistics 98, October 2012; pp.5-24. [PDF, 172KB]

(2012) Richard Zens, Daisy Stanton, & Peng Xu: A systematic comparison of phrase table pruning techniques. EMNLP-CoNLL 2012: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the conference, July 12-14, Jeju Island, Korea; pp.972-983. [PDF, 203KB]

(2011) Alexandra Antonova & Alexey Misyurev: Building a web-based parallel corpus and filtering out machine-translated text. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.136-144. [PDF, 217KB]

(2011) Saab Mansour, Joern Wuebker, & Hermann Ney: Combining translation and language model scoring for domain-specific data filtering. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.222-229. [PDF, 257KB]

(2011) Česlav Przywara & Ondřej Bojar: eppex: epochal phrase table extraction for statistical machine translation. Sixth Machine Translation Marathon, 5-10 September 2011, Trento; Prague Bulletin of Mathematical Linguistics, no.96, October 2011; pp.89-98. [PDF, 137KB]; presentation, 24 slides [PDF of PPT, 118KB]

(2011) Spencer Rarrick, Chris Quirk, & Will Lewis: MT detection in web-scraped parallel corpora. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.422-429. [PDF, 323KB]

(2010) Hailong Cao & Eiichiro Sumita: Filtering syntactic constraints for statistical machine translation. ACL 2010: the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11-16, 2010: Proceedings of the Conference Short Papers; pp.17-21. [PDF, 65KB]

(2010) Jie Jiang, Andy Way, & Julie Carson-Berndsen: Lattice score based data cleaning for phrase-based statistical machine translation. EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 8pp. [PDF, 631KB]

(2011) Sara Stymne: Spell checking techniques for replacement of unknown words and data cleaning for Haitian Creole SMS translation. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.470-477. [PDF, 107KB]

Comparable corpora

(2015) Krzysztof Wolk & Krzysztof Marasek: PJAIT systems for the IWSLT 2015 evaluation campaign enhanced by comparable corpora. [IWSLT 2015] Proceedings of the International Workshop on Spoken Language Translation, December 3-4, 2015, Da Nang, Vietnam; pp.101-104. [PDF, 2.9MB]

(2015) Krzysztof Wolk & Krzysztof Marasek: Unsupervised comparable corpora preparation and exploration for bi-lingual translation equivalents. [IWSLT 2015] Proceedings of the International Workshop on Spoken Language Translation, December 3-4, 2015, Da Nang, Vietnam; pp.118-125. [PDF, 5.3MB]

(2014) Ondřej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Straňák, Vit Suchomel, Aleš Tamchyna, & Daniel Zeman: HindEnCorp – Hindi-English and Hindi-only corpus for machine translation.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3550-3555. [PDF, 107KB]

(2014) Chenhui Chu, Toshiaki Nakazawa, & Sadao Kurohashi: Constructing a Chinese-Japanese parallel corpus from Wikipedia. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.642-647. [PDF, 266KB]

(2014) Hernani Costa, Gloria Corpas Pastor, & Miriam Seghiri: iCompileCorpora: a web-based application to semi-automatically compile multilingual comparable corpora. Translating and the Computer 36: proceedings. Asling: International Society for Advancement in Language Technology, 27-28 November 2014; pp.51-55. [PDF, 119KB]

(2014) Sandipan Dandapat & Declan Groves: MTWatch: a tool for the analysis of noisy parallel data. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.41-45. [PDF, 190KB]

(2014) Jennifer Drexler, Pushpendre Rastogi, Jacqueline Aguilar, Benjamin Van Durme, & Matt Post: A Wikipedia-based corpus for contextualized machine translation. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3593-3596. [PDF, 72KB]

(2014) Miquel Esplà-Gomis, Filip Klubička, Nikola Ljubešić, Sergio Ortiz-Rojas, Vassilis Papavassiliou, & Prokopis Prokopidis: Comparing two acquisition systems for automatically building an English-Croatian parallel corpus from multilingual websites. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1252-1258. [PDF, 158KB]

(2014) Najeh Hajlaoui, David Kolovratnik, Jaakko Väyrynen, Ralf Steinberger, & Daniel Varga: DCEP – digital corpus of the European Parliament. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3164-3171. [PDF, 252KB]

(2014) Ann Irvine & Chris Callison-Burch: Using comparable corpora to adapt MT models to new domains. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.437-444. [PDF, 274KB]

(2014) B.R.Laranjeira, V.P.Moreira, A.Villavicencio, C.Ramisch, & M.J.Finatto: Comparing the quality of focused crawlers and of the translation resources obtained from them. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3572-3578. [PDF, 803KB]

(2014) Wang Ling, Luís Marujo, Chris Dyer, Alan Black & Isabel Trancoso: Crowdsourcing high-quality parallel data extraction from Twitter. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.426-436. [PDF, 459KB]

(2014) Thomas Mayer & Michael Cysouw: Creating a massively parallel Bible corpus. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3158-3163. [PDF, 575KB]

(2014) Mircea Petic & Daniela Gîfu: Transliteration and alignment of parallel texts from Cyrillic to Latin. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1819-1823. [PDF, 462KB]

(2014) Anita Rácz, István Nagy T., & Veronika Vincze: 4FX: light verb constructions in a multilingual parallel corpus. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.710-715. [PDF, 140KB]

(2014) Jayendra Rakesh Yeka, Prasanth Kolachina, & Dipti Misra Sharma: Benchmarking of English-Hindi parallel corpora. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1812-1818. [PDF, 627KB]

(2014) Lise Rebout & Philippe Langlais: An iterative approach for mining parallel sentences in a comparable corpus. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.648-655. [PDF, 201KB]

(2014) Michael Rosner & Kurt Sultana: Automatic methods for the extension of a bilingual dictionary using comparable corpora. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3790-3797. [PDF, 195KB]

(2014) Raphael Rubino, Antonio Toral, Nikola Ljubešić, & Gema Ramírez-Sánchez: Quality estimation for synthetic parallel data generation. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1843-1849. [PDF, 248KB]

(2014) Raivis Skadiņš, Jörg Tiedemann, Roberts Rozis & Daiga Deksne: Billions of parallel words for free: building and using the EU Bookshop corpus.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1850-1855. [PDF, 521KB]

(2014) Liang Tian, Derek F.Wong, Lidia S.Chao, Paula Quaresma, Francisco Oliveira, Yi Lu, Shuo Li, Yiming Wang, & Longyue Wang: UM-Corpus: a large English-Chinese parallel corpus for statistical machine translation. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1837-1842. [PDF, 605KB]

(2014) Dan Tufiş, Radu Ion, Ştefan Dumitrescu, & Dan Ştefănescu: Large SMT data-sets extracted from Wikipedia. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.656-663. [PDF, 232KB]

(2014) Pavel Vondřička: Aligning parallel texts with InterText. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1875-1879. [PDF, 476KB]

(2014) Shikun Zhang, Wang Ling, & Chris Dyer: Dual subtitles as parallel corpora. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1869-1874. [PDF, 254KB]

(2013) Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013

(2013) Haithem Afli, Loïc Barrault & Holger Schwenk: Multimodal comparable corpora as resources for extracting parallel data: parallel phrases extraction. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.286-292. [PDF, 1586KB]

(2013) Ahmet Aker, Monica Paramita, & Robert Gaizauskas: Extracting bilingual terminologies from comparable corpora.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.402-411. [PDF, 204KB]

(2013) Daniel Andrade, Masaaki Tsuchida, Takashi Onishi, & Kai Ishikawa: Synonym acquisition using bilingual comparable corpora. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.1077-1081. [PDF, 402KB]

(2013) Daniel Andrade, Masaaki Tsuchida, Takashi Onishi, & Kai Ishikawa: Translation acquisition using synonym sets. [NAACL-HLT 2013] The 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 9-14 June 2013, Atlanta, Georgia; pp.655-660. [PDF, 500KB]

(2013) Dhouha Bouamor, Nasredine Semmar, & Pierre Zweigenbaum: Building specialized bilingual lexicons using word sense disambiguation. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.952-956. [PDF, 117KB]

(2013) Dhouha Bouamor, Nasredine Semmar, & Pierre Zweigenbaum: Context vector disambiguation for bilingual lexicon extraction from comparable corpora.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.759-764. [PDF, 199KB]

(2013) Dhouha Bouamor, Nasredine Semmar, & Pierre Zweigenbaum: Towards a generic approach for bilingual lexicon extraction from comparable corpora. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; pp.143-150. [PDF, 531KB]

(2013) Chenhui Chu, Toshiaki Nakazawa, & Sadao Kurohashi: Accurate parallel fragment extraction from quasi-comparable corpora using alignment model and translation lexicon. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.1144-1150. [PDF, 415KB]

(2013) Béatrice Daille: TTC: terminology extraction, translation tools and comparable corpora. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.449. [PDF, 272KB]

(2013) Rima Harastani, Béatrice Daille & Emmanuel Morin: Ranking translation candidates acquired from comparable corpora. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.401-409. [PDF, 276KB]

(2013) Amir Hazem & Emmanuel Morin: Word co-occurrence counts prediction for bilingual terminology extraction from comparable corpora. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.1392-1400. [PDF, 162KB]

(2013) Felix Hieber, Laura Jehl, & Stefan Riezler: Task alternation in parallel sentence retrieval for Twitter translation. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.323-327. [PDF, 208KB]

(2013) Ann Irvine & Chris Callison-Burch: Combining bilingual and comparable corpora for low resource machine translation. WMT 2013: 8th Workshop on Statistical Machine Translation, Proceedings of the Workshop, August 8-9, 2013, Sofia, Bulgaria; pp.262-270. [PDF, 225KB]

(2013) Ann Irvine, Chris Quirk, & Hal Daumé III: Monolingual marginal matching for translation model adaptation. [EMNLP 2013] Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013; pp.1077-1088. [PDF, 247KB]

(2013) Ann Irvine: Statistical machine translation in low resource settings. [NAACL-HLT 2013] Proceedings of the NAACL HLT 2013 Student Research Workshop, 13 June 2013, Atlanta, Georgia; pp.54-61. [PDF, 185KB]

(2013) Ekaterina Lapshinova-Koltunski: VARTRA: a comparable corpus for analysis of translation variation. Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.77-86. [PDF, 336KB]

(2013) Taesung Lee & Seung-won Hwang: Bootstrapping entity translation on weakly comparable corpora.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.631-640. [PDF, 624KB]

(2013) Lian Tze Lim, Lay-Ki Soon, Tek Yong Lim, Enya Kong Tang, & Bali Ranaivo-Malançon: Context-dependent multilingual lexical lookup for under-resourced languages.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.294-299. [PDF, 373KB]

(2013) Wang Ling, Guang Xiang, Chris Dyer, Alan Black, & Isabel Trancoso: Microblogs as parallel corpora.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.176-186. [PDF, 431KB]

(2013) Xiaodong Liu, Kevin Duh, & Yuji Matsumoto: Topic models + word alignment = a flexible framework for extracting bilingual dictionary from comparable corpus. Proceedings of the Seventeenth Conference on Computational Natural Language Learning, Sofia, Bulgaria, 8-9 August 2013; pp.212-221. [PDF, 487KB]

(2013) Oscar Mendoza Rivera, Ruslan Mitkov, & Gloria Corpas Pastor: A flexible framework for collocation retrieval and translation from parallel and comparable corpora. [MT Summit XIV] Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technology, Nice, September 3, 2013; pp.18-25. [PDF, 356KB]

(2013) John Richardson, Toshiaki Nakazawa, & Sadao Kurohashi: Robust transliteration mining from comparable corpora with bilingual topic models. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.261-269. [PDF, 460KB]

(2013) Itsuki Toyota, Zi Long, Lijuan Dong, Takehito Utsuro, & Mikio Yamamoto: Compositional translation of technical terms by integrating patent families as a parallel corpus and a comparable corpus. [MT Summit XIV] Proceedings of the 5th Workshop on Patent Translation, Nice, September 2, 2013; pp.16-23. [PDF, 2058KB]

(2013) Dan Tufiş, Radu Ion, Ştefan Daniel Dumitrescu, & Dan Ştefănescu: Wikipedia as an SMT training corpus. Proceedings of Recent Advances in Natural  Language Processing, Hissar, Bulgaria, 7-13 September 2013; pp.702-709. [PDF, 220KB]

(2013) Zede Zhu, Miao Li, Lei Chen, & Zhenxin Yang: Building comparable corpora based on bilingual LDA model.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.278-281. [PDF, 127KB]

(2012) [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey

(2012) Ahmet Aker, Evangelos Kanoulas, & Robert Gaizauskas: A light way to collect comparable corpora from the Web. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.15-20. [PDF, 856KB]

(2012) Walid Aransa, Holger Schwenk, & Loic Barrault: Semi-supervised transliteration mining from parallel and comparable corpora. IWSLT-2012: 9th International Workshop on Spoken Language Translation, Hong Kong, December 6th-7th, 2012; pp.185-192. [PDF, 650KB]

(2012) Emma Barker & Rob Gaizauskas: Assessing the comparability of news texts. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3996-4003. [PDF, 348KB]

(2012) Julien Bourdaillet & Philippe Langlais: Identifying infrequent translations by aligning non parallel sentences. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10pp. [PDF, 331KB]

(2012) Bruno Cartoni & Thomas Meyer: Extracting directional and comparable corpora from a multilingual corpus for translation studies.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2132-2137. [PDF, 352KB]

(2012) Béatrice Daille: Building bilingual terminologies from comparable corpora: the TTC TermSuite.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.29-32. [PDF, 317KB]

(2012) Estelle Delpech, Béatrice Daille, Emmanuel Morin, & Claire Lemaire: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.745-761. [PDF, 319KB]

(2012) Estelle Delpech, Béatrice Daille, Emmanuel Morin, & Claire Lemaire: Identification of fertile translations in medical comparable corpora: a morpho-compositional approach. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10pp. [PDF, 154KB]

(2012) Roger Granada, Lucelene Lopes, Carlos Ramisch, Cassia Trojahn, Renata Vieira, & Aline Villavicencio: A comparable corpus based on aligned multilingual ontologies. [ACL 2012] Proceedings of the First Workshop on Multilingual Modeling, Jeju, Republic of Korea, 8-14 July 2012; pp.25-31. [PDF, 108KB]

(2012) Amir Hazem & Emmanuel Morin: ICA for bilingual lexicon extraction from comparable corpora.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.126-133. [PDF, 474KB]

(2012) Iustina Ilisei, Diana Inkpen, Gloria Corpas, & Ruslan Mitkov: Romanian translational corpora: building comparable corpora for translation studies.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.56-61. [PDF, 357KB]

(2012) Radu Ion: PEXACC: a parallel sentence mining algorithm from comparable corpora. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2181-2188. [PDF, 852KB]

(2012) Elena Irimia: Experimenting with extracting lexical dictionaries from comparable corpora for English-Romanian language pair.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.49-55. [PDF, 738KB]

(2012) Hiroyuki Kaji, Takashi Tsunakawa, & Yoshihiro Komatsubara: Improving compositional translation with comparable corpora.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.134-142. [PDF, 449KB]

(2012) Mahdi Khademian, Kaveh Taghipour, Saab Mansour, & Shahram Khadivi: A holistic approach to bilingual sentence fragment extraction from comparable corpora.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.4073-4079. [PDF, 416KB]

(2012) Adam Kilgarriff & George Tambouratzis: The PRESEMT project. [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.27-28. [PDF, 355KB]

(2012) Aimée Lahaussois & Séverine Guillaume: A viewing and processing tool for the analysis of a comparable corpus of Kiranti mythology.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.33-41. [PDF, 778KB]

(2012) Chunyang Liu, Qi Liu, Yang Liu, & Maosong Sun: THUTR: a translation retrieval system. Proceedings of COLING 2012: Demonstration Papers, Mumbai, December 2012; pp.321-328. [PDF, 306KB]

(2012) Nikola Ljubešić, Špela Vintar, & Darja Fišer: Multi-word term extraction from comparable corpora by combining contextual and constituent clues.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.143-147. [PDF, 376KB]

(2012) Philipp Petrenz & Bonnie Webber: Robust cross-lingual genre classification through comparable corpora. [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.1-9. [PDF, 538KB]

(2012) Mārcis Pinnis, Radu Ion, Dan Ştefănescu, Fangzhong Su, Inguna Skadiņa, Andrejs Vasiļjevs, &  Bogdan Babych: ACCURAT toolkit for multi-level alignment and information extraction from comparable corpora. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 10 July 2012, System Demonstrations; pp.91-96. [PDF, 235KB]

(2012) Magdalena Plamada & Martin Volk: Towards a Wikipedia-extracted Alpine corpus.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.81-87. [PDF, 364KB]

(2012) Reinhard Rapp, Serge Sharoff, & Bogdan Babych: Identifying word translations from comparable documents without a seed lexicon.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.460-466. [PDF, 329KB]

(2012) Robert Remus & Mathias Bank: Textual characteristics of different-sized corpora.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.148-152. [PDF, 421KB]

(2012) Hervé Saint-Amand, Jason Smith, & Magdalena Plamada: Parallel corpus extraction from CommonCrawl. Machine Translation Marathon 2012, September 3-8, Edinburgh, UK; 10 slides [PDF of PPT, 70KB]

(2012) Rahma Sellami, Fatiha Sadat, & Lamia Hadrich Belguith: Exploiting Wikipedia as a knowledge base for the extraction of linguistic resources: application on Arabic-French comparable corpora and bilingual lexicons. AMTA-2012: Fourth workshop on computational approaches to Arabic script-based languages. Proceedings, San Diego, November 1, 2012; pp.72-79. [PDF, 889KB]

(2012) Serge Sharoff: Beyond translation memories: finding similar documents in comparable corpora. [Aslib 2012] Translating and the Computer 34, 29-30 November 2012, One Birdcage Walk, London, UK; 7pp. [PDF, 145KB], presentation: 47 slides [PDF, 849KB]

(2012) Inguna Skadiņa: Analysis and evaluation of comparable corpora for under-resourced areas of machine translation.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.17-19. [PDF, 370KB]

(2012) Inguna Skadiņa, Ahmet Aker, Nikos Mastropavlos, Fangzhong Su, Dan Tufiş, Mateja Verlic, Andrejs Vasiļjevs, Bogdan Babych, Paul Clough, Robert Gaizauskas, Nikos Glaros, Monica Lestari Paramita, & Mārcis Pinnis: Collecting and using comparable corpora for statistical machine translation. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.438-445. [PDF, 608KB]

(2012) Sanja Štajner & Ruslan Mitkov: Using comparable corpora to track diachronic and synchronic changes in lexical density and lexical richness.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.88-97. [PDF, 383KB]

(2012) Dan Ştefănescu, Radu Ion, & Sabine Hunsicker: Hybrid parallel sentence mining from comparable corpora.  EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.137-144. [PDF, 493KB]

(2012) Dan Ştefănescu: Mining for term translations in comparable corpora.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.98-103. [PDF, 616KB]

(2012) Fangzhong Su & Bogdan Babych: Development and application of a cross-language document comparability metric.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3956-3962. [PDF, 582KB]

(2012) Fangzhong Su & Bogdan Babych: Measuring comparability of documents in non-parallel corpora for efficient extraction of (semi-)parallel translation equivalents. EACL Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra): Proceedings of the workshop, 23-24 April 2012, Avignon, France; pp.10-19. [PDF, 188KB]

(2012) Akihiro Tamura, Taro Watanabe, & Eiichiro Sumita: Bilingual lexicon extraction from comparable corpora using label propagation. EMNLP-CoNLL 2012: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the conference, July 12-14, Jeju Island, Korea; pp.24-36. [PDF, 347KB]

(2012) Ivan Vulić & Marie-Francine Moens: Detecting highly confident word translations from comparable corpora without any prior knowledge. [EACL 2012] Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp.449-459. [PDF, 339KB]

(2012) Yunqing Xia, Guoyu Tang, Peng Jin, & Xia Yang: CLTC: a Chinese-English cross-lingual topic corpus.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.532-537. [PDF, 955KB]

(2012) Manuela Yapomo, Gloria Corpas, & Ruslan Mitkov: CLIR- and ontology-based approach for bilingual extraction of comparable documents.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.121-125. [PDF, 380KB]

(2012) ACCURAT: Analysis and evaluation of comparable corpora for under resourced areas of machine translation. [Project paper at] EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; p.205. [PDF, 72KB]

(2011) Vamshi Ambati, Sanjika Hewavitharana, Stephan Vogel, & Jaime Carbonell: Active learning with multiple annotations for comparable data classification task. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.69-77. [PDF, 201KB]

(2011) Anja Belz & Eric Kow: Unsupervised alignment of comparable data and text resources. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.102-109. [PDF, 143KB]

(2011) Abhijit Bhole, Goutham Tholpadi, & Raghavendra Udupa: Mining multi-word named entity equivalents from comparable corpora. [IJCNLP 2011] Proceedings of the 2011 Named Entities Workshop, Chiang Mai, Thailand, November 12, 2011; pp.65-72. [PDF, 820KB]

(2011) Bruno Cartoni, Sandrine Zufferey, Thomas Meyer, & Andrei Popescu-Belis: How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.78-86. [PDF, 153KB]

(2011) Mauro Cettolo, Nicola Bertoldi, & Marcello Federico: Bootstrapping Arabic-Italian SMT through comparable texts and pivot translation. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.249-256. [PDF, 354KB]; presentation, 12 slides [PDF]

(2011) Darja Fišer & Nikola Ljubešić: Bilingual lexicon extraction from comparable corpora for closely related languages.  [RANLP 2011] Proceedings of Recent Advances in Natural Language Processing, Hissar, Bulgaria, 12-14 September 2011; pp.125-131. [PDF, 95KB]

(2011) Darja Fišer, Nikola Ljubešić, Špela Vintar, & Senja Pollak: Building and using comparable corpora for domain-specific bilingual lexicon extraction. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.19-26. [PDF, 287KB]

(2011) Amir Hazem, Emmanuel Morin & Sebastian Peña Saldarriaga: Bilingual lexicon extraction from comparable corpora as metasearch. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.35-43. [PDF, 159KB]

(2011) Sanjika Hewavitharana & Stephan Vogel: Extracting parallel phrases from comparable data. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.61-68. [PDF, 217KB]

(2011) Miguel A.Jiménez-Crespo: To adapt or not to adapt in web localization: a contrastive genre-based study of original and localised legal sections in corporate websites. Journal of Specialised Translation 15 (January 2011); pp.2-27. [PDF, 237KB]

(2011) Kevin Knight: Putting a value on comparable data [abstract]. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; p.1. [PDF, 92KB]

(2011) Bo Li, Eric Gaussier, & Akiko Aizawa: Clustering comparable corpora for bilingual lexicon extraction. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short papers, Portland, Oregon, June 19-24, 2011; pp.473-478. [PDF, 216KB]

(2011) Emmanuel Morin & Emmanuel Prochasson: Bilingual lexicon extraction from comparable corpora enhanced with parallel corpora. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.27-34. [PDF, 125KB]

(2011) Emmanuel Prochasson & Pascale Fung: Rare word translation extraction from aligned comparable documents. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19-24, 2011; pp.1327-1335. [PDF, 157KB]

(2011) Matthew Snover, Xiang Li, Wen-Pin Lin, Zheng Chen, Suzanne Tamang, Mingmin Ge, Adam Lee, Qi Li, Hao Li, Sam Anzaroot, & Heng Ji: Cross-lingual slot filling from comparable corpora. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.110-119. [PDF, 271KB]

(2011) Ivan Vulić, Wim De Smet, & Marie-Francine Moens: Identifying word translations from comparable corpora using latent topic models. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short papers, Portland, Oregon, June 19-24, 2011; pp.479-484. [PDF, 153KB]

(2010) Mauro Cettolo, Marcello Federico, & Nicola Bertoldi: Mining parallel fragments from comparable texts. Proceedings of the 7th International Workshop on Spoken Language Translation, 2-3 December 2010, Paris, France; pp.227-234. [PDF, 393KB]

(2010) Diptesh Chatterjee, Sudeshna Sarkar, & Arpit Mishra: Co-occurrence graph based iterative bilingual lexicon extraction from comparable corpora. [Coling 2010] Proceedings of the 4th Workshop on Cross Lingual Information Access, Beijing, China, 28 August 2010; pp.35-42. [PDF, 618KB]

(2010) Do Thi Ngoc Diep, Laurent Besacier, & Eric Castelli: A fully unsupervised approach for mining parallel data from comparable corpora. EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 8pp. [PDF, 856KB]; presentation: 30 slides [PDF, 2621KB]

(2010) Andreas Eisele & Jia Xu: Improving machine translation performance using comparable corpora.  [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.35-41. [PDF, 104KB]

(2010) Pascale Fung, Emmanuel Prochasson, & Simon Shi: Trillions of comparable documents. [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.26-34. [PDF, 199KB]

(2010) Pablo Gamallo Otero & Isaac González López: Wikipedia as multilingual source of comparable corpora. [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.21-25. [PDF, 180KB]

(2010) Degen Huang, Lian Zhao, Lishuang Li, & Haitao Yu: Mining large-scale comparable corpora from Chinese-English news collections. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.472-480. [PDF, 255KB]

(2010) Hiroyuki Kaji, Takashi Tsunakawa, & Daisuke Okada: Using comparable corpora to adapt a translation model to domains. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.2182-2188. [PDF, 432KB]

(2010) Lianhau Lee, Aiti Aw, Min Zhang, & Haizhou Li: EM-based hybrid model for bilingual terminology extraction from comparable corpora. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.639-646. [PDF, 114KB]

(2010) Bo Li & Eric Gaussier: Improving corpus comparability for bilingual lexicon extraction from comparable corpora. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.644-652. [PDF, 276KB]

(2010) Bin Lu, Tao Jiang, Kapo Chow, & Benjamin K. Tsou: Building a large English-Chinese parallel corpus from comparable patents and its experimental application to SMT.  [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.42-49. [PDF, 271KB]

(2010) Inguna Skadiņa, Andrejs Vasiļjevs, Raivis Skadiņš, Robert Gaizauskas, Dan Tufiş, & Tatiana Gornostay: Analysis and evaluation of comparable corpora for under resourced areas of machine translation. [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.6-14. [PDF, 318KB]

(2010) Jason R.Smith, Chris Quirk, & Kristina Toutanova: Extracting parallel sentences from comparable corpora using document level alignment. NAACL HLT 2010: Human Language Technologies: the 2010 annual conference of the North American Chapter of the Association for Computational Linguistics. Proceedings… June 2-4, 2010, Los Angeles, California; pp.403-411. [PDF, 313KB]

Concordances

(2013) Adam Kilgarriff: Terminology finding, parallel corpora and bilingual word sketches in the Sketch Engine. [Aslib 2013] Translating and the Computer 35, 28-29 November 2013, etc.venues, Paddington, London, UK; 6pp. [PDF, 1018KB]; presentation, 23 slides [PDF of PPT, 957KB]

(2012) Ming-Hong Bai, Yu-Ming Hsieh, Keh-Jiann Chen, & Jason S.Chang: DOMCAT: a bilingual concordancer for domain-specific computer assisted translation. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 10 July 2012, System Demonstrations; pp.55-60. [PDF, 215KB]

(2012) Paola Valli: How long is a piece of string? Concordance searches and user behavior investigated. [Aslib 2012] Translating and the Computer 34, 29-30 November 2012, One Birdcage Walk, London, UK; 11pp. [PDF, 359KB], presentation: 20 slides [PDF, 2909KB]

(2010) Alain Désilets: WeBiText: multilingual concordancer built from public high quality web content. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; abstract

Corpora see Bilingual corpora, Comparable corpora, Monolingual corpora, Multilingual corpora

Crowd sourcing

(2014) Shinsuke Goto, Donghui Lin, & Toru Ishida: Crowdsourcing for evaluating machine translation quality. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3456-3463. [PDF, 214KB]

(2014) Miguel A.Jiménez Crespo: Beyond prescription: what empirical studies are telling us about localization crowdsourcing. Translating and the Computer 36: proceedings. Asling: International Society for Advancement in Language Technology, 27-28 November 2014; pp.27-35. [PDF, 142KB]

(2014) Mitesh M.Khapra, Ananthakrishnan Ramanathan, Anoop Kunchukuttan, Karthik Visweswariah, & Pushpak Bhattacharyya: When transliteration met crowdsourcing: an empirical study of transliteration via crowdsourcing using efficient, non-redundant and fair quality control. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.196-202. [PDF, 180KB]

(2014) Wang Ling, Luís Marujo, Chris Dyer, Alan Black & Isabel Trancoso: Crowdsourcing high-quality parallel data extraction from Twitter. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.426-436. [PDF, 459KB]

(2014) Erin Lyons: Far from the maddening crowd: integrating collaborative translation technologies into healthcare services in the developing world. Translating and the Computer 36: proceedings. Asling: International Society for Advancement in Language Technology, 27-28 November 2014; pp.165-173. [PDF, 291KB]

(2014) Eduard Šubert & Ondřej Bojar: Twitter Crowd Translation – design and objectives. Translating and the Computer 36: proceedings. Asling: International Society for Advancement in Language Technology, 27-28 November 2014; pp.217-227. [PDF, 324KB]

(2014) XLike: cross-lingual knowledge extraction. Project duration: January 2012 – December 2014. Proceedings of the 17th annual conference of the European Association for Machine Translation, EAMT 2014, Dubrovnik, Croatia, 16th-18th June 2014, edited by Marko Tadić, Philipp Koehn, Johann Roturier, Andy Way; p.131. [PDF, 417KB]

(2013) Anoop Kunchukuttan, Rajen Chatterjee, Shourya Roy, Abhijit Mishra, & Pushpak Bhattacharyya: TransDoop: a map-reduce based crowdsourced translation for complex domains. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, System demonstrations, Sofia, Bulgaria, August 4-9 2013; pp.175-180. [PDF, 1061KB]

(2013) Michael Matuschek, Christian M.Meyer, & Iryna Gurevych: Multilingual knowledge in aligned Wiktionary and OmegaWiki for translation applications. Translation: Computation, Corpora,  Cognition 3 (1), June 2013; pp.87-118. [PDF, 2898KB]

(2013) Aram Morera-Mesa, J.J.Collins, & David Filip: Selected crowdsourced translation practices. [Aslib 2013] Translating and the Computer 35, 28-29 November 2013, etc.venues, Paddington, London, UK; 15pp. [PDF, 918KB]; presentation, 33 slides [PDF of PPT, 304KB]

(2013) Rabih Zbib, Gretchen Markiewicz, Spyros Matsoukas, Richard Schwartz, & John Makhoul: Systematic comparison of professional and crowdsourced reference translations for machine translation. [NAACL-HLT 2013] The 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 9-14 June 2013, Atlanta, Georgia; pp.612-616. [PDF, 89KB]

(2013) XLIKE: cross-lingual knowledge extraction (XLike). Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.451. [PDF, 191KB]

(2012) Anoop Kunchukuttan, Shourya Roy, Pratik Patel, Kushal Ladha, Somya Gupta, Mitesh Khapra, & Pushpak Bhattacharyya: Experiences in resource generation for machine translation through crowdsourcing.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.384-391. [PDF, 467KB]

(2012) Dawn Lawrie, James Mayfield, Paul McNamee, & Douglas W.Oard: Creating and curating a cross-language person-entity linking collection.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3106-3110. [PDF, 312KB]

(2012) Victor Muntés-Mulero, Patricia Paladini, Marc Solé, & Jawad Manzoor: Multiplying the potential of crowdsourcing with machine translation. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 7pp. [PDF, 246KB]

(2012) Michael Paul, Eiichiro Sumita, Luisa Bentivogli, & Marcello Federico: Crowd-based MT evaluation for non-English target languages. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.229-237. [PDF, 260KB]

(2012) Matt Post, Chris Callison-Burch, & Miles Osborne: Constructing parallel corpora for six Indian languages via crowdsourcing. WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.401-409. [PDF, 388KB]

(2012) Marion Potet, Emmanuelle Esperança-Rodier, Laurent Besacier, & Hervé Blanchon: Collection of a large database of French-English SMT output corrections.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.4043-4048. [PDF, 357KB]

(2012) Midori Tatsumi, Takako Aikawa, Kentaro Yamamoto, & Hitoshi Isahara: How good is crowd post-editing? Its potential and limitations. AMTA-2012: Workshop on post-editing technology and practice. Proceedings, San Diego, October 28, 2012; 10pp. [PDF, 92KB]

(2012) ACCEPT: Automated Community Content Editing porTal. [Project paper at] EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; p.89. [PDF, 74KB]

(2012) Confident MT: estimating translation quality for improved statistical machine translation. [Project paper at] EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; p.98. [PDF, 424KB]

(2011) Luisa Bentivogli, Marcello Federico, Giovanni Moretti, & Michael Paul: Getting expert quality from the crowd for machine translation evaluation. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.521-528. [PDF, 337KB]

(2011) Karën Fort, Gilles Adda, & K.Bretonnel Cohen: Amazon Mechanical Turk: gold mine or coal mine? Computational Linguistics 37 (2); pp.413-420. [PDF, 96KB]

(2011) Chang Hu, Philip Resnik, Yakov Kronrod, Vladimir Eidelman, Olivia Buzek, & Benjamin B.Bederson: The value of monolingual crowdsourcing in a real-world translation scenario: simulation using Haitian Creole emergency SMS messages. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.399-404. [PDF, 164KB]

(2011) Shasha Liao, Cheng Wu, & Juan Huerta: Evaluating human correction quality for machine translation from crowdsourcing.  [RANLP 2011] Proceedings of Recent Advances in Natural Language Processing, Hissar, Bulgaria, 12-14 September 2011; pp.598-603. [PDF, 487KB]

(2011) Matteo Negri, Luisa Bentivogli, Yashar Mehdad, Danilo Giampiccolo, & Alessandro Marchetti: Divide and conquer: crowdsourcing the creation of cross-lingual textual entailment corpora.  [EMNLP 2011] Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, July 27-31, 2011; pp.670-679. [PDF, 493KB]

(2011) Omar F.Zaidan & Chris Callison-Burch: Crowdsourcing translation: professional quality from non-professionals. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19-24, 2011; pp.1220-1229. [PDF, 241KB]

(2010) Gilles Adda & Joseph Mariani: Language resources & Amazon Mechanical Turk: ethical, legal and other issues. LREC 2010: Legal Issues for Sharing Language Resources - LISLR2010 Workshop, 17 May 2010, Valletta, Malta; 21 slides. [PDF of PPT, 244KB]

(2010) Vamshi Ambati, Stephan Vogel, & Jaime Carbonell: Active learning and crowd-sourcing for machine translation. LREC 2010: proceedings of the seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.2169-2174. [PDF, 436KB]

(2010) Vamshi Ambati & Stephan Vogel: Can crowds build parallel corpora for machine translation systems? Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, Los Angeles, CA, June 2010; pp.62-65. [PDF, 89KB]

(2010) Michael Denkowski, Hasan Al-Haj, & Alon Lavie: Turker-assisted paraphrasing for English-Arabic machine translation. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, Los Angeles, CA, June 2010; pp.66-70. [PDF, 237KB]

(2010) Michael Denkowski & Alon Lavie: Exploring normalization techniques for human judgments of machine translation adequacy collected using Amazon Mechanical Turk. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, Los Angeles, CA, June 2010; pp.57-61. [PDF, 248KB]

(2010) Alain Désilets: Collaborative translation: technology, crowdsourcing, and the translator perspective. Introduction to workshop at AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31, 2010; 2pp. [PDF, 114KB]

(2010) Bill Dolan: Building partnerships with language communities: the importance of shared technology and shared data. META-FORUM 2010: Challenges for multilingual Europe, November 17/18 2010, Brussels, Belgium; 38 slides [PDF of PPT, 1245KB]

(2010) Qin Gao & Stephan Vogel: Consensus versus expertise: a case study of word alignment with Mechanical Turk. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, Los Angeles, CA, June 2010; pp.30-34. [PDF, 389KB]

(2010) Yakov Kronrod, Philip Resnik, Olivia Buzek, Chang Hu, Alex Quinn, & Benjamin B.Bederson: Improving translation via targeted paraphrasing. Contribution to workshop of ‘Collaborative translation’ at AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31, 2010; 4pp. [PDF, 79KB]

(2010) Robert Munro, Steven Bethard, Victor Kuperman, Vicky Tzuyin Lai, Robin Melnick, Christopher Potts, Tyler Schnoebelen, & Harry Tily: Crowdsourcing and language studies: the new generation of linguistic data. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, Los Angeles, CA, June 2010; pp.122-130. [PDF, 310KB]

(2010) Robert Munro: Crowdsourcing translation for emergency response in Haiti: the global collaboration of local knowledge. Contribution to workshop of ‘Collaborative translation’ at AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31, 2010; 4pp. [PDF, 331KB]

(2010) Sharon O’Brien & Reinhard Schäler: Next generation translation and localization: users are taking charge. Translating and the Computer 32, 18-19 November 2010, London; 10pp. [PDF, 599KB]

(2010) Mike O’Malley: The challenges of distributed parallel corpora. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; abstract

(2010) Willem Stoeller: Community translation. Translingual Europe 2010, Hotel Maritim, Berlin, Germany, Monday June 7th 2010; 19pp. [PDF, 4871KB]

(2010) Anas Tawileh: Managing social translation: online tools for translators’ communities. Translating and the Computer 32, 18-19 November 2010, London; 8pp. [PDF, 22KB]

(2010) Jost Zetzsche: Crowdsourcing and the professional translator. Contribution to workshop of ‘Collaborative translation’ at AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31, 2010; 1p. [PDF, 54KB]

Data elicitation

(2011) Sergei Nirenburg & Marjorie McShane: Morphological aspects of computer-driven elicitation of knowledge about any language [abstract]. Machine Translation and Morphologically-rich Languages: Research Workshop of the Israel Science Foundation, University of Haifa, Israel, 26 January, 2011; presentation: 47 slides [PDF of PPT, 1907KB]

(2011) Keiji Yasuda, Hideo Okuma, Masao Utiyama, & Eiichiro Sumita: Annotating data selection for improving machine translation. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.269-274. [PDF, 352KB]

(2010) Vamshi Ambati, Stephan Vogel & Jaime Carbonell: Active learning-based elicitation for semi-supervised word alignment. ACL 2010: the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11-16, 2010: Proceedings of the Conference Short Papers; pp.365-370. [PDF, 182KB]

Dictionaries see Lexical resources

Domain identification

(2013) Tsutomu Hirao, Tomoharu Iwata, & Masaaki Nagata: Latent semantic matching: application to cross-language text categorization without alignment information.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.212-216. [PDF, 846KB]

(2013) Vivi Nastase & Carlo Strapparava: Bridging languages through etymology: the case of cross language text categorization.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.651-659. [PDF, 229KB]

(2013) Magdalena Plamadă & Martin Volk: Mining for domain-specific text from Wikipedia.  Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.112-120. [PDF, 382KB]

(2011) Zhengxian Gong, Min Zhang, & Guodong Zhou: Cache-based document-level statistical machine translation. [EMNLP 2011] Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, July 27-31, 2011; pp.909-919. [PDF, 330KB]

(2011) Zhengxian Gong, Guodong Zhou, & Liangyou Li: Improve SMT with source-side “topic-document” distributions. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.496-501. [PDF, 297KB]

(2011) Bruno Pouliquen, Christophe Mazenc & Aldo Iorio: Tapta: a user-driven translation system for patent documents based on domain-aware statistical machine translation. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.5-12. [PDF, 342KB]; presentation, 15 slides [PDF, 1642KB]

(2011) Ivan Vulić, Wim De Smet, & Marie-Francine Moens: Identifying word translations from comparable corpora using latent topic models. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short papers, Portland, Oregon, June 19-24, 2011; pp.479-484. [PDF, 153KB]

Domain restriction, adaptation and specification

(2015) Jinhua Du, Andy Way, Zhengwei Qiu, Asanka Wasala, & Reinhard Schäler: Domain adaptation for social localisation-based SMT: a case study using the Trommons platform. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: Fourth Workshop on Post-editing Technology and Practice (WPTP 4); pp.57-65. [PDF, 450KB]

(2015) Nadir Durrani, Hassan Sajjad, Shafiq Joty, Ahmed Abdelali, & Stephan Vogel: Using joint models for domain adaptation in statistical machine translation. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: vol.1: MT Researchers’ Track; pp.117-130. [PDF, 637KB]

(2015) Matthias Huck, Alexandra Birch, & Barry Haddow: Mixed domain vs. multi-domain statistical machine translation. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: vol.1: MT Researchers’ Track; pp.240-255. [PDF, 578KB]

(2015) Minh-Thang Luong & Christopher Manning: Stanford neural machine translation systems for spoken language domains. [IWSLT 2015] Proceedings of the International Workshop on Spoken Language Translation, December 3-4, 2015, Da Nang, Vietnam; pp.76-79. [PDF, 1.2MB]

(2015) Keisuke Noguchi & Takashi Ninomiya: Resampling approach for instance-based domain adaptation from patent domain to newspaper domain in statistical machine translation. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: Sixth Workshop on Patent and Scientific Literature Translation (PSLT6); pp.81-88. [PDF, 447KB]

(2015) Krzysztof Wolk & Krzysztof Marasek: PJAIT systems for the IWSLT 2015 evaluation campaign enhanced by comparable corpora. [IWSLT 2015] Proceedings of the International Workshop on Spoken Language Translation, December 3-4, 2015, Da Nang, Vietnam; pp.101-104. [PDF, 2.9MB]

(2014) Marine Carpuat, Cyril Goutte, & George Foster: Linear mixture models for robust machine translation. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.499-509. [PDF, 329KB]

(2014) Mauro Cettolo, Nicola Bertoldi, & Marcello Federico: The repetition rate of text as a predictor of the effectiveness of machine translation adaptation. AMTA 2014: proceedings of the eleventh conference of the Association for Machine Translation in the Americas, Vancouver, BC, October 22-26; pp. 166-179. [PDF, 954KB]

(2014) Boxing Chen, Roland Kuhn, & George Foster: A comparison of mixture and vector space techniques for translation model adaptation. AMTA 2014: proceedings of the eleventh conference of the Association for Machine Translation in the Americas, Vancouver, BC, October 22-26; pp.124-138. [PDF, 521KB]

(2014) Eva Hasler, Barry Haddow, & Philipp Koehn: Dynamic topic adaptation for SMT using distribution profiles. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.445-456. [PDF, 740KB]

(2014) Ann Irvine & Chris Callison-Burch: Using comparable corpora to adapt MT models to new domains. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.437-444. [PDF, 274KB]

(2014) Yi Lu, Longyue Wang, Derek F.Wong, Lidia S.Chao, Yiming Wang, & Francisco Oliveira: Domain adaptation for medical text translation using web resources. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.233-238. [PDF, 485KB]

(2014) Saab Mansour & Hermann Ney: Translation model based weighting for phrase extraction. Proceedings of the 17th annual conference of the European Association for Machine Translation, EAMT 2014, Dubrovnik, Croatia, 16th-18th June 2014; pp.35-43. [PDF, 448KB]

(2014) Saab Mansour & Hermann Ney: Unsupervised adaptation for statistical machine translation. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.457-465. [PDF, 321KB]

(2014) Shachar Mirkin & Laurent Besacier: Data selection for compact adapted SMT models. AMTA 2014: proceedings of the eleventh conference of the Association for Machine Translation in the Americas, Vancouver, BC, October 22-26; pp.301-314. [PDF, 610KB]

(2014) Katsuhito Sudoh, Masaaki Nagata, Shinsuke Mori, & Tatsuya Kawahara: Japanese-to-English patent translation system based on domain-adapted word segmentation and post-ordering. AMTA 2014: proceedings of the eleventh conference of the Association for Machine Translation in the Americas, Vancouver, BC, October 22-26; pp.234-248. [PDF, 743KB]

(2014) Longyue Wang, Yi Lu, Derek F.Wong, Lidia Chao, Yiming Wang, & Francisco Oliveira: Combining domain adaptation approaches for medical text translation. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.254-259. [PDF, 348KB]

(2014) Marion Weller, Alexander Fraser, & Ulrich Heid: Combining bilingual terminology mining and morphological modeling for domain adaptation in SMT. Proceedings of the 17th annual conference of the European Association for Machine Translation, EAMT 2014, Dubrovnik, Croatia, 16th-18th June 2014, edited by Marko Tadić, Philipp Koehn, Johann Roturier, Andy Way; pp.11-18. [PDF, 387KB]

(2013) Mihael Arcan, Susan Marie Thomas, Derek de Brandt, & Paul Buitelaar: Translating the FINREP taxonomy using a domain-specific corpus. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; pp.199-206. [PDF, 473KB]

(2013) Pratyush Banerjee, Raphael Rubino, Johann Roturier, & Josef van Genabith: Quality estimation-guided data selection for domain adaptation of SMT. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; pp.107-108. [PDF, 658KB]

(2013) Peter Bell, Fergus McInnes, Siva Reddy Gangireddy, Mark Sinclair, Alexandra Birch, & Steve Renals: The UEDIN English ASR system for the IWSLT 2013 evaluation. [IWSLT 2013] Proceedings of the 10th International Workshop on Spoken Language Translation, Heidelberg, Germany, Dec.5-6, 2013; 6pp. [PDF, 173KB]

(2013) Nicola Bertoldi, Mauro Cettolo, & Marcello Federico: Cache-based online adaptation for machine translation enhanced computer assisted translation. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; pp.35-42. [PDF, 471KB]

(2013) Dhouha Bouamor, Adrian Popescu, Nasredine Semmar, & Pierre Zweigenbaum: Building specialized bilingual lexicons using large-scale background knowledge. [EMNLP 2013] Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013; pp. 479-489. [PDF, 246KB]

(2013) Pierrette Bouillon: Automated Community Content Editing PorTal (ACCEPT). Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.407. [PDF, 182KB]

(2013) Mauro Cettolo, Christophe Servan, Nicola Bertoldi, Marcello Federico, Loïc Barrault, & Holger Schwenk: Issues in incremental adaptation of statistical MT from human post-edits. Proceedings of MT Summit XIV Workshop on Post-editing Technology and Practice (WPTP-2), Nice, France, 2 September 2013; pp.111-118. [PDF, 186KB]

(2013) Boxing Chen, Roland Kuhn, & George Foster: Vector space model for adaptation in statistical machine translation. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.1285-1293. [PDF, 207KB]

(2013) Lei Cui, Xilun Chen, Dongdong Zhang, Shujie Liu, Mu Li, & Ming Zhou: Multi-domain adaptation for SMT using multi-task learning. [EMNLP 2013] Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013; pp.1055-1065. [PDF, 302KB]

(2013) Kevin Duh, Graham Neubig, Katsuhito Sudoh, & Hajime Tsukada: Adaptation data selection using neural language models: experiments in machine translation. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.678-683. [PDF, 215KB]

(2013) Mirela-Ştefania Duma & Cristina Vertan: Integration of machine translation in on-line multilingual applications – domain adaptation. Translation: Computation, Corpora, Cognition 3 (1), June 2013; pp.67-74. [PDF, 395KB]

(2013) Nadir Durrani, Barry Haddow, Kenneth Heafield & Philipp Koehn: Edinburgh’s machine translation systems for European language pairs. WMT 2013: 8th Workshop on Statistical Machine Translation, Proceedings of the Workshop, August 8-9, 2013, Sofia, Bulgaria; pp.114-121. [PDF, 208KB]

(2013) Marcello Federico, Philipp Koehn, Holger Schwenk, & Marco Trombetti: Matecat: Machine Translation Enhanced Computer Assisted Translation. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.425. [PDF, 192KB]

(2013) Lluís Formiga, Marta R. Costa-jussà, José B. Mariño, José A. R. Fonollosa, Alberto Barrón-Cedeño & Lluis Marquez: The TALP-UPC phrase-based translation systems for WMT13: system combination with morphology generation, domain adaptation and corpus filtering. WMT 2013: 8th Workshop on Statistical Machine Translation, Proceedings of the Workshop, August 8-9, 2013, Sofia, Bulgaria; pp.134-140. [PDF, 159KB]

(2013) George Foster, Boxing Chen, & Roland Kuhn: Simulating discriminative training for linear mixture adaptation in statistical machine translation. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; pp.183-190. [PDF, 556KB]

(2013) Thanh-Le Ha, Teresa Herrmann, Jan Niehues, Mohammed Mediani, Eunah Cho, Yuqi Zhang, Isabel Slawik, & Alex Waibel: The KIT translation systems for IWSLT 2013. [IWSLT 2013] Proceedings of the 10th International Workshop on Spoken Language Translation, Heidelberg, Germany, Dec.5-6, 2013; 7pp. [PDF, 147KB]

(2013) Sanjika Hewavitharana, Dennis N.Mehay, Sankaranarayanan Ananthakrishnan, & Prem Natarajan: Incremental topic-based translation model adaptation for conversational spoken language translation.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.697-701. [PDF, 279KB]

(2013) Felix Hieber, Laura Jehl, & Stefan Riezler: Task alternation in parallel sentence retrieval for Twitter translation. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.323-327. [PDF, 208KB]

(2013) An-Chang Hsieh, Hen-Hsen Huang, & Hsin-Hsi Chen: Uses of monolingual in-domain corpora for cross-domain adaptation with hybrid MT approaches. Proceedings of the Second Workshop on Hybrid Approaches to Translation, Sofia, Bulgaria, August 8, 2013; pp.117-122. [PDF, 275KB]

(2013) Ann Irvine, Chris Quirk, & Hal Daumé III: Monolingual marginal matching for translation model adaptation. [EMNLP 2013] Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013; pp.1077-1088. [PDF, 247KB]

(2013) Yun Jin, Oh-Woog Kwon, Seung-Hoon Na & Young-Gil Kim: Patent translation as technical document translation: customizing a Chinese-Korean MT system to patent domain. [MT Summit XIV] Proceedings of the 5th Workshop on Patent Translation, Nice, September 2, 2013; pp.28-33. [PDF, 1378KB]

(2013) Samuel Läubli, Mark Fishel, Manuela Weibel, & Martin Volk: Statistical machine translation for automobile marketing texts. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; pp.265-272. [PDF, 541KB]

(2013) Stephan Peitz, Saab Mansour, Jan-Thorsten Peter, Christoph Schmidt, Joern Wuebker, Matthias Huck, Markus Freitag, & Hermann Ney: The RWTH Aachen machine translation system for WMT 2013. WMT 2013: 8th Workshop on Statistical Machine Translation, Proceedings of the Workshop, August 8-9, 2013, Sofia, Bulgaria; pp.193-199. [PDF, 220KB]

(2013) Hassan Sajjad, Francisco Guzmán, Preslav Nakov, Ahmed Abdelali, Kenton Murray, Fahad Al Obaidli, & Stephan Vogel: QCRI at IWSLT 2013: experiments in Arabic-English and English-Arabic spoken language translation.  [IWSLT 2013] Proceedings of the 10th International Workshop on Spoken Language Translation, Heidelberg, Germany, Dec.5-6, 2013; 8pp. [PDF, 176KB]

(2013) Rico Sennrich, Holger Schwenk & Walid Aransa: A multi-domain translation model framework for statistical machine translation. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.832-840. [PDF, 1387KB]

(2013) Longyue Wang, Derek F.Wong, Lidia S.Chao, Junwen Xing, Yi Lu, & Isabel Trancoso: Edit distance: a new data selection criterion for domain adaptation in SMT. Proceedings of Recent Advances in Natural Language Processing, Hissar, Bulgaria, 7-13 September 2013; pp.727-732. [PDF, 231KB]

(2013) Petra Wolf & Ulrike Bernardi: Hybrid domain adaptation for a rule based MT system. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; pp.321-328. [PDF, 415KB]

(2013) Heng Yu, Jinsong Su, Yajuan Lü, & Qun Liu: A topic-triggered language model for statistical machine translation. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.447-454. [PDF, 130KB]

(2013) Jiajun Zhang & Chengqing Zong: Learning a phrase-based translation model from monolingual data with application to domain adaptation.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.1425-1434. [PDF, 476KB]

(2013) Conghui Zhu, Taro Watanabe, Eiichiro Sumita, & Tiejun Zhao: Hierarchical phrase table combination for machine translation. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.802-810. [PDF, 1464KB]

(2012) A.Ryan Aminzadeh, Jennifer Drexler, Timothy Anderson, & Wade Shen: Improved phrase translation modeling using MAP adaptation. TSD 2012: 15th International Conference on Text, Speech and Dialogue, Brno, Czech Republic, September 3-7, 2012; abstract #496, 1p. [HTML]

(2012) Mihael Arcan, Christian Federmann, & Paul Buitelaar: Experiments with term translation. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.67-82. [PDF, 178KB]

(2012) Mihael Arcan, Paul Buitelaar, & Christian Federmann: Using domain-specific and collaborative resources for term translation. SSST-6, Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation, Jeju, Republic of Korea, 12 July 2012; pp.86-94. [PDF, 122KB]

(2012) Amittai Axelrod, QingJun Li, & William D.Lewis: Applications of data selection via cross-entropy difference for real-world statistical machine translation. IWSLT-2012: 9th International Workshop on Spoken Language Translation, Hong Kong, December 6th-7th, 2012; pp.201-208. [PDF, 610KB]

(2012) Ming-Hong Bai, Yu-Ming Hsieh, Keh-Jiann Chen, & Jason S.Chang: DOMCAT: a bilingual concordancer for domain-specific computer assisted translation. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 10 July 2012, System Demonstrations; pp.55-60. [PDF, 215KB]

(2012) Pratyush Banerjee, Sudip Kumar Naskar, Johann Roturier, Andy Way, & Josef van Genabith: Domain adaptation in SMT of user-generated forum content guided by OOV word reduction: normalization and/or supplementary data? EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.169-176. [PDF, 160KB]

(2012) Pratyush Banerjee, Sudip Kumar Naskar, Johann Roturier, Andy Way, & Josef van Genabith: Translation quality-based supplementary data selection by incremental update of translation models. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.149-165. [PDF, 149KB]

(2012) Núria Bel, Vassilis Papavassiliou, Prokopis Prokopidis, Antonio Toral, & Victoria Arranz: Mining and exploiting domain-specific corpora in the PANACEA platform. [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”, LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.24-26. [PDF, 469KB]

(2012) Nicola Bertoldi, Mauro Cettolo, Marcello Federico, & Christian Buck: Evaluating the learning curve of domain adaptive statistical machine translation systems. WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.433-441. [PDF, 141KB]

(2012) Nicola Bertoldi & Marcello Federico: Practical domain adaptation in SMT. [Tutorial at] AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; presentation, 44 slides. [PDF of PPT, 5006KB]

(2012) Arianna Bisazza & Marcello Federico: Cutting the long tail: hybrid language models for translation style adaptation. [EACL 2012] Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp. 439-448. [PDF, 266KB]

(2012) Frédéric Blain, Holger Schwenk, & Jean Senellart: Incremental adaptation using translation information and post-editing analysis. IWSLT-2012: 9th International Workshop on Spoken Language Translation, Hong Kong, December 6th-7th, 2012; pp.229-236. [PDF, 873KB]

(2012) Han-Bin Chen, Hen-Hsen Huang, Hsin-Hsi Chen, & Ching-Ting Tan: A simplification-translation-restoration framework for cross-domain SMT applications. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.545-560. [PDF, 744KB]

(2012) Jinying Chen, Jacob Devlin, Huaigu Cao, Rohit Prasad, & Premkumar Natarajan: Automatic tune set generation for machine translation with limited in-domain data. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.161-168. [PDF, 287KB]

(2012) Jonathan H.Clark, Alon Lavie, & Chris Dyer: One system, many domains: open-domain statistical machine translation via feature augmentation. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10pp. [PDF, 396KB]

(2012) Béatrice Daille: Building bilingual terminologies from comparable corpora: the TTC TermSuite.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.29-32. [PDF, 317KB]

(2012) Hal Daumé III, Marine Carpuat, Alex Fraser, & Chris Quirk: Domain adaptation in machine translation: findings from the 2012 Johns Hopkins University Summer Workshop. Keynote [abstract]. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 1p. [PDF, 158KB]

(2012) Estelle Delpech, Béatrice Daille, Emmanuel Morin, & Claire Lemaire: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.745-761. [PDF, 319KB]

(2012) Qing Dou & Kevin Knight: Large scale decipherment for out-of-domain machine translation. EMNLP-CoNLL 2012: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the conference, July 12-14, Jeju Island, Korea; pp.266-275. [PDF, 578KB]

(2012) Vladimir Eidelman, Jordan Boyd-Graber, & Philip Resnik: Topic models for dynamic translation model adaptation. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 8-14 July 2012, Short Papers; pp.115-119. [PDF, 140KB]

(2012) Atefeh Farzindar & Wael Khreich: Evaluation of domain adaptation techniques for TRANSLI in a real-world environment. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 5pp. [PDF, 136KB]

(2012) Lluís Formiga, Carlos A.Henríquez Q., Adolfo Hernández, José B.Mariño, Enric Monte, & José A.R.Fonollosa: The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation. WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.275-282. [PDF, 500KB]

(2012) Li Gong, Aurélien Max, & François Yvon: Towards contextual adaptation for any-text translation. IWSLT-2012: 9th International Workshop on Spoken Language Translation, Hong Kong, December 6th-7th, 2012; pp.292-299. [PDF, 727KB]

(2012) Roger Granada, Lucelene Lopes, Carlos Ramisch, Cassia Trojahn, Renata Vieira, & Aline Villavicencio: A comparable corpus based on aligned multilingual ontologies. [ACL 2012] Proceedings of the First Workshop on Multilingual Modeling, Jeju, Republic of Korea, 8-14 July 2012; pp.25-31. [PDF, 108KB]

(2012) Barry Haddow & Philipp Koehn: Analysing the effect of out-of-domain data on SMT systems. WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.422-432. [PDF, 155KB]

(2012) Eva Hasler, Barry Haddow, & Philipp Koehn: Sparse lexicalised features and topic adaptation for SMT. IWSLT-2012: 9th International Workshop on Spoken Language Translation, Hong Kong, December 6th-7th, 2012; pp.268-275. [PDF, 630KB]

(2012) Amir Hazem & Emmanuel Morin: ICA for bilingual lexicon extraction from comparable corpora.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.126-133. [PDF, 474KB]

(2012) Claire Jaja, Douglas M.Briesch, Jamal Laoudi, & Claire R.Voss: Assessing divergence measures for automated document routing in an adaptive MT system.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3963-3970. [PDF, 722KB]

(2012) Laura Jehl, Felix Hieber, & Stefan Riezler: Twitter translation using translation-based cross-lingual retrieval. WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.410-421. [PDF, 164KB]

(2012) Maxim Khalilov & Rahzeb Choudhury: Building English-Chinese and Chinese-English MT engines for the computer software domain. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.7-11. [PDF, 193KB]

(2012) Patrik Lambert, Holger Schwenk, & Frédéric Blain: Automatic translation of scientific documents in the HAL archive. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3933-3936. [PDF, 329KB]

(2012) Nikola Ljubešić, Špela Vintar, & Darja Fišer: Multi-word term extraction from comparable corpora by combining contextual and constituent clues.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.143-147. [PDF, 376KB]

(2012) Shixiang Lu, Wei Wei, Xiaoyin Fu, & Bo Xu: Translation model based cross-lingual language model adaptation: from word models to phrase models. EMNLP-CoNLL 2012: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the conference, July 12-14, Jeju Island, Korea; pp.512-522. [PDF, 196KB]

(2012) Saab Mansour & Hermann Ney: A simple and effective weighted phrase extraction for machine translation adaptation. IWSLT-2012: 9th International Workshop on Spoken Language Translation, Hong Kong, December 6th-7th, 2012; pp.193-200. [PDF, 641KB]

(2012) Evgeny Matusov: Incremental re-training of a hybrid English-French MT system with customer translation memory data. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 8pp. [PDF, 116KB]

(2012) Jan Niehues & Alex Waibel: Detailed analysis of different strategies for phrase table adaptation in SMT. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10pp. [PDF,

(2012) Lene Offersgaard & Dorte Haltrup Hansen: SMT systems for less-resourced languages based on domain-specific data.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.75-80. [PDF, 380KB]

(2012) Tsuyoshi Okita, Antonio Toral, & Josef van Genabith: Topic modeling-based domain adaptation for system combination. COLING 2012: Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT), Mumbai, December 2012; pp.45-53. [PDF, 135KB]

(2012) Pavel Pecina, Antonio  Toral, Vassilis Papavassiliou, Prokopis Prokopidis, & Josef van Genabith: Domain adaptation of statistical machine translation using web-crawled resources: a case study. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.145-152. [PDF, 201KB]

(2012) Pavel Pecina, Antonio Toral, & Josef van Genabith: Simple and effective parameter tuning for domain adaptation of statistical machine translation. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.2209-2224. [PDF, 166KB]

(2012) Stephan Peitz, Saab Mansour, Markus Freitag, Minwei Feng, Matthias Huck, Joern Wuebker, Malte Nuhn, Markus Nußbaum-Thom, & Hermann Ney: The RWTH Aachen speech recognition and machine translation system for IWSLT 2012. IWSLT-2012: 9th International Workshop on Spoken Language Translation, Hong Kong, December 6th-7th, 2012; pp.69-76. [PDF, 623KB]; presentation, 19 slides [PDF of PPT, 165KB]

(2012) Magdalena Plamada & Martin Volk: Towards a Wikipedia-extracted Alpine corpus.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.81-87. [PDF, 364KB]

(2012) Majid Razmara, George Foster, Baskaran Sankaran, & Anoop Sarkar: Mixing multiple translation models in statistical machine translation. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 8-14 July 2012; pp.940-949. [PDF, 229KB]

(2012) Robert Remus & Mathias Bank: Textual characteristics of different-sized corpora.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.148-152. [PDF, 421KB]

(2012) Raphaël Rubino, Stéphane Huet, Fabrice Lefèvre, & Georges Linarès: Statistical post-editing of machine translation for domain adaptation. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.221-228. [PDF, 225KB]

(2012) Nick Ruiz & Marcello Federico: MDI adaptation for the lazy: avoiding normalization in LM adaptation for lecture translation. IWSLT-2012: 9th International Workshop on Spoken Language Translation, Hong Kong, December 6th-7th, 2012; pp.244-251. [PDF, 665KB]

(2012) Rico Sennrich: Mixture-modeling with unsupervised clusters for domain adaptation in statistical machine translation. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.185-192. [PDF, 217KB]

(2012) Rico Sennrich: Perplexity minimization for translation model domain adaptation in statistical machine translation.  [EACL 2012] Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp. 539-549. [PDF, 160KB]

(2012) Kashif Shah, Loïc Barrault, & Holger Schwenk: A general framework to weight heterogenous parallel data for model adaptation in statistical machine translation.  AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10pp. [PDF, 194KB]

(2012) Chunqi Shi, Donghui Lin, & Toru Ishida: Service composition scenarios for task-oriented translation. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2951-2958. [PDF, 622KB]

(2012) Inguna Skadiņa: Analysis and evaluation of comparable corpora for under-resourced areas of machine translation.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.17-19. [PDF, 370KB]

(2012) Jinsong Su, Hua Wu, Haifeng Wang, Yidong Chen, Xiaodong Shi, Huailin Dong, & Qun Liu: Translation model adaptation for statistical machine translation with monolingual topic information. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 8-14 July 2012; pp.459-468. [PDF, 368KB]

(2012) John Tinsley, Alexandru Ceausu, Jian Zhang, Heidi Depraetere, & Joeri Van de Walle: IPTranslator: facilitating patent search with machine translation. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 9pp. [PDF, 419KB]

(2012) Wei Wang, Klaus Macherey, Wolfgang Macherey, Franz Och, & Peng Xu: Improved domain adaptation for statistical machine translation.  AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10pp. [PDF, 103KB]

(2011) Pratyush Banerjee, Hala Almaghout, Sudip Naskar, Johann Roturier, Jie Jiang, Andy Way, & Josef van Genabith: The DCU machine translation systems for IWSLT 2011. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.41-48. [PDF, 274KB]

(2011) Pratyush Banerjee, Sudip Kumar Naskar, Johann Roturier, Andy Way, & Josef van Genabith: Domain adaptation in statistical machine translation of user-forum data using component-level mixture modelling. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.285-292. [PDF, 137KB]

(2011) Arianna Bisazza, Nick Ruiz, & Marcello Federico: Fill-up versus interpolation methods for phrase-based SMT adaptation. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.136-143. [PDF, 323KB]

(2011) Alexandru Ceauşu, John Tinsley, Jian Zhang, & Andy Way: Experiments on domain adaptation for patent machine translation in the PLuTO project. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.21-28. [PDF, 832KB]; presentation, 16 slides [PDF, 824KB]

(2011) Han-Bin Chen, Hen-Hsen Huang, Jengwei Tjiu, Ching-Ting Tan, & Hsin-Hsi Chen: Identification and translation of significant patterns for cross-domain SMT applications. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.277-284. [PDF, 137KB]

(2011) Hal Daumé & Jagadeesh Jagarlamudi: Domain adaptation for machine translation by mining unseen words. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short papers, Portland, Oregon, June 19-24, 2011; pp.407-412. [PDF, 80KB]

(2011) Kevin Duh, Katsuhito Sudoh, Tomoharu Iwata, & Hajime Tsukada: Alignment inference and Bayesian adaptation for machine translation. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.114-121. [PDF, 117KB]

(2011) Kevin Duh, Akinori Fujino, & Masaaki Nagata: Is machine translation ripe for cross-lingual sentiment classification?  ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short papers, Portland, Oregon, June 19-24, 2011; pp.429-433. [PDF, 95KB]

(2011) Cristina España-Bonet, Ramona Enache, Adam Slaski, Aarne Ranta, Lluís Màrquez, & Meritxell Gonzàlez: Patent translation within the MOLTO project. [MT Summit XIII] 4th Workshop on Patent Translation, Shoichi Yokoyama (ed.), Xiamen, China, September 23, 2011; pp.70-78. [PDF, 217KB]

(2011) Souhir Gahbiche-Braham, Hélène Bonneau-Maynard, & François Yvon: Two ways to use a noisy parallel news corpus for improving statistical machine translation. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.44-51. [PDF, 176KB]

(2011) Monica Gavrila & Natalia Elita: Comparing corpus-based MT approaches using restricted resources. [LIHMT] International Workshop on Using Linguistic Information for Hybrid Machine Translation, 18th November 2011, Universitat Politècnica de Catalunya, Barcelona; pp.43-49. [PDF, 304KB]

(2011) Miguel A.Jiménez-Crespo: To adapt or not to adapt in web localization: a contrastive genre-based study of original and localised legal sections in corporate websites. Journal of Specialised Translation 15 (January 2011); pp.2-27. [PDF, 237KB]

(2011) Mitesh M.Khapra, Salil Joshi, & Pushpak Bhattacharyya: It takes two to tango: a bilingual unsupervised approach for estimating sense distributions using expectation maximization. [IJCNLP 2011] Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011; pp.695-704. [PDF, 280KB]

(2011) Patrik Lambert, Holger Schwenk, Christophe Servan, & Sadaf Abdul-Rauf: Investigations on translation model adaptation using monolingual data. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.284-293. [PDF, 124KB]

(2011) Thomas Lavergne, Alexandre Allauzen, Hai-Son Le, & François Yvon: LIMSI’s experiments in domain adaptation for IWSLT11. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.62-67. [PDF, 358KB]

(2011) Abby Levenberg, Miles Osborne, & David Matthews: Multi-stream language models for statistical machine translation.  [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.177-186. [PDF, 232KB]

(2011) John McCrae, Maurizio Espinoza, Elena Montiel-Ponsoda, Guadalupe Aguado-de-Cea, & Philipp Cimiano: Combining statistical and semantic approaches to the translation of ontologies and taxonomies. Proceedings of SSST-5, Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation, ACL HLT 2011, Portland, Oregon, USA, June 2011; pp.116-125. [PDF, 576KB]

(2011) Paul Maergner, Kevin Kilgour, Ian Lane, & Alex Waibel: Unsupervised vocabulary selection for simultaneous lecture translation. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.214-221. [PDF, 2127KB]

(2011) Saab Mansour, Joern Wuebker, & Hermann Ney: Combining translation and language model scoring for domain-specific data filtering. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.222-229. [PDF, 257KB]

(2011) Emmanuel Morin & Emmanuel Prochasson: Bilingual lexicon extraction from comparable corpora enhanced with parallel corpora. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.27-34. [PDF, 125KB]

(2011) Jan Niehues & Alex Waibel: Using Wikipedia to translate domain-specific terms in SMT. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.230-237. [PDF, 271KB]

(2011) Pavel Pecina, Antonio Toral, Andy Way, Vassilis Papavassiliou, Prokopis Prokopidis, & Maria Giagkou: Towards using web-crawled data for domain adaptation in statistical machine translation. [EAMT 2011]:  proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.297-304. [PDF, 363KB]; presentation, 25 slides [PDF]

(2011) Anders Søgaard & Martin Haulrich: Sentence-level instance-weighting for graph-based and transition-based dependency parsing. IWPT 2011: 12th International Conference on Parsing Technologies, October 5-7, 2011, Dublin City University; pp.43-47. [PDF, 86KB]

(2011) Linfeng Song, Haitao Mi, Yajuan Lü, & Qun Liu: Bagging-based system combination for domain adaptation. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.293-298. [PDF, 99KB]

(2011) George Tambouratzis, Fotini Simistira, Sokratis Sofianopoulos, Nikos Tsimboukakis, & Marina Vassiliou: A resource-light phrase scheme for language-portable MT. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.185-192. [PDF, 186KB]

(2011) Špela Vintar & Darja Fišer: Enriching Slovene WordNet with domain-specific terms. Translation: Computation, Corpora,  Cognition 1 (1), December 2011; pp.29-44. [PDF, 631KB]

(2011) Joern Wuebker, Matthias Huck, Saab Mansour, Markus Freitag, Minwei Feng, Stephan Peitz, Christoph Schmidt, & Hermann Ney: The RWTH Aachen machine translation system for IWSLT 2011. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.106-113. [PDF, 193KB]

(2010) Pratyush Banerjee, Jinhua Du, Sudip Naskar, Baoli Li, Andy Way, & Josef van Genabith: Combining multi-domain statistical machine translation models using automatic classifiers. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; 10pp. [PDF, 304KB]

(2010) Josep Maria Crego, Aurélien Max, & François Yvon: Local lexical adaptation in machine translation through triangulation: SMT helping SMT. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.232-240. [PDF, 197KB]

(2010) Kevin Duh, Katsuhito Sudoh, & Hajime Tsukada: Analysis of translation model adaptation in statistical machine translation.  Proceedings of the 7th International Workshop on Spoken Language Translation, 2-3 December 2010, Paris, France; pp.243-250. [PDF, 556KB]

 (2010) George Foster, Cyril Goutte, & Roland Kuhn: Discriminative instance weighting for domain adaptation in statistical machine translation.  [EMNLP 2010] Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, MIT, Massachusetts, USA, 9-11 October 2010; pp.451-459. [PDF, 256KB]

(2010) Laura Elisabeth Jehl: Machine translation for Twitter. Master of Science, University of Edinburgh, 2010. iv, 53pp. [PDF, 398KB]

(2010) Hiroyuki Kaji, Takashi Tsunakawa, & Daisuke Okada: Using comparable corpora to adapt a translation model to domains. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.2182-2188. [PDF, 432KB]

(2010) Petr Knoth, Trevor Collins, Elsa Sklavounou, & Zdenek Zdrahal: Facilitating cross-language retrieval and machine translation by multilingual domain ontologies. [LREC 2010] Workshop on Supporting eLearning with Language Resources  and Semantic Data, Valletta, Malta, 22 May 2010; 42 slides. [PDF of PPT, 440KB]

(2010) William D.Lewis, Chris Wendt, & David Bullock: Achieving domain specificity in SMT without overt siloing. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.2878-2883. [PDF, 328KB]

(2010) Jan Niehues & Alex Waibel: Domain adaptation in statistical machine translation using factored translation models. EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 7pp. [PDF, 581KB]; presentation: 27 slides [PDF, 887KB]

(2010) Mohammad Taher Pilevar & Heshaam Faili: PersianSMT: a first attempt to English-Persian statistical machine translation. JADT 2010: 10th International Conference on Statistical Analysis of Textual Data, 9-11 June 2010, Rome, Italy; pp.1101-1111. [PDF, 945KB]

(2010) Germán Sanchis-Trilles & Mauro Cettolo: Online language model adaptation via n-gram mixtures for statistical machine translation. EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 8pp. [PDF, 605KB]; presentation: 22 slides [PDF, 241KB]

(2010) Germán Sanchis-Trilles, Jesús Andrés-Ferrer, Guillem Gascó, Jesús González-Rubio, Pascual Martínez-Gómez, Martha-Alicia Rocha, Joan-Andreu Sánchez, & Francisco Casacuberta: UPV-PRHLT English-Spanish system for WMT10. ACL 2010: Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Proceedings of the workshop, 15-16 July 2010, Uppsala University, Uppsala, Sweden; pp. 172-176. [PDF, 90KB]

(2010) Kashif Shah, Loïc Barrault, & Holger Schwenk: Translation model adaptation by resampling. ACL 2010: Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Proceedings of the workshop, 15-16 July 2010, Uppsala University, Uppsala, Sweden; pp. 392-399. [PDF, 176KB]

(2010) Jörg Tiedemann: Context adaptation in statistical machine translation using models with exponentially decaying cache. Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010, Uppsala, Sweden, 15 July 2010; pp.8-15. [PDF, 148KB]

(2010) Jörg Tiedemann: To cache or not to cache? Experiments with adaptive models in statistical machine translation.  ACL 2010: Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Proceedings of the workshop, 15-16 July 2010, Uppsala University, Uppsala, Sweden; pp. 189-194. [PDF, 127KB]

(2010) Bin Wei & Christopher Pal: Cross lingual adaptation: an experiment in sentiment classifications. ACL 2010: the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11-16, 2010: Proceedings of the Conference Short Papers; pp.258-262. [PDF, 109KB]

Filtering see Cleaning and filtering

Knowledge representation see Ontologies

Knowledge resources

(2013) Timmy Oumai Wang & Mark Shuttleworth: Knowledge management issues in the workflow of translation memory systems. [Aslib 2013] Translating and the Computer 35, 28-29 November 2013, etc.venues, Paddington, London, UK; 14pp. [PDF, 416KB]; presentation, 18 slides [PDF of PPT, 263KB]

Language resources (see also Bilingual corpora, Lexical resources, Multilingual corpora, Scarce resources)

(2015) Andrzej Zydroń: FALCON: building the localization web. Proceedings of the 37th Conference Translating and the Computer, London, November 26-27, 2015; pp.33-36. [PDF, 125KB]

(2014) Michael Carl, Mercedes García Martínez, Bartolomé Mesa-Lao, & Nancy Underwood: CFT13: a resource for research into the post-editing process. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1757-1764. [PDF, 1011KB]

(2014) Grégoire Détrez, Víctor M.Sánchez-Cartagena, & Aarne Ranta: Sharing resources between free/open-source rule-based machine translation systems: Grammatical Framework and Apertium. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.4394-4400. [PDF, 146KB]

(2014) Nizar Ghoula, Jacques Guyot, & Gilles Falquet: Terminology management revisited. Translating and the Computer 36: proceedings. Asling: International Society for Advancement in Language Technology, 27-28 November 2014; pp.56-65. [PDF, 916KB]

(2014) Jorge Gracia, Elena Montiel-Ponsoda, Daniel Vila-Suero, & Guadalupe Aguado-de-Cea: Enabling language resources to expose translations as linked data on the Web. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.409-413. [PDF, 346KB]

(2014) Stelios Piperidis, Harris Papageorgiou, Christian Spurk, Georg Rehm, Khalid Choukri, Olivier Hamon, Nicoletta Calzolari, Riccardo del Gratta, Bernardo Magnini, & Christian Girardi: META-SHARE: one year after. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1208-1211. [PDF, 861KB]

(2014) Georg Rehm et al.: The strategic impact of META-NET on the regional, national and international level. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1517-1524. [PDF, 184KB]

(2013) Núria Bel, Marc Poch, & Antonio Toral: PANACEA: platform for automatic, normalised annotation and cost-effective acquisition of language resources for human language technologies. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.435. [PDF, 162KB]

(2013) Olzhas Makhambetov, Aibek Makazhanov, Zhandos Yessenbayev, Bakhyt Matkarimov, Islam Sabyrgaliyev, & Anuar Sharafudinov: Assembling the Kazakh language corpus. [EMNLP 2013] Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013; pp.1022-1031. [PDF, 184KB]

(2013) Marc Poch & Antonio Toral: PANACEA tutorial. Proceedings of the XIV Machine Translation Summit, Nice, September 3, 2013; 39 slides. [PDF of PPT, 1538KB]

(2013) Raivis Skadiņš, Mārcis Pinnis, Tatiana Gornostay, & Andrejs Vasiļjevs: Application of online terminology services in statistical machine translation. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; pp.281-286. [PDF, 573KB]

(2012) Núria Bel, Vassilis Papavasiliou, Prokopis Prokopidis, Antonio Toral, & Victoria Arranz: Mining and exploiting domain-specific corpora in the PANACEA platform. [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.24-26. [PDF, 469KB]

(2012) António Branco: Language technology for Portuguese: progress and prospects [abstract].  In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; pp.31-32. [PDF]

(2012) António Branco: METANET4U: contribution to META-SHARE. META-FORUM, Brussels, June 19-21, 2012; 8 slides. [PDF of PPT, 150KB]

(2012) Nicola Cancedda: Private access to phrase tables for statistical machine translation. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 8-14 July 2012, Short Papers; pp.23-27. [PDF, 135KB]

(2012) Mauro Cettolo, Christian Girardi, & Marcello Federico: WIT3: web inventory of transcribed and translated talks. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.261-268. [PDF, 197KB]

(2012) Dan Cristea & Ionuţ Cristian Pistol: Multilingual linguistic workflows [abstract].  In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; pp.28-30. [PDF]

(2012) Christian Federmann, Ioanna Giannopoulou, Christian Girardi, Olivier Hamon, Dimitris Mavroeidis, Salvatore Minutoli, & Marc Schröder: META-SHARE v2: an open network of repositories for language resources including data and tools.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3300-3303. [PDF, 588KB]

(2012) Darja Fišer: Language resources and tools for semantically enhanced processing of Slovene [abstract]. In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; p.22. [PDF]

(2012) Monica Gavrila, Walther v.Hahn, & Cristina Vertan: Same domain different discourse style: a case study on language resources for data-driven machine translation.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3441-3446. [PDF, 344KB]

(2012) Maria Gavrilidou, Penny Labropoulou, Elina Desipri, Stelios Piperidis, Haris Papageorgiou, Monica Monachini, Francesca Frontini, Thierry Declerck, Gil Francopoulo, Victoria Arranz, & Valerie Mapelli: The META-SHARE metadata scheme for the description of language resources.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1090-1097. [PDF, 710KB]

(2012) Maria Gavrilidou: Using the META-SHARE model implementation for describing and documenting language resources. Tutorial at: LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; 4pp. [PDF, 49KB]

(2012) Masood Ghayoomi: From grammar rule extraction to treebanking: a bootstrapping approach. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1912-1919. [PDF, 1203KB]

(2012) Nancy Ide: MultiMASC: an open linguistic infrastructure for language research.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.42-48. [PDF, 515KB]

(2012) Judith Klavans: Government catalog of language resources (GCLR) [abstract]. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 1p. [PDF, 8KB]

(2012) Xuansong Li, Stephanie M.Strassel, Heng Ji, Kira Griffitt, & Joe Ellis: Linguistic resources for entity linking evaluation: from monolingual to cross-lingual.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3098-3105. [PDF, 537KB]

(2012) Joseph Mariani: Language resources and evaluation for a multilingual Europe [abstract]. In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; pp. 3-5. [PDF]

(2012) Elaine Marsh: Return on investment for government human language technology systems [abstract]. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 1p. [PDF, 86KB]

(2012) Tomás Pariente: Text analytics and big data. META-FORUM, Brussels, June 19-21, 2012; 23 slides. [PDF of PPT, 4103KB]

(2012) Bolette Sandford Pedersen: The META-NET language white paper series: overview and key results. META-FORUM, Brussels, June 19-21, 2012; 13 slides. [PDF of PPT, 3758KB]

(2012) Stelios Piperidis: META-SHARE: the open exchange platform: overview – current state – towards v3.0. META-FORUM, Brussels, June 19-21, 2012; 28 slides. [PDF of PPT, 2925KB]

(2012) Marc Poch, Antonio Toral, & Núria Bel: Language resources factory: case study on the acquisition of translation memories.  [EACL 2012] Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp. 1-5. [PDF, 93KB]

(2012) Adam Przepiórkowski: Polish language resources and tools: towards multilinguality [abstract]. In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; p.31. [PDF]

(2012) Mike Rosner & Jan Joachimsen: Maltese: mixed language and multilingual technology [abstract]. In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; pp.27-28. [PDF]

(2012) Rahma Sellami, Fatiha Sadat, & Lamia Hadrich Belguith: Exploiting Wikipedia as a knowledge base for the extraction of linguistic resources: application on Arabic-French comparable corpora and bilingual lexicons. AMTA-2012: Fourth workshop on computational approaches to Arabic script-based languages. Proceedings, San Diego, November 1, 2012; pp.72-79. [PDF, 889KB]

(2012) Ralf Steinberger, Andreas Eisele, Szymon Klocek, Spyridon Pilos, & Patrick Schlüter: DGT-TM: a freely available translation  memory in 22 languages.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.454-459. [PDF, 471KB]

(2012) Jörg Tiedemann, Dorte Haltrup Hansen, Lene Offersgaard, Sussi Olsen, & Matthias Zumpe: A distributed resource repository for cloud-based machine translation.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2207-2213. [PDF, 441KB]

(2012) Tamás Váradi: CESAR: comprehensive language resources and tools for Europe. HLT Days 27-28 September 2012, Warsaw; 40 slides [PDF of PPT, 2119KB]

(2012) Tamás Váradi: The contribution of CESAR to META-SHARE. META-FORUM, Brussels, June 19-21, 2012; 17 slides. [PDF of PPT, 1817KB]

(2012) Tamás Váradi & Marko Tadić: Central and South-East European resources in META-SHARE. Proceedings of COLING 2012: Demonstration Papers, Mumbai, December 2012; pp. 431-437. [PDF, 1205KB]

(2012) Andrejs Vasiļjevs: META-NORD overview. META-FORUM, Brussels, June 19-21, 2012; 36 slides. [PDF of PPT, 8607KB]

(2012) Yunqing Xia, Guoyu Tang, Peng Jin, & Xia Yang: CLTC: a Chinese-English cross-lingual topic corpus.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.532-537. [PDF, 955KB]

(2012) Andrius Utka: Multilingual resources and their application for the Lithuanian language [abstract]. In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; p.23. [PDF]

(2012) Andrejs Vasiļjevs, Markus Forsberg, Tatiana Gornostay, Dorte H.Hansen, Kristin M.Jóhannsdóttir, Krister Lindén, Gunn I.Lyse, Lene Offersgaard, Ville Oksanen, Sussi Olsen, Bolette S.Pedersen, Eiríkur Rögnvaldsson, Roberts Rozis, Inguna Skadiņa, & Koenraad De Smet: Creation of an open shared language resource repository in the Nordic and Baltic countries.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1076-1083. [PDF, 1336KB]

(2012) Andrejs Vasiļjevs, Tatiana Gornostay, Inguna Skadiņa, Daiga Deksne, Raivis Skadiņš, & Mārcis Pinnis: Recent advances in the development and sharing of language resources and tools for Latvian [abstract].  In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; pp.24-25. [PDF]

(2012) CESAR: Central and Southeast European resources. [Project paper at] EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; p.84. [PDF, 113KB]

(2012) PANACEA (Platform for Automatic, Normalised Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies). [Project paper at] EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; p.90. [PDF, 87KB]

(2011) Pushpak Bhattacharyya: IndoWordNet and multilingual resource conscious word sense disambiguation. Proceedings of the 8th international NLPSC workshop. Special theme: Human-machine interaction in translation, Copenhagen Business School, 20-21 August 2011; ed.Bernadette Sharp, Michael Zock, Michael Carl, Arnt Lykke Jakobsen (Copenhagen Studies in Language 41), Frederiksberg: Samfundslitteratur, 2011; pp.29-30. [PDF, 677KB]

(2011) Svetla Koeva: Furthering natural language processing in Bulgaria. META-FORUM 2011: Solutions for multilingual Europe, June 27/28 2011, Hotel Marriott, Budapest, Hungary; 40 slides [PDF of PPT, 3276KB]

(2011) Luís Marujo, Nuno Grazina, Tiago Luís, Wang Ling, Luísa Coheur, & Isabel Trancoso: BP2EP – adaptation of Brazilian Portuguese texts to European Portuguese. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.129-136. [PDF, 352KB]

(2011) Maciej Ogrodniczuk & Adam Przepiórkowski: Polish LRTs: CESAR’s story.  META-FORUM 2011: Solutions for multilingual Europe, June 27/28 2011, Hotel Marriott, Budapest, Hungary; 14 slides [PDF of PPT, 324KB]

(2011) Stelios Piperidis: META-SHARE: an open resource exchange infrastructure for stimulating research and innovation.  META-FORUM 2011: Solutions for multilingual Europe, June 27/28 2011, Hotel Marriott, Budapest, Hungary; 37 slides [PDF of PPT, 5302KB]

(2011) Marko Tadić: The CESAR project: enabling LRT for 70m+ speakers.  META-FORUM 2011: Solutions for multilingual Europe, June 27/28 2011, Hotel Marriott, Budapest, Hungary; 24 slides [PDF of PPT, 810KB]

(2011) Paul Thompson, Yoshinobu Kano, John McNaught, Steve Pettifer, Teresa Attwood, John Keane, & Sophia Ananiadou: Promoting interoperability of resources in META-SHARE. [IJCNLP 2011] Proceedings of Workshop on Language Resources, Technology and Services in the Sharing Paradigm, Chiang Mai, Thailand, November 12, 2011; pp.50-58. [PDF, 326KB]

(2011) Antonio Toral, Pavel Pecina, Andy Way, & Marc Poch: Towards a user-friendly webservice architecture for statistical machine translation in the PANACEA project. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.63-70. [PDF, 511KB]

(2011) Tamás Váradi: Hungarian language technology - from platform to alliance.  META-FORUM 2011: Solutions for multilingual Europe, June 27/28 2011, Hotel Marriott, Budapest, Hungary; 19 slides [PDF of PPT, 1069KB]

(2011) LIWP – EU language industry web platform. (European Machine Translation Projects.) [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; p.339. [PDF, 40KB]

(2011) PANACEA (Platform for automatic, normalised annotation and cost-effective acquisition of language resources for human language technologies). (European Machine Translation Projects.) [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; p.349. [PDF, 106KB]

(2010) Gilles Adda & Joseph Mariani: Language resources & Amazon Mechanical Turk: ethical, legal and other issues. LREC 2010: Legal Issues for Sharing Language Resources - LISLR2010 Workshop, 17 May 2010, Valletta, Malta; 21 slides. [PDF of PPT, 244KB]

(2010) Mossab Al-Hunaity: Utilizing web service technology to create Danish Arabic language resources. LREC 2010: Web Services and Processing Pipelines in HLT - WSPP2010 Workshop, 17 May 2010, Valletta, Malta; 20 slides. [PDF of PPT, 505KB]

 (2010) Lynne Bowker & Elizabeth Marshman: Toward a model of active and situated learning in the teaching of computer-aided translation: introducing the CERTT project [abstract].  Journal of Translation Studies 13 (1-2), Special issue: The teaching of computer-aided translation, ed. Chan Sin-wai; pp. 199-226.

(2010) Arif Bramantoro, Ulrich Schäfer, & Toru Ishida: Towards an integrated architecture for composite language services and multiple linguistic processing components. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.3506-3511. [PDF, 375KB]

(2010) Jennifer DeCamp: Language Technology Resource Center. EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 7pp. [PDF, 718KB]; presentation: 9 slides [PDF, 341KB]

(2010) Alice Dijkstra: Dutch/Flemish HLT cooperation.  Translingual Europe 2010, Hotel Maritim, Berlin, Germany, Monday June 7th 2010, Panel on Small Languages; 9pp. [PDF, 440KB]

(2010) Sabine Kirchmeier-Andersen: Linguistic diversity and language change – future challenges for MT. Translingual Europe 2010, Hotel Maritim, Berlin, Germany, Monday June 7th 2010; 34pp. [PDF, 3094KB]

 (2010) Swaran Lata: Human language computing in Indian languages – a holistic perspective. META-FORUM 2010: Challenges for multilingual Europe, November 17/18 2010, Brussels, Belgium; 74 slides [PDF of PPT, 7783KB]

(2010) Rūta Marcinkevičienė & Daiva Vitkutė-Adžgauskienė: Developing the human language technology infrastructure in Lithuania.  Human Language Technologies – The Baltic Perspective, 4th International Conference, Riga, Latvia, October 7-8, 2010; 24 slides [PDF of PPT, 6452KB]

(2010) Einar Meister, Jaak Vilo & Neeme Kahusk: National programme for Estonian language technology: a pre-final summary. Human Language Technologies – The Baltic Perspective, 4th International Conference, Riga, Latvia, October 7-8, 2010; 24 slides [PDF of PPT, 543KB]

 (2010) Robert Munro, Steven Bethard, Victor Kuperman, Vicky Tzuyin Lai, Robin Melnick, Christopher Potts, Tyler Schnoebelen, & Harry Tily: Crowdsourcing and language studies: the new generation of linguistic data. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, Los Angeles, CA, June 2010; pp.122-130. [PDF, 310KB]

(2010) Stelios Piperidis: META-SHARE: the open resource exchange facility. META-FORUM 2010: Challenges for multilingual Europe, November 17/18 2010, Brussels, Belgium; 32 slides [PDF of PPT, 6471KB]

(2010) Marc Poch: PANACEA – platform for the automatic, normalized annotation and cost-effective acquisition of language resources. (European Community supported project.) Presented at EAMT 2010: 14th Annual conference of the European Association for Machine Translation, 28 May 2010, Saint-Raphaël, France. 12 slides. [PDF, 294KB]

(2010) Georg Rehm: META-NET and META-SHARE: an overview. Human Language Technologies – the Baltic Perspective, 4th International Conference, Riga, Latvia, October 8, 2010; 52 slides [PDF of PPT, 9770KB]

(2010) Mike Rosner: Maltese on the brink. Translingual Europe 2010, Hotel Maritim, Berlin, Germany, Monday June 7th 2010; 17pp. [PDF, 1286KB]

(2010) Víctor M.Sánchez-Cartagena & Juan Antonio Pérez-Ortiz: Tradubi: open-source social translation for the Apertium machine translation platform. Fourth Machine Translation Marathon “Open Source Tools for Machine Translation”, 25-30 January, Dublin, Ireland; Prague Bulletin of Mathematical Linguistics, no.93, January 2010; pp.47-56. [PDF, 223KB]

(2010) Inguna Skadiņa, Ilze Auziņa, Normunds Grūzītis, Kristīna Levāne-Petrova, Gunta Nešpore, Raivis Skadiņš, & Andrejs Vasiļjevs: Language resources and technology for humanities in Latvia 2004-2010.  Human Language Technologies – The Baltic Perspective, 4th International Conference, Riga, Latvia, October 7-8, 2010; 30 slides [PDF of PPT, 1204KB]

(2010) Zhiyi Song, Stephanie Strassel, Gary Krug, & Kazuaki Maeda: Enhanced infrastructure for creation and collection of translation resources. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1309-1314. [PDF, 324KB]

(2010) Andrejs Vasiljevs: Big solutions for small languages. Translingual Europe 2010, Hotel Maritim, Berlin, Germany, Monday June 7th 2010, Panel on Small Languages; 7pp. [PDF, 2236KB]

(2010) Karthik Visweswariah, Vijil Chenthamarakshan, & Nandakishore Kambhatla: Urdu and Hindi: translation and sharing of linguistic resources. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.1283-1291. [PDF, 217KB]

(2010) John Hendrik Weitzmann & Prodromos Tsiavos: Language resources and legal issues: problems and solutions for basic and industrial research.  META-FORUM 2010: Challenges for multilingual Europe, November 17/18 2010, Brussels, Belgium; 39 slides [PDF of PPT, 1428KB]

Lexical resources and lexical acquisition

(2015) Oliver Adams, Graham Neubig, Trevor Cohn & Steven Bird: Inducing bilingual lexicons from small quantities of sentence-aligned phonemic transcriptions. [IWSLT 2015] Proceedings of the International Workshop on Spoken Language Translation, December 3-4, 2015, Da Nang, Vietnam; pp.248-255. [PDF, 2.8MB]

(2015) Gerard de Melo: Wiktionary-based word embeddings. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: vol.1: MT Researchers’ Track; pp.346-359. [PDF, 709KB] 

(2014) Judit Ács: Pivot-based multilingual dictionary building using Wiktionary.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1938-1942. [PDF, 81KB]

(2014) Krasimir Angelov: Bootstrapping open-source English-Bulgarian computational dictionary. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1018-1023. [PDF, 158KB]

(2014) Anabela Barreiro, Fernando Batista, Ricardo Ribeiro, Helena Moniz, & Isabel Trancoso: OpenLogos semantico-syntactic knowledge-rich bilingual dictionaries. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3774-3781. [PDF, 163KB]

(2014) Olga Beregovaya & David Landan: Source content analysis and training data selection impact on an MT-driven program design. Proceedings of the 17th annual conference of the European Association for Machine Translation, EAMT 2014, Dubrovnik, Croatia, 16th-18th June 2014, edited by Marko Tadić, Philipp Koehn, Johann Roturier, Andy Way; p.59. [PDF, 303KB]

(2014) Kurt Eberle: AutoLearn<Word>. Translating and the Computer 36: proceedings. Asling: International Society for Advancement in Language Technology, 27-28 November 2014; pp.145-154. [PDF, 446KB]

 (2014) Maud Ehrmann, Francesco Cecconi, Daniele Vannella, John McCrae, Philipp Cimiano, & Roberto Navigli: Representing multilingual data as linked data: the case of BabelNet 2.0.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.401-408. [PDF, 674KB]

(2014) Miquel Esplà-Gomis, Víctor M.Sánchez-Cartagena, Felipe Sánchez-Martínez, Rafael C.Carrasco, Mikel L.Forcada, & Juan Antonio Pérez-Ortiz: An efficient method to assist non-expert users in extending dictionaries by assigning stems and inflectional paradigms to unknown words. Proceedings of the 17th annual conference of the European Association for Machine Translation, EAMT 2014, Dubrovnik, Croatia, 16th-18th June 2014, edited by Marko Tadić, Philipp Koehn, Johann Roturier, Andy Way; pp.19-26. [PDF, 311KB]

(2014) Mozhgan Ghassemiazghandi & Tengku Sepora Tengku Mahadi: Losses and gains in computer-assisted translation: some remarks on online translation of English to Malay. Translating and the Computer 36: proceedings. Asling: International Society for Advancement in Language Technology, 27-28 November 2014; pp.194-201. [PDF, 160KB]

(2014) B.R.Laranjeira, V.P.Moreira, A.Villavicencio, C.Ramisch, & M.J.Finatto: Comparing the quality of focused crawlers and of the translation resources obtained from them. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3572-3578. [PDF, 803KB]

(2014) John Richardson, Toshiaki Nakazawa, & Sadao Kurohashi: Bilingual dictionary construction with transliteration filtering. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1013-1017. [PDF, 164KB]

(2014) Michael Rosner & Kurt Sultana: Automatic methods for the extension of a bilingual dictionary using comparable corpora. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3790-3797. [PDF, 195KB]

(2014) Yves Scherrer & Benoît Sagot: A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.502-508. [PDF, 184KB]

(2013) Judit Ács, Katalin Pajkossy, & András Kornai: Building basic vocabulary across 40 languages. Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.52-58. [PDF, 381KB]

(2013) Mihael Arcan & Paul Buitelaar: MONNET: multilingual ontologies for networked knowledge. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.431. [PDF, 241KB]

(2013) Brijesh Bhatt, Lahari Poddar, & Pushpak Bhattacharyya: IndoNet: a multilingual lexical knowledge network for Indian languages. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.268-272. [PDF, 364KB]

(2013) Dhouha Bouamor, Nasredine Semmar, & Pierre Zweigenbaum: Building specialized bilingual lexicons using word sense disambiguation. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.952-956. [PDF, 117KB]

(2013) Dhouha Bouamor, Adrian Popescu, Nasredine Semmar, & Pierre Zweigenbaum: Building specialized bilingual lexicons using large-scale background knowledge. [EMNLP 2013] Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013; pp. 479-489. [PDF, 246KB]

(2013) Dhouha Bouamor, Nasredine Semmar, & Pierre Zweigenbaum: Context vector disambiguation for bilingual lexicon extraction from comparable corpora.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.759-764. [PDF, 199KB]

(2013) Dhouha Bouamor, Nasredine Semmar, & Pierre Zweigenbaum: Towards a generic approach for bilingual lexicon extraction from comparable corpora. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; pp. 143-150. [PDF, 531KB]

(2013) Rahma Boujelbane, Mariem Ellouze Khemekhem, Siwar BenAyed, & Lamia Hadrich Belguith: Building bilingual lexicon to create dialect Tunisian corpora and adapt language model. Proceedings of the Second Workshop on Hybrid Approaches to Translation, Sofia, Bulgaria, August 8, 2013; pp.88-93. [PDF, 454KB]

(2013) Silvana Hartmann & Iryna Gurevych: FrameNet on the way to Babel: creating a bilingual FrameNet using Wiktionary as interlingual connection.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.1363-1373. [PDF, 271KB]

(2013) Amir Hazem & Emmanuel Morin: A comparison of smoothing techniques for bilingual lexicon extraction from comparable corpora.  Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.24-33. [PDF, 546KB]

(2013) Amir Hazem & Emmanuel Morin: Word co-occurrence counts prediction for bilingual terminology extraction from comparable corpora. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.1392-1400. [PDF, 162KB]

(2013) Tsutomu Hirao, Tomoharu Iwata, & Masaaki Nagata: Latent semantic matching: application to cross-language text categorization without alignment information.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.212-216. [PDF, 846KB]

(2013) Ann Irvine & Chris Callison-Burch: Supervised bilingual lexicon induction with multiple monolingual signals. [NAACL-HLT 2013] The 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 9-14 June 2013, Atlanta, Georgia; pp.518-523. [PDF, 511KB]

(2013) Hong-Seok Kwon, Hyeong-Won Seo, & Jae-Hoon Kim: Bilingual lexicon extraction via pivot language and word alignment tool.  Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.11-15. [PDF, 666KB]

(2013) Khang Nhut Lam & Jugal Kalita: Creating reverse bilingual dictionaries. [NAACL-HLT 2013] The 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 9-14 June 2013, Atlanta, Georgia; pp.524-528. [PDF, 138KB]

(2013) Lian Tze Lim, Lay-Ki Soon, Tek Yong Lim, Enya Kong Tang, & Bali Ranaivo-Malançon: Context-dependent multilingual lexical lookup for under-resourced languages.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.294-299. [PDF, 373KB]

(2013) Xiaodong Liu, Kevin Duh, & Yuji Matsumoto: Topic models + word alignment = a flexible framework for extracting bilingual dictionary from comparable corpus. Proceedings of the Seventeenth Conference on Computational Natural Language Learning, Sofia, Bulgaria, 8-9 August 2013; pp.212-221. [PDF, 487KB]

(2013) Michael Matuschek, Christian M.Meyer, & Iryna Gurevych: Multilingual knowledge in aligned Wiktionary and OmegaWiki for translation applications. Translation: Computation, Corpora,  Cognition 3 (1), June 2013; pp.87-118. [PDF, 2898KB]

(2013) Vassilis Papavassiliou, Prokopis Prokopidis, & Gregor Thurmair: A modular open-source focused crawler for mining monolingual and bilingual corpora from the web.  Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.43-51. [PDF, 663KB]

(2013) Magdalena Plamadă & Martin Volk: Mining for domain-specific text from Wikipedia.  Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.112-120. [PDF, 382KB]

(2013) Rahma Sellami, Fatiha Sadat, & Lamia Hadrich Belguith: Exploiting multiple resources for Japanese to English patent translation. [MT Summit XIV] Proceedings of the 5th Workshop on Patent Translation, Nice, September 2, 2013; pp.34-39. [PDF, 1300KB]

(2013) Jason R.Smith, Herve Saint-Amand, Magdalena Plamada, Philipp Koehn, Chris Callison-Burch, & Adam Lopez: Dirt cheap web-scale parallel text from the Common Crawl.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.1374-1383. [PDF, 179KB]

(2013) Itsuki Toyota, Zi Long, Lijuan Dong, Takehito Utsuro, & Mikio Yamamoto: Compositional translation of technical terms by integrating patent families as a parallel corpus and a comparable corpus. [MT Summit XIV] Proceedings of the 5th Workshop on Patent Translation, Nice, September 2, 2013; pp.16-23. [PDF, 2058KB]

(2013) Takashi Tsunakawa, Yosuke Yamamoto, & Hiroyuki Kaji: Improving calculation of contextual similarity for constructing a bilingual dictionary via a third language. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.1057-1061. [PDF, 689KB]

(2013) Ivan Vulić & Marie-Francine Moens: A study on bootstrapping bilingual vector spaces from non-parallel data (and nothing else). [EMNLP 2013] Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013; pp.1044-1054. [PDF, 261KB]

(2012) Dhouha Bouamor, Nasredine Semmar, & Pierre Zweigenbaum: Automatic construction of a multi-word expressions bilingual lexicon: a statistical machine translation evaluation perspective. COLING 2012: Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon (CogALex-III), Mumbai, December 2012; pp.95-107. [PDF, 210KB]

(2012) Valeria Caruso & Anna De Meo: What else can databases do to assist translators? Illustrating a rated inventory of Web dictionaries. [Aslib 2012] Translating and the Computer 34, 29-30 November 2012, One Birdcage Walk, London, UK; 12pp. [PDF, 848KB], presentation by Martin Thomas: 50 slides [PDF, 3336KB]

(2012) Estelle Delpech, Béatrice Daille, Emmanuel Morin, & Claire Lemaire: Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.745-761. [PDF, 319KB]

(2012) Douwe Gelling & Trevor Cohn: Using senses in HMM word alignment. NAACL-HLT Workshop on the Induction of Linguistic Structure, Montréal, Canada, June 3-8, 2012; pp.39-46. [PDF, 156KB]

(2012) Amir Hazem & Emmanuel Morin: ICA for bilingual lexicon extraction from comparable corpora.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.126-133. [PDF, 474KB]

(2012) Elena Irimia: Experimenting with extracting lexical dictionaries from comparable corpora for English-Romanian language pair.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.49-55. [PDF, 738KB]

(2012) Angelina Ivanova: Evaluation of a bilingual dictionary extracted from Wikipedia.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.62-66. [PDF, 337KB]

(2012) Salil Joshi, Arindam Chatterjee, Arun Karthikeyan Karra, & Pushpak Bhattacharyya: Eating your own cooking: automatically linking wordnet synsets of two languages. Proceedings of COLING 2012: Demonstration Papers, Mumbai, December 2012; pp. 239-246. [PDF, 1527KB]

(2012) Mahdi Khademian, Kaveh Taghipour, Saab Mansour, & Shahram Khadivi: A holistic approach to bilingual sentence fragment extraction from comparable corpora.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.4073-4079. [PDF, 416KB]

(2012) Nikola Ljubešić, Špela Vintar, & Darja Fišer: Multi-word term extraction from comparable corpora by combining contextual and constituent clues.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.143-147. [PDF, 376KB]

(2012) Clara Inés López Rodríguez, Miriam Buendía Castro & Alejandro García Aragón: User needs to the test: evaluating a terminological knowledge base on the environment by trainee translators. Journal of Specialised Translation 18 (July 2012); pp.57-76. [PDF, 785KB]

(2012) Gerard de Melo & Gerhard Weikum: UWN: a large multilingual lexical knowledge base. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 10 July 2012, System Demonstrations; pp.151-156. [PDF, 772KB]

(2012) Xinfan Meng, Furu Wei, Ge Xu, Longkai Zhang, Xiaohua Liu, Ming Zhou, & Houfeng Wang: Lost in translations? Building sentiment lexicons using context based machine translation. Proceedings of COLING 2012: Posters, Mumbai, December 2012; pp.829-838. [PDF, 232KB]

(2012) Christian M.Meyer & Iryna Gurevych: To exhibit is not to loiter: a multilingual, sense-disambiguated Wiktionary for measuring verb similarity. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.1763-1780. [PDF, 771KB]

(2012) Pavel Pecina, Antonio  Toral, Vassilis Papavassiliou, Prokopis Prokopidis, & Josef van Genabith: Domain adaptation of statistical machine translation using web-crawled resources: a case study. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.145-152. [PDF, 201KB]

(2012) Reinhard Rapp, Serge Sharoff, & Bogdan Babych: Identifying word translations from comparable documents without a seed lexicon.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.460-466. [PDF, 329KB]

(2012) Víctor M.Sánchez-Cartagena, Miquel Esplà-Gomis, & Juan Antonio Pérez-Ortiz: Source-language dictionaries help non-expert users to enlarge target-language dictionaries for machine translation.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3422-3429. [PDF, 416KB]

(2012) Felipe Sánchez-Martínez, Rafael C.Carrasco, Miguel A.Martínez-Prieto, & Joaquín Adiego: Generalized biwords for bitext compression and translation spotting. Journal of Artificial Intelligence Research 43; pp.389-418. [PDF, 418KB]

(2012) Xabier Saralegi, Iker Manterola, & Iñaki San Vicente: Building a Basque-Chinese dictionary by using English as pivot.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1443-1447. [PDF, 641KB]

(2012) Rahma Sellami, Fatiha Sadat, & Lamia Hadrich Belguith: Exploiting Wikipedia as a knowledge base for the extraction of linguistic resources: application on Arabic-French comparable corpora and bilingual lexicons. AMTA-2012: Fourth workshop on computational approaches to Arabic script-based languages. Proceedings, San Diego, November 1, 2012; pp.72-79. [PDF, 889KB]

(2012) Sanja Štajner & Ruslan Mitkov: Using comparable corpora to track diachronic and synchronic changes in lexical density and lexical richness.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.88-97. [PDF, 383KB]

(2012) Akihiro Tamura, Taro Watanabe, & Eiichiro Sumita: Bilingual lexicon extraction from comparable corpora using label propagation. EMNLP-CoNLL 2012: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the conference, July 12-14, Jeju Island, Korea; pp.24-36. [PDF, 347KB]

(2012) Gregor Thurmair & Vera Aleksić: Creating term and lexicon entries from phrase tables. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.253-260. [PDF, 423KB]

(2012) Paola Valli: How long is a piece of string? Concordance searches and user behavior investigated. [Aslib 2012] Translating and the Computer 34, 29-30 November 2012, One Birdcage Walk, London, UK; 11pp. [PDF, 359KB], presentation: 20 slides [PDF, 2909KB]

(2012) Ivan Vulić & Marie-Francine Moens: Detecting highly confident word translations from comparable corpora without any prior knowledge. [EACL 2012] Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp. 449-459. [PDF, 339KB]

(2012) Pidong Wang, Preslav Nakov, & Hwee Tou Ng: Source language adaptation for resource-poor machine translation. EMNLP-CoNLL 2012: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the conference, July 12-14, Jeju Island, Korea; pp.286-296. [PDF, 171KB]

(2011) Vamshi Ambati, Sanjika Hewavitharana, Stephan Vogel, & Jaime Carbonell: Active learning with multiple annotations for comparable data classification task. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.69-77. [PDF, 201KB]

(2011) Miquel Esplà-Gomis, Víctor M.Sánchez-Cartagena, & Juan Antonio Pérez-Ortiz: Enlarging monolingual dictionaries for machine translation with active learning and non-expert users. [RANLP 2011] Proceedings of Recent Advances in Natural Language Processing, Hissar, Bulgaria, 12-14 September 2011; pp.339-346. [PDF, 148KB]

(2011) Miquel Esplà-Gomis, Víctor M.Sánchez-Cartagena & Juan Antonio Pérez-Ortiz: Multimodal building of monolingual dictionaries for machine translation by non-expert users. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.147-154. [PDF, 163KB]

(2011) Darja Fišer & Nikola Ljubešić: Bilingual lexicon extraction from comparable corpora for closely related languages.  [RANLP 2011] Proceedings of Recent Advances in Natural Language Processing, Hissar, Bulgaria, 12-14 September 2011; pp.125-131. [PDF, 95KB]

(2011) Darja Fišer, Nikola Ljubešić, Špela Vintar, & Senja Pollak: Building and using comparable corpora for domain-specific bilingual lexicon extraction. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.19-26. [PDF, 287KB]

(2011) Elena Grasso, Piercarlo Rossi, & Andrea Violato: Towards on-line knowledge sharing dictionaries for European law: the Legal Taxonomy Syllabus 3.0. Translating and the Computer 33, 17-18 November 2011, London; 9pp. [PDF, 107KB]

(2011) Amir Hazem, Emmanuel Morin & Sebastian Peña Saldarriaga: Bilingual lexicon extraction from comparable corpora as metasearch. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.35-43. [PDF, 159KB]

(2011) Sanjika Hewavitharana & Stephan Vogel: Extracting parallel phrases from comparable data. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.61-68. [PDF, 217KB]

(2011) Matthias Huck, Saab Mansour, Simon Wiesler, & Hermann Ney: Lexicon models for hierarchical phrase-based machine translation. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.191-198. [PDF, 256KB]

(2011) Johannes Knopp: Extending a multilingual lexical resource by bootstrapping named entity classification using Wikipedia’s category system. [IJCNLP 2011] Proceedings of the 5th Workshop on Cross Lingual Information Access, Chiang Mai, Thailand, November 13, 2011; pp.35-43. [PDF, 326KB]

(2011) Bo Li, Eric Gaussier, & Akiko Aizawa: Clustering comparable corpora for bilingual lexicon extraction. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short papers, Portland, Oregon, June 19-24, 2011; pp.473-478. [PDF, 216KB]

(2011) Emmanuel Morin & Emmanuel Prochasson: Bilingual lexicon extraction from comparable corpora enhanced with parallel corpora. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.27-34. [PDF, 125KB]

(2011) Seiji Okura, Yuji Yamamoto, Hajime Ito, Michael Kato, Miwako Shimazu, & Francis Bond: UTX 1.11, a simple and open user dictionary/terminology standard, and its effectiveness with multiple MT systems. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.587-594. [PDF, 95KB]

(2011) Emmanuel Prochasson & Pascale Fung: Rare word translation extraction from aligned comparable documents. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19-24, 2011; pp.1327-1335. [PDF, 157KB]

(2011) Markus Saers & Dekai Wu: Principled induction of phrasal bilexica. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.313-320. [PDF, 373KB]

(2010)  Nasredine Semmar, Christophe Servan, Gaël de Chalendar, Benoît Le Ny, & Jean-Jacques Bouzaglou: A hybrid word alignment approach to improve translation lexicons with compound words and idiomatic expressions. Translating and the Computer 32, 18-19 November 2010, London; 10pp. [PDF, 95KB]; presentation, 37 slides [PDF of PPT, 934KB]

(2011) Petra Wolf, Ulrike Bernardi, Christian Federmann, & Sabine Hunsicker: From statistical term extraction to hybrid machine translation. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.225-232. [PDF, 329KB]; presentation, 24 slides [PDF]

(2010) Kathryn Baker, Michael Bloodgood, Bonnie J.Dorr, Nathaniel W.Filardo, Lori Levin, & Christine Piatko: A modality lexicon and its use in automatic tagging. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1402-1407. [PDF, 375KB]

(2010) Timothy Baldwin, Jonathan Pool, & Susan M.Colowick: PanLex and LEXTRACT: translating all words of all languages of the world. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Demonstrations volume; pp.37-40. [PDF, 309KB]

(2010) Bruno Cartoni & Marie-Aude Lefer: The MuLeXFoR database: representing word-formation processes in a multilingual lexicographic environment. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.840-843. [PDF, 284KB]

(2010) Diptesh Chatterjee, Sudeshna Sarkar, & Arpit Mishra: Co-occurrence graph based iterative bilingual lexicon extraction from comparable corpora. [Coling 2010] Proceedings of the 4th Workshop on Cross Lingual Information Access, Beijing, China, 28 August 2010; pp.35-42. [PDF, 618KB]

(2010) Josep Maria Crego, Aurélien Max, & François Yvon: Local lexical adaptation in machine translation through triangulation: SMT helping SMT. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.232-240. [PDF, 197KB]

 (2010) Hercules Dalianis, Hao-chun Xing, & Xin Zhang: Creating a reusable English-Chinese parallel corpus for bilingual dictionary construction. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1700-1705. [PDF, 409KB]

(2010) Do Thi Ngoc Diep, Laurent Besacier, & Eric Castelli: Improved Vietnamese-French parallel corpus mining using English language. Proceedings of the 7th International Workshop on Spoken Language Translation, 2-3 December 2010, Paris, France; pp.235-242. [PDF, 1182KB]

(2010) Steffen Eger & Ineta Sejane: Computing semantic similarity from bilingual dictionaries. JADT 2010: 10th International Conference on Statistical Analysis of Textual Data, 9-11 juin 2010, Rome, Italie; pp.1217-1225 [PDF, 918KB]

(2010) Pascale Fung, Emmanuel Prochasson, & Simon Shi: Trillions of comparable documents. [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.26-34. [PDF, 199KB]

(2010) Benoît Gaillard, Malek Boualem, & Olivier Collin: Query translation using Wikipedia-based resources for analysis and disambiguation. EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 8pp. [PDF, 579KB]

 (2010) Filip Graliński: Mining parenthetical translations for Polish-English lexica [abstract]. CICLING 2010: 11th International Conference on Intelligent Text Processing and Computational Linguistics, March 21-27, 2010, Iaşi, Romania; 1p. [PDF, 58KB]

(2010) Matthias Huck, Martin Ratajczak, Patrick Lehnen, & Hermann Ney: A comparison of various types of extended lexicon models for statistical machine translation. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; 8pp. [PDF, 130KB]

(2010) Minwoo Jeong, Kristina Toutanova, Hisami Suzuki, & Chris Quirk: A discriminative lexicon model for complex morphology. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; 10pp. [PDF, 374KB]

(2010) Mitesh M.Khapra, Saurabh Sohoney, Anup Kulkarni, & Pushpak Bhattacharyya: Value for money: balancing annotation effort, lexicon building and accuracy for multilingual WSD. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.555-563. [PDF, 237KB]

(2010) Amit Kirschenbaum & Shuly Wintner: A general method for creating a bilingual transliteration dictionary. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.273-276. [PDF, 361KB]

(2010) Adrien Lardilleux, Julien Gosme, & Yves Lepage: Bilingual lexicon induction: effortless evaluation of word alignment tools and production of resources for improbable language pairs. [LREC 2010]: Proceedings of the Second Workshop on African Language Technology, AFLAT 2010, Valletta, Malta; pp.252-256. [PDF, 330KB]

(2010) Bo Li & Eric Gaussier: Improving corpus comparability for bilingual lexicon extraction from comparable corpora. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.644-652. [PDF, 276KB]

(2010) Wang Ling, Tiago Luís, João Graça, Luísa Coheur & Isabel Trancoso: Towards a general and extensible phrase-extraction algorithm. Proceedings of the 7th International Workshop on Spoken Language Translation, 2-3 December 2010, Paris, France; pp.313-320. [PDF, 382KB]

(2010) Reinhard Rapp & Michael Zock: The noisier the better: identifying multilingual word translations using a single monolingual corpus. [Coling 2010] Proceedings of the 4th Workshop on Cross Lingual Information Access, Beijing, China, 28 August 2010; pp.16-25. [PDF, 179KB]

(2010) Reinhard Rapp & Michael Zock: Utilizing citations of foreign words in corpus-based dictionary generation. [Coling 2010] Proceedings  of the Second Workshop on NLP Challenges in the Information Explosion Era, Beijing, China, 28 August 2010; pp.50-59. [PDF, 188KB]

(2010) Nasredine Semmar & Laib Meriama: Using a hybrid word alignment approach for automatic construction and updating of Arabic to French lexicons. LREC 2010: Workshop on Language Resources and Human Language Technology for Semitic Languages, Valletta, Malta, 17 May 2010; pp. 114-119. [PDF, 309KB]

(2010) Jakob Uszkoreit, Jay M.Ponte, Ashok C.Popat, & Moshe Dubiner: Large scale parallel document mining for machine translation. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.1101-1109. [PDF, 241KB]

(2010) David Vilar, Daniel Stein, Matthias Huck, & Hermann Ney: Jane: open source hierarchical translation, extended with reordering and lexicon models. ACL 2010: Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Proceedings of the workshop, 15-16 July 2010, Uppsala University, Uppsala, Sweden; pp. 262-270. [PDF, 136KB]

Limited domain see Domain restriction and specification

Limited resources see Scarce resources, Rapid development of MT

Low resourced languages see Scarce resources

Monolingual corpora

(2013) Yuki Arase & Ming Zhou: Machine translation detection from monolingual web-text.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.1597-1607. [PDF, 591KB]

(2013) An-Chang Hsieh, Hen-Hsen Huang, & Hsin-Hsi Chen: Uses of monolingual in-domain corpora for cross-domain adaptation with hybrid MT approaches. Proceedings of the Second Workshop on Hybrid Approaches to Translation, Sofia, Bulgaria, August 8, 2013; pp.117-122. [PDF, 275KB]

(2013) Ann Irvine: Statistical machine translation in low resource settings. [NAACL-HLT 2013] Proceedings of the NAACL HLT 2013 Student Research Workshop, 13 June 2013, Atlanta, Georgia; pp.54-61. [PDF, 185KB]

(2013) Ann Irvine & Chris Callison-Burch: Supervised bilingual lexicon induction with multiple monolingual signals. [NAACL-HLT 2013] The 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 9-14 June 2013, Atlanta, Georgia; pp.518-523. [PDF, 511KB]

(2013) Mike Lewis & Mark Steedman: Unsupervised induction of cross-lingual semantic relations. [EMNLP 2013] Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013; pp.681-692. [PDF, 149KB]

(2013) Linda Mitchell, Johann Roturier, & Sharon O’Brien: Community-based post-editing of machine-translated content: monolingual vs. bilingual. Proceedings of MT Summit XIV Workshop on Post-editing Technology and Practice (WPTP-2), Nice, France, 2 September 2013; pp. 35-43. [PDF, 216KB]

(2013) Vassilis Papavassiliou, Prokopis Prokopidis, & Gregor Thurmair: A modular open-source focused crawler for mining monolingual and bilingual corpora from the web.  Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.43-51. [PDF, 663KB]

(2013) Majid Razmara, Maryam Siahbani, Gholamreza Haffari, & Anoop Sarkar: Graph propagation for paraphrasing out-of-vocabulary words in statistical machine translation.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.1105-1115. [PDF, 325KB]

(2013) George Tambouratzis, Sokratis Sofianopoulos, & Marina Vassiliou: Language-independent hybrid MT with PRESEMT. Proceedings of the Second Workshop on Hybrid Approaches to Translation, Sofia, Bulgaria, August 8, 2013; pp.123-130. [PDF, 186KB]

(2013) George Tambouratzis, Marina Vassiliou, & Sokratis Sofianopoulos: A review of the PRESEMT project. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.437. [PDF, 191KB]

(2013) Elke Teich, Stefania Degaetano-Ortlieb, Hannah Kermes, & Ekaterina Lapshinova-Koltunski: Scientific registers and disciplinary diversification: a comparable corpus approach. Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.59-68. [PDF, 388KB]

(2013) Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, & Peter Clark: A lightweight and high performance monolingual word aligner.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.702-707. [PDF, 231KB]

(2013) Jiajun Zhang & Chengqing Zong: Learning a phrase-based translation model from monolingual data with application to domain adaptation.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.1425-1434. [PDF, 476KB]

(2013) Guangyou Zhou, Fang Liu, Yang Liu, Shizhu He, & Jun Zhao: Statistical machine translation improves question retrieval in community question answering via matrix factorization. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.852-861. [PDF, 814KB]

(2012) Houda Bouamor, Aurélien Max, & Anne Vilnat: Validation of sub-sentential paraphrases acquired from parallel monolingual corpora. [EACL 2012] Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp. 716-725. [PDF, 173KB]

(2012) Qing Dou & Kevin Knight: Large scale decipherment for out-of-domain machine translation. EMNLP-CoNLL 2012: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the conference, July 12-14, Jeju Island, Korea; pp.266-275. [PDF, 578KB]

(2012) Jie Jiang, Andy Way, Nelson Ng, Rejwanul Haque, Mike Dillinger, & Jun Lu: Monolingual data optimisation for bootstrapping SMT engines. AMTA-2012: Monolingual machine translation-2012 workshop. Proceedings, San Diego, November 1, 2012; pp.17-26. [PDF, 736KB]

(2012) Adam Kilgarriff & George Tambouratzis: The PRESEMT project. [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.27-28. [PDF, 355KB]

(2012) Alexandre Klementiev, Ann Irvine, Chris Callison-Burch, & David Yarowsky: Toward statistical machine translation without parallel corpora. [EACL 2012] Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp.130-140. [PDF, 344KB]

(2012) Takanori Kusumoto & Tomoyosi Akiba: Statistical machine translation without a source-side parallel corpus using word lattice and phrase extension.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3929-3932. [PDF, 352KB]

(2012) André Lynum, Erwin Marsi, Lars Bungum, & Björn Gambäck: Disambiguating word translations with target language models. TSD 2012: 15th International Conference on Text, Speech and Dialogue, Brno, Czech Republic, September 3-7, 2012; abstract #477, 1p. [HTML]

(2012) Toshiaki Nakazawa & Sadao Kurohashi: Alignment by bilingual generation and monolingual derivation. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.1963-1978. [PDF, 1172KB]

(2012) Dávid Márk Nemeskey & Eszter Simon: Automatically generated NE tagged corpora for English and Hungarian. [ACL 2012] Proceedings of NEWS 2012 Named Entities Workshop, July 12, 2012, Jeju, Republic of Korea; pp.38-46. [PDF, 110KB]

(2012) Malte Nuhn, Arne Mauser, & Hermann Ney: Deciphering foreign language by combining language models and context vectors. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 8-14 July 2012; pp.156-164. [PDF, 198KB]

(2012) V.M.Sánchez-Cartagena, M.Esplà-Gomis, F.Sánchez-Martínez, & J.A.Pérez-Ortiz: Choosing the correct paradigm for unknown words in rule-based machine translation systems. In: Free/Open-Source Rule-Based Machine Translation, ed.Cristina España-Bonet and Aarne Ranta. Proceedings of a Workshop held in Gothenburg, 14-15 June, 2012; pp.27-39. [PDF, 502KB]

(2012) Víctor M.Sánchez-Cartagena, Miquel Esplà-Gomis, & Juan Antonio Pérez-Ortiz: Source-language dictionaries help non-expert users to enlarge target-language dictionaries for machine translation. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3422-3429. [PDF, 416KB]

(2012) Sokratis Sofianopoulos, Marina Vassiliou, & George Tambouratzis: Implementing a language-independent MT methodology. [ACL 2012] Proceedings of the First Workshop on Multilingual Modeling, Jeju, Republic of Korea, 8-14 July 2012; pp.1-10. [PDF, 285KB]

(2012) Aleš Tamchyna, Petra Galuščáková, Amir Kamran, Miloš Stanojević, & Ondřej Bojar: Selecting data for English-to-Czech machine translation. WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.374-381. [PDF, 110KB]

(2012) George Tambouratzis, Michalis Troullinos, Sokratis Sofianopoulos, & Marina Vassiliou: Accurate phrase alignment in a bilingual corpus for EBMT systems.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.104-111. [PDF, 294KB]

(2012) George Tambouratzis, Sokratis Sofianopoulos, & Marina Vassiliou: Evaluating the translation accuracy of a novel language-independent MT methodology. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.2569-2583. [PDF, 340KB]

(2012) Jinsong Su, Hua Wu, Haifeng Wang, Yidong Chen, Xiaodong Shi, Huailin Dong, & Qun Liu: Translation model adaptation for statistical machine translation with monolingual topic information. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 8-14 July 2012; pp.459-468. [PDF, 368KB]

(2012) George Tambouratzis, Marina Vassiliou, & Sokratis Sofianopoulos: PRESEMT: pattern recognition-based statistically enhanced MT. EACL Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra): Proceedings of the workshop, 23-24 April 2012, Avignon, France; pp.65-68. [PDF, 170KB]

(2012) Sander Wubben, Antal van den Bosch, & Emiel Krahmer: Sentence simplification by monolingual machine translation. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 8-14 July 2012; pp.1015-1024. [PDF, 139KB]

(2011) Vamshi Ambati, Stephan Vogel, & Jaime Carbonell: Multi-strategy approaches to active learning for statistical machine translation.  MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.122-129. [PDF, 432KB]

(2011) Ondřej Bojar & Aleš Tamchyna: Forms wanted: training SMT on monolingual data. Machine Translation and Morphologically-rich Languages: Research Workshop of the Israel Science Foundation, University of Haifa, Israel, 24 January, 2011; 2pp. [PDF, 75KB]; presentation: 20 slides [PDF of PPT, 111KB]

(2011) Ondřej Bojar & Aleš Tamchyna: Improving translation model by monolingual data. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.330-336. [PDF, 107KB]

(2011) Han-Bin Chen, Hen-Hsen Huang, Jengwei Tjiu, Ching-Ting Tan, & Hsin-Hsi Chen: Identification and translation of significant patterns for cross-domain SMT applications. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.277-284. [PDF, 137KB]

 (2011) Patrik Lambert, Holger Schwenk, Christophe Servan, & Sadaf Abdul-Rauf: Investigations on translation model adaptation using monolingual data. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.284-293. [PDF, 124KB]

 (2011) Gennadi Lembersky, Noam Ordan, & Shuly Wintner: Language models for machine translation: original vs. translated texts. [EMNLP 2011] Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, July 27-31, 2011; pp.363-374. [PDF, 288KB]

(2011) Gennadi Lembersky, Noam Ordan, & Shuly Wintner: Language models for machine translation: original vs. translated texts. Machine Translation and Morphologically-rich Languages: Research Workshop of the Israel Science Foundation, University of Haifa, Israel, 26 January, 2011; 2pp. [PDF, 112KB]; presentation: 40 slides [PDF of PPT, 382KB]

 (2011) Zhifei Li, Jason Eisner, Ziyuan Wang, Sanjeev Khudanpur, & Brian Roark: Minimum imputed risk: unsupervised discriminative training for machine translation. [EMNLP 2011] Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, July 27-31, 2011; pp.920-929. [PDF, 236KB]

 (2011) Jeff Ma, Spyros Matsoukas, & Richard Schwartz: Improving low-resource statistical machine translation with a novel semantic word clustering algorithm. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.352-359. [PDF, 160KB]

(2011) Erwin Marsi, André Lynum, Lars Bungum, & Björn Gambäck: Word translation disambiguation without parallel texts. [LIHMT] International Workshop on Using Linguistic Information for Hybrid Machine Translation, 18th November 2011, Universitat Politècnica de Catalunya, Barcelona; pp.66-74. [PDF, 898KB]

(2011) Nasredine Semmar & Dhouha Bouamor: A new hybrid machine translation approach using cross-language information retrieval and only target text corpora. [LIHMT] International Workshop on Using Linguistic Information for Hybrid Machine Translation, 18th November 2011, Universitat Politècnica de Catalunya, Barcelona; pp.75-79. [PDF, 898KB]

(2011) Rui Wang & Chris Callison-Burch: Paraphrase fragment extraction from monolingual comparable corpora. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.52-60. [PDF, 303KB]

(2011) Jia Xu & Weiwei Sun: Generating virtual parallel corpus: a compatibility centric method. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.406-413. [PDF, 859KB]

(2010) Wilker Aziz, Marc Dymetman, Shachar Mirkin, Lucia Specia, Nicola Cancedda, & Ido Dagan: Learning an expert from human annotations in statistical machine translation: the case of out-of-vocabulary words. EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 8pp. [PDF, 585KB]; presentation: 21 slides [PDF, 777KB]

(2010) Chen Yuncong & Pascale Fung: Unsupervised synthesis of multilingual Wikipedia articles. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.197-205. [PDF, 1040KB]

(2010) Rashmi Gangadharaiah, Ralf D.Brown, & Jaime Carbonell: Monolingual distributional profiles for word substitution in machine translation. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.320-328. [PDF, 605KB]

(2010) Zhanyi Liu, Haifeng Wang, Hua Wu, & Sheng Li: Improving statistical machine translation with monolingual collocation. ACL 2010: the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11-16, 2010: Conference proceedings; pp.825-833. [PDF, 405KB]

(2010) Yuval Marton: Improved statistical machine translation with hybrid phrasal paraphrases derived from monolingual text and shallow lexical resource. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; 10pp. [PDF, 293KB]

(2010) Smruthi Mukund, Debanjan Ghosh, & Rohini K.Srihari: Using cross-lingual projections to generate semantic role labeled corpus for Urdu – a resource poor language. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.797-805. [PDF, 300KB]

(2010) Reinhard Rapp & Michael Zock: The noisier the better: identifying multilingual word translations using a single monolingual corpus. [Coling 2010] Proceedings of the 4th Workshop on Cross Lingual Information Access, Beijing, China, 28 August 2010; pp.16-25. [PDF, 179KB]

 (2010) Stefan Riezler & Yi Liu: Query rewriting using monolingual statistical machine translation. Computational Linguistics 36 (3), pp. 569-582 [PDF, 145KB]

(2010) Xabier Saralegi & Maddalen Lopez de Lacalle: Dictionary and monolingual corpus-based query translation for Basque-English CLIR. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1353-1358. [PDF, 284KB]

(2010) Hiroyuki Shindo, Akinori Fujino, & Masaaki Nagata: Word alignment with synonym regularization. ACL 2010: the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11-16, 2010: Proceedings of the Conference Short Papers; pp.137-141. [PDF, 492KB]

(2010) Yanli Sun, Sharon O’Brien, Minako O’Hagan, & Fred Hollowood: A novel statistical pre-processing model for rule-based machine translation system.  EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 8pp. [PDF, 790KB]; presentation: 28 slides [PDF, 580KB]

(2010) Yulia Tsvetkov & Shuly Wintner: Extraction of multi-word expressions from small parallel corpora. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.1256-1264. [PDF, 257KB]

(2010) Gae-won You, Seung-won Hwang, Young-In Song, Long Jiang, & Zaiqing Nie: Mining name translations from entity graph mapping.  [EMNLP 2010] Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, MIT, Massachusetts, USA, 9-11 October 2010; pp.430-439. [PDF, 1542KB]

Multilingual corpora

(2015) Martin Benjamin, Amar Mukunda & Jeff Allen: Kamusi pre-D-source-side disambiguation and a sense aligned multilingual lexicon. Proceedings of the 37th Conference Translating and the Computer, London, November 26-27, 2015; pp.27-32. [PDF, 188KB]

(2015) Zied Elloumi, Hervé Blanchon, Gilles Serasset, & Laurent Besacier: METEOR for multiple target languages using DBnary. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: vol.1: MT Researchers’ Track; pp.80-89. [PDF, 569KB]

(2014) Judit Ács: Pivot-based multilingual dictionary building using Wiktionary.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1938-1942. [PDF, 81KB]

(2014) Shyam S.Agrawal, Abhimane, Shweta Bansal, & Minakshi Mahakshi: Statistical analysis of multilingual text corpus and development of language models. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.2436-2440. [PDF, 815KB]

(2014) Hernani Costa, Gloria Corpas Pastor, & Miriam Seghiri: iCompileCorpora: a web-based application to semi-automatically compile multilingual comparable corpora. Translating and the Computer 36: proceedings. Asling: International Society for Advancement in Language Technology, 27-28 November 2014; pp.51-55. [PDF, 119KB]

(2014) Maud Ehrmann, Francesco Cecconi, Daniele Vannella, John McCrae, Philipp Cimiano, & Roberto Navigli: Representing multilingual data as linked data: the case of BabelNet 2.0.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.401-408. [PDF, 674KB]

(2014) Tatiana Erekhinskaya, Meghana Satpute, & Dan Moldovan: Multilingual eXtended Word Net Knowledge Base semantic parsing and translation of glosses.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.2990-2994. [PDF, 82KB]

(2014) Miquel Esplà-Gomis, Filip Klubička, Nikola Ljubešić, Sergio Ortiz-Rojas, Vassilis Papavassiliou, & Prokopis Prokopidis: Comparing two acquisition systems for automatically building an English-Croatian parallel corpus from multilingual websites. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1252-1258. [PDF, 158KB]

(2014) Najeh Hajlaoui, David Kolovratnik, Jaakko Väyrynen, Ralf Steinberger, & Daniel Varga: DCEP – digital corpus of the European Parliament. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3164-3171. [PDF, 252KB]

(2014) Valérie Hanoka & Benoît Sagot: YaMTG: an open-source heavily multilingual translation graph extracted from wiktionaries and parallel corpora. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3179-3186. [PDF, 245KB]

(2014) Lars Hellan, Dorothee Beermann, Tore Bruland, Mary Esther Kropp Dakubu, & Montserrat Marimon: MultiVal – towards a multilingual valence lexicon. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.2478-2485. [PDF, 247KB]

(2014) Guillaume Jacquet, Maud Ehrmann, & Ralf Steinberger: Clustering of multi-word named entity variants: multilingual evaluation. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.2696-2700. [PDF, 217KB]

(2014) David Kamholz, Jonathan Pool, & Susan M.Colowick: PanLex: building a resource for panlingual lexical translation.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.2696-2700. [PDF, 145KB]

(2014) Thomas Mayer & Michael Cysouw: Creating a massively parallel Bible corpus. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3158-3163. [PDF, 575KB]

(2014) Anita Rácz, István Nagy T., & Veronika Vincze: 4FX: light verb constructions in a multilingual parallel corpus. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.710-715. [PDF, 140KB]

(2013) Judit Ács, Katalin Pajkossy, & András Kornai: Building basic vocabulary across 40 languages. Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.52-58. [PDF, 381KB]

(2013) Yegin Genc, Elizabeth A.Lennon, Winter Mason, & Jeffrey V.Nickerson: Building ontologies from collaborative knowledge bases to search and interpret multilingual corpora.  Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.87-94. [PDF, 1278KB]

(2013) Michael Matuschek, Christian M.Meyer, & Iryna Gurevych: Multilingual knowledge in aligned Wiktionary and OmegaWiki for translation applications. Translation: Computation, Corpora,  Cognition 3 (1), June 2013; pp.87-118. [PDF, 2898KB]

(2013) Motaz Saad, David Langlois, & Kamel Smaili: Comparing multilingual comparable articles based on opinions. Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.105-111. [PDF, 512KB]

(2012) Eleftherios Avramidis, Marta R.Costa-jussà, Christian Federmann, Maite Melero, Pavel Pecina, & Josef van Genabith: A richly annotated, multilingual parallel corpus for hybrid machine translation. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2189-2193. [PDF, 433KB]

(2012) Cristina Bosco, Manuela Sanguinetti, & Leonardo Lesmo: The Parallel-TUT: a multilingual and multiformat treebank.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3982-3987. [PDF, 460KB]

(2012) Bruno Cartoni & Thomas Meyer: Extracting directional and comparable corpora from a multilingual corpus for translation studies.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2132-2137. [PDF, 352KB]

(2012) Yu Chen & Andreas Eisele: MultiUN v2: UN documents with multilingual alignments.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2500-2504. [PDF, 405KB]

(2012) Susanne Jekat: Multilingual information management for special purposes [abstract].  In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; p.15. [PDF]

(2012) Alan K.Melby: Terminology in the age of multilingual corpora. Journal of Specialised Translation 18 (July 2012); pp.7-29. [PDF, 329KB]

(2012) Gerard de Melo & Gerhard Weikum: UWN: a large multilingual lexical knowledge base. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 10 July 2012, System Demonstrations; pp.151-156. [PDF, 772KB]

(2012) Hervé Saint-Amand, Jason Smith, & Magdalena Plamada: Parallel corpus extraction from CommonCrawl. Machine Translation Marathon 2012 September 3-8, Edinburgh, UK; 10 slides [PDF of PPT, 70KB]

(2012) Oscar Täckström: Nudging the envelope of direct transfer methods for multilingual named entity recognition. NAACL-HLT Workshop on the Induction of Linguistic Structure, Montréal, Canada, June 3-8, 2012; pp.55-63. [PDF, 188KB]

(2011) Steven Abney & Steven Bird: Towards a data model for the universal corpus. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.120-127. [PDF, 118KB]

(2011) Shane Bergsma, David Yarowsky, & Kenneth Church: Using large monolingual and bilingual corpora to improve coordination disambiguation. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19-24, 2011; pp.1346-1355. [PDF, 157KB]

(2011) Michael Elhadad, Meni Adler, Yoav Goldberg, & Rafi Cohen: Topic models for morphologically rich languages and their usage to explore multilingual corpora [abstract]. Machine Translation and Morphologically-rich Languages: Research Workshop of the Israel Science Foundation, University of Haifa, Israel, 23 January, 2011; presentation: 64 slides [PDF of PPT, 655KB]

(2011) Kriste Krstovski & David A.Smith: A minimally supervised approach for detecting and ranking document translation pairs.  [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.207-216. [PDF, 389KB]

(2011) Bin Lu, Ka Po Chow, & Benjamin K.Tsou: The cultivation of a Chinese-English-Japanese trilingual parallel corpus from comparable patents. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.472-479. [PDF, 207KB]

(2011) Matteo Negri, Luisa Bentivogli, Yashar Mehdad, Danilo Giampiccolo, & Alessandro Marchetti: Divide and conquer: crowdsourcing the creation of cross-lingual textual entailment corpora.  [EMNLP 2011] Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, Scotland, UK, July 27-31, 2011; pp.670-679. [PDF, 493KB]

(2011) Violeta Seretan & Eric Wehrli: FipsCoView: on-line visualisation of collocations extracted from multilingual parallel corpora. Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World (MWE 2011), Portland, Oregon, USA, 23 June 2011; pp.125-127. [PDF, 120KB]

(2011) Johanka Spoustová & Miroslav Spousta: Comparable fora. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.96-101. [PDF, 80KB]

(2011) Wolfgang Täger: The sentence-aligned European patent corpus. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.177-184. [PDF, 234KB]

(2011) ACCURAT: analysis and evaluation of comparable corpora for under resourced areas of machine translation. (European Machine Translation Projects.) [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; p.323. [PDF, 287KB]

(2010) Thomas Eckart & Uwe Quasthoff: Statistical corpus and language comparison using comparable corpora. [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.15-20. [PDF, 330KB]

(2010) Andreas Eisele & Yu Chen: MultiUN: a multilingual corpus from United Nation documents. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.2868-2872. [PDF, 333KB]

(2010) Tomaž Erjavec: MULTEXT-East version 4: multilingual morphosyntactic specifications, lexicons and corpora. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.2544-2547. [PDF, 324KB]

(2010) Pablo Gamallo Otero & Isaac González López: Wikipedia as multilingual source of comparable corpora. [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.21-25. [PDF, 180KB]

(2010) Tomoharu Iwata, Daichi Mochihashi, & Hiroshi Sawada: Learning common grammar from multilingual corpus. ACL 2010: the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11-16, 2010: Proceedings of the Conference Short Papers; pp.184-188. [PDF, 966KB]

(2010) Adam Kilgarriff: Comparable corpora within and across languages, word frequency lists and the KELLY project. [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.1-5. [PDF, 113KB]

(2010) Petr Knoth, Trevor Collins, Elsa Sklavounou, & Zdenek Zdrahal: Facilitating cross-language retrieval and machine translation by multilingual domain ontologies. [LREC 2010] Workshop on Supporting eLearning with Language Resources  and Semantic Data, Valletta, Malta, 22 May 2010; 42 slides. [PDF of PPT, 440KB]

(2010) Adrien Lardilleux, Julien Gosme, & Yves Lepage: Bilingual lexicon induction: effortless evaluation of word alignment tools and production of resources for improbable language pairs. [LREC 2010]: Proceedings of the Second Workshop on African Language Technology, AFLAT 2010, Valletta, Malta; pp.252-256. [PDF, 330KB]

(2010) Els Lefever & Véronique Hoste: Construction of a benchmark data set for cross-lingual word sense disambiguation. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1584-1590. [PDF, 465KB]

(2010) Simon Mille & Leo Wanner: Syntactic dependencies for multilingual and multilevel corpus annotation. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1889-1896. [PDF, 395KB]

(2010) Gudrun Rawoens: Multilingual corpora in cross-linguistic research: focus on the compilation of a Dutch-Swedish parallel corpus. JADT 2010: 10th International Conference on Statistical Analysis of Textual Data, 9-11 June 2010, Rome, Italy; pp.1287-1294. [PDF, 417KB]

(2010) Fei Xia, Carrie Lewis, & William D.Lewis: The problems of language identification within hugely multilingual data sets.  LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.2790-2797. [PDF, 284KB]

(2010) Martin Volk, Noah Bubenhofer, Adrian Althaus, Maya Bangerter, Lenz Furrer, & Beni Ruef: Challenges in building a multilingual Alpine heritage corpus. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1653-1659. [PDF, 1296KB]

Ontologies

(2014) Anabela Barreiro, Fernando Batista, Ricardo Ribeiro, Helena Moniz, & Isabel Trancoso: OpenLogos semantico-syntactic knowledge-rich bilingual dictionaries. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3774-3781. [PDF, 163KB]

(2014) Maud Ehrmann, Francesco Cecconi, Daniele Vannella, John McCrae, Philipp Cimiano, & Roberto Navigli: Representing multilingual data as linked data: the case of BabelNet 2.0.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.401-408. [PDF, 674KB]

(2014) Jorge Gracia, Elena Montiel-Ponsoda, Daniel Vila-Suero, & Guadalupe Aguado-de-Cea: Enabling language resources to expose translations as linked data on the Web. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.409-413. [PDF, 346KB]

(2013) Mihael Arcan & Paul Buitelaar: MONNET: multilingual ontologies for networked knowledge. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.431. [PDF, 241KB]

(2013) Mihael Arcan & Paul Buitelaar: Ontology label translation. [NAACL-HLT 2013] Proceedings of the NAACL HLT 2013 Student Research Workshop, 13 June 2013, Atlanta, Georgia; pp.40-46. [PDF, 127KB]

(2013) Maria Pia di Buono, Johanna Monti, Mario Monteleone, & Federica Marano: Multi-word processing in an ontology-based cross-language information retrieval model for specific domain collections. [MT Summit XIV] Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technology, Nice, September 3, 2013; pp.43-52. [PDF, 673KB]

(2013) Yegin Genc, Elizabeth A.Lennon, Winter Mason, & Jeffrey V.Nickerson: Building ontologies from collaborative knowledge bases to search and interpret multilingual corpora.  Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.87-94. [PDF, 1278KB]

(2013) Meritxell Gonzàlez, Maria Mateva, Ramona Enache, Cristina España, Lluís Màrquez, Borislav Popov, & Aarne Ranta: MT techniques in a retrieval system of semantically enriched patents. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; pp.273-280. [PDF, 908KB]

(2013) Clara López Rodríguez, Juan Antonio Prieto Velasco, & Maribel Tercedor Sánchez: Multimodal representation of specialised knowledge in ontology-based terminological databases: the case of EcoLexicon. Journal of Specialised Translation 20, July 2013; pp.49-67. [PDF, 336KB]

(2013) XLIKE: cross-lingual knowledge extraction (XLike). Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.451. [PDF, 191KB]

(2012) Mihael Arcan, Christian Federmann, & Paul Buitelaar: Experiments with term translation. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.67-82. [PDF, 178KB]

(2012) Mihael Arcan, Paul Buitelaar, & Christian Federmann: Using domain-specific and collaborative resources for term translation. SSST-6, Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation, Jeju, Republic of Korea, 12 July 2012; pp.86-94. [PDF, 122KB]

(2012) Kartik Asooja, Jorge Gracia, Nitish Aggarwal, & Asunción Gómez Pérez: Using cross-lingual explicit semantic analysis for improving ontology translation. COLING 2012: Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT), Mumbai, December 2012; pp.25-35. [PDF, 143KB]

(2012) Gerhard Budin: Terminological ontologies in multi-lingual cross-domain communities of practice [abstract]. In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; p.19. [PDF]

(2012) Isabel Durán Muñoz, Gloria Corpas Pastor & Le An Ha: ProTermino: a comprehensive web-based terminological management tool based on knowledge representation. [Aslib 2012] Translating and the Computer 34, 29-30 November 2012, One Birdcage Walk, London, UK; 4pp. [PDF, 97KB]

(2012) Roger Granada, Lucelene Lopes, Carlos Ramisch, Cassia Trojahn, Renata Vieira, & Aline Villavicencio: A comparable corpus based on aligned multilingual ontologies. [ACL 2012] Proceedings of the First Workshop on Multilingual Modeling, Jeju, Republic of Korea, 8-14 July 2012; pp.25-31. [PDF, 108KB]

(2012) Oliver Kutz, Christoph Lange, Till Mossakowski, C.Maria Keet, Fabian Neuhaus, & Michael Gruninger: The Babel of the Semantic Web tongues – in search of the Rosetta stone of interoperability. Semantic Web Conference 2012; 6pp. [PDF, 813KB]

(2012) Cristina Vertan: Two approaches for integrating translation and retrieval in real applications. EACL Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra): Proceedings of the workshop, 23-24 April 2012, Avignon, France; pp. 59-64. [PDF, 226KB]

(2012) Manuela Yapomo, Gloria Corpas, & Ruslan Mitkov: CLIR- and ontology-based approach for bilingual extraction of comparable documents.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.121-125. [PDF, 380KB]

(2011) Fumiko Kano Glückstad: Application of classical psychology theory to terminological ontology alignment. Proceedings of the 8th international NLPSC workshop. Special theme: Human-machine interaction in translation, Copenhagen Business School, 20-21 August 2011; ed.Bernadette Sharp, Michael Zock, Michael Carl, Arnt Lykke Jakobsen (Copenhagen Studies in Language 41), Frederiksberg: Samfundslitteratur, 2011; pp.227-238. [PDF, 1187KB]

(2011) John McCrae, Maurizio Espinoza, Elena Montiel-Ponsoda, Guadalupe Aguado-de-Cea, & Philipp Cimiano: Combining statistical and semantic approaches to the translation of ontologies and taxonomies. Proceedings of SSST-5, Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation, ACL HLT 2011, Portland, Oregon, USA, June 2011; pp.116-125. [PDF, 576KB]

(2011) Paul McNamee, James Mayfield, Dawn Lawrie, Douglas W.Oard, & David Doermann: Cross-language entity linking. [IJCNLP 2011] Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011; pp.255-263. [PDF, 409KB]

(2011) Junichi Tsujii: Resource-rich research on natural language processing and understanding. Keynote at: IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker

(2011) MONNET: multilingual ontologies for networked knowledge. (European Machine Translation Projects.) [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; p.343. [PDF, 36KB]

(2010) Gosse Bouma: Cross-lingual ontology alignment using EuroWordNet and Wikipedia. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1023-1028. [PDF, 336KB]

(2010) Helena de Medeiros Caseli, Bruno Akio Sugiyama, & Junia Coutinho Anacleto: Using common sense to generate culturally contextualized machine translation. Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas, Los Angeles, CA, June 2010; pp.24-31. [PDF, 353KB]

(2010) Petr Knoth, Trevor Collins, Elsa Sklavounou, & Zdenek Zdrahal: Facilitating cross-language retrieval and machine translation by multilingual domain ontologies. [LREC 2010] Workshop on Supporting eLearning with Language Resources  and Semantic Data, Valletta, Malta, 22 May 2010; 42 slides. [PDF of PPT, 440KB]

Open source

(2015) Christophe Servan, Ngoc Tien Le, Ngoc Quang Luong, Benjamin Lecouteux, & Laurent Besacier: An open-source toolkit for word-level confidence estimation in machine translation. [IWSLT 2015] Proceedings of the International Workshop on Spoken Language Translation, December 3-4, 2015, Da Nang, Vietnam; pp.196-203. [PDF, 3.2MB]

(2014) Krasimir Angelov: Bootstrapping open-source English-Bulgarian computational dictionary. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1018-1023. [PDF, 158KB]

(2014) Grégoire Détrez, Víctor M.Sánchez-Cartagena, & Aarne Ranta: Sharing resources between free/open-source rule-based machine translation systems: Grammatical Framework and Apertium. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.4394-4400. [PDF, 146KB]

(2014) Maud Ehrmann, Francesco Cecconi, Daniele Vannella, John McCrae, Philipp Cimiano, & Roberto Navigli: Representing multilingual data as linked data: the case of BabelNet 2.0.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.401-408. [PDF, 674KB]

(2014) Marcello Federico, Nicola Bertoldi, Marco Trombetti, & Alessandro Cattelan: MateCat: an open source CAT tool for MT post-editing. AMTA 2014: proceedings of the eleventh conference of the Association for Machine Translation in the Americas, Vancouver, BC, October 22-26; Tutorials, 98 slides

(2014) Spence Green, Daniel Cer, & Christopher D.Manning: Phrasal: a toolkit for new directions in statistical machine translation. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.114-121. [PDF, 410KB]

(2014) David Kamholz, Jonathan Pool, & Susan M.Colowick: PanLex: building a resource for panlingual lexical translation.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.2696-2700. [PDF, 145KB]

(2014) Juan Luo & Yves Lepage: Production of phrase tables in 11 European languages using an improved sub-sentential aligner. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.664-669. [PDF, 206KB]

(2014) Stelios Piperidis, Harris Papageorgiou, Christian Spurk, Georg Rehm, Khalid Choukri, Olivier Hamon, Nicoletta Calzolari, Riccardo del Gratta, Bernardo Magnini, & Christian Girardi: META-SHARE: one year after. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1208-1211. [PDF, 861KB]

(2014) Alex Rudnick, Taylor Skidmore, Alberto Samaniego, & Michael Gasser: Guampa: a toolkit for collaborative translation. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1659-1663. [PDF, 284KB]

(2014) Lane Schwartz: An open source desktop post-editing tool. AMTA 2014: proceedings of the eleventh conference of the Association for Machine Translation in the Americas, Vancouver, BC, October 22-26; Workshop on Post-editing Technology and Practice (WPTP-3); p. 122. [PDF, 103KB]

(2014) Antonio Toral, Raphael Rubino, Miquel Esplà-Gomis, Tommi Pirinen, Andy Way, & Gema Ramírez-Sánchez: Extrinsic evaluation of web-crawlers in machine translation: a study on Croatian-English for the tourism domain. Proceedings of the 17th annual conference of the European Association for Machine Translation, EAMT 2014, Dubrovnik, Croatia, 16th-18th June 2014, edited by Marko Tadić, Philipp Koehn, Johann Roturier, Andy Way; pp. 221-224. [PDF, 345KB]

(2014) Jonathan North Washington, Ilnar Salimzyanov, & Francis M.Tyers: Finite-state morphological transducers for three Kypchak languages. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3378-3385. [PDF, 1727KB]

(2013) Christian Hardmeier, Sara Stymne, Jörg Tiedemann, & Joakim Nivre: Docent: a document-level decoder for phrase-based statistical machine translation.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, System demonstrations, Sofia, Bulgaria, August 4-9 2013; pp.193-198. [PDF, 147KB]

(2013) Vassilis Papavassiliou, Prokopis Prokopidis, & Gregor Thurmair: A modular open-source focused crawler for mining monolingual and bilingual corpora from the web.  Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.43-51. [PDF, 663KB]

(2013) Matt Post, Juri Ganitkevitch, Luke Orland, Jonathan Weese, Yuan Cao & Chris Callison-Burch: Joshua 5.0: sparser, better, faster, server. WMT 2013: 8th Workshop on Statistical Machine Translation, Proceedings of the Workshop, August 8-9, 2013, Sofia, Bulgaria; pp.206-212. [PDF, 205KB]

(2013) Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H.Clark, & Philipp Koehn: Scalable modified Kneser-Ney language model estimation.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.690-696. [PDF, 212KB]

(2013) Graham Neubig: Travatar: a forest-to-string machine translation engine based on tree transducers. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, System demonstrations, Sofia, Bulgaria, August 4-9 2013; pp.91-96. [PDF, 344KB]

(2013) Ilnar Salimzyanov, Jonathan North Washington, & Francis Morton Tyers: A free/open-source Kazakh-Tatar machine translation system. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; pp.175-182. [PDF, 327KB]

(2013) Lucia Specia, Kashif Shah, Jose G.C.de Souza, & Trevor Cohn: QuEst – a translation quality estimation framework. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, System demonstrations, Sofia, Bulgaria, August 4-9 2013; pp.79-84. [PDF, 163KB]

(2013) MosesCore: Moses open source evaluation and support co-ordination for outreach and exploitation. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.433. [PDF, 537KB]

(2012) Wilker Aziz & Lucia Specia: PET: a standalone tool for assessing machine translation through post-editing. [Aslib 2012] Translating and the Computer 34, 29-30 November 2012, One Birdcage Walk, London, UK; 5pp. [PDF, 446KB]; presentation by Lucia Specia: 54 slides [PDF, 694KB]

(2012) Jan Berka, Ondřej Bojar, Mark Fishel, Maja Popović, & Daniel Zeman: Automatic MT error analysis: Hjerson helping Addicter.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2158-2163. [PDF, 580KB]

(2012) Christian Federmann: Appraise: an open-source toolkit for manual evaluation of MT output. Prague Bulletin of Mathematical Linguistics 98, October 2012; pp.25-35. [PDF, 487KB]

(2012) Christian Federmann: Appraise. Machine Translation Marathon 2012 September 3-8, Edinburgh, UK; 30 slides [PDF of PPT, 1216KB]

(2012) Christian Federmann, Ioanna Giannopoulou, Christian Girardi, Olivier Hamon, Dimitris Mavroeidis, Salvatore Minutoli, & Marc Schröder: META-SHARE v2: an open network of repositories for language resources including data and tools.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3300-3303. [PDF, 588KB]

(2012) Juri Ganitkevitch, Yuan Cao, Jonathan Weese, Matt Post, & Chris Callison-Burch: Joshua 4.0: packing, PRO, and paraphrases. WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.283-291. [PDF, 195KB]

(2012) Matthias Huck, Jan-Thorsten Peter, Markus Freitag, Stephan Peitz, & Hermann Ney: Hierarchical phrase-based translation with Jane 2. Prague Bulletin of Mathematical Linguistics 98, October 2012; pp.37-50. [PDF, 168KB]; presentation, 23 slides [PDF of PPT, 242KB]

(2012) Philipp Koehn & Hieu Hoang: Open source statistical machine translation: Moses, machine translation with open source software. [Tutorial at] AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; presentation, 139 slides. [PDF of PPT, 2475KB]

(2012) Hai-Son Le, Thomas Lavergne, Alexandre Allauzen, Marianna Apidianaki, Li Gong, Aurélien Max, Artem Sokolov, Guillaume Wisniewski, & François Yvon: LIMSI @ WMT’12.  WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.330-337. [PDF, 189KB]

(2012) Aingeru Mayor, Mans Hulden, & Gorka Labaka: Developing an open-source FST grammar for verb chain transfer in a Spanish-Basque MT system. Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing, Donostia-San Sebastián, July 23-25, 2012; pp.65-69. [PDF, 157KB]

(2012) Maja Popović: rgbF: an open source tool for n-gram based automatic evaluation of machine translation output. Prague Bulletin of Mathematical Linguistics 98, October 2012; pp.99-108. [PDF, 126KB]

(2012) V.M.Sánchez-Cartagena, F.Sánchez-Martínez, & J.A.Pérez-Ortiz: An open-source toolkit for integrating shallow-transfer rules into phrase-based statistical machine translation. In: Free/Open-Source Rule-Based Machine Translation, ed. Cristina España-Bonet and Aarne Ranta. Proceedings of a Workshop held in Gothenburg, 14-15 June, 2012; pp.41-54. [PDF, 620KB]

(2012) Max Silberztein, Tamás Váradi, & Marko Tadić: Open source multi-platform NooJ for NLP. Proceedings of COLING 2012: Demonstration Papers, Mumbai, December 2012; pp. 401-408. [PDF, 557KB]

(2012) Andrejs Vasiļjevs, Markus Forsberg, Tatiana Gornostay, Dorte H.Hansen, Kristin M.Jóhannsdóttir, Krister Lindén, Gunn I.Lyse, Lene Offersgaard, Ville Oksanen, Sussi Olsen, Bolette S.Pedersen, Eiríkur Rögnvaldsson, Roberts Rozis, Inguna Skadiņa, & Koenraad De Smet: Creation of an open shared language resource repository in the Nordic and Baltic countries.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1076-1083. [PDF, 1336KB]

(2012) Xianchao Wu, Takuya Matsuzaki, & Jun’ichi Tsujii: Akamon: an open source toolkit for tree/forest-based statistical machine translation. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 10 July 2012, System Demonstrations; pp.127-132. [PDF, 501KB]

(2012) Joern Wuebker, Matthias Huck, Stephan Peitz, Malte Nuhn, Markus Freitag, Jan-Thorsten Peter, Saab Mansour, & Hermann Ney: Jane 2: open source phrase-based and hierarchical statistical machine translation. Proceedings of COLING 2012: Demonstration Papers, Mumbai, December 2012; pp. 483-491. [PDF, 160KB]

(2012) Tong Xiao, Jingbo Zhu, Hao Zhang, & Qiang Li: NiuTrans: an open source toolkit for phrase-based and syntax-based machine translation. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 10 July 2012, System Demonstrations; pp.19-24. [PDF, 258KB]

(2012) MosesCore: Moses open source evaluation and support co-ordination for outreach and exploitation.  [Project paper at] EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; p.201. [PDF, 76KB]

(2011) Steven Abney & Steven Bird: Towards a data model for the universal corpus. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.120-127. [PDF, 118KB]

(2011) Martha Dís Brandt, Hrafn Loftsson, Hlynur Sigurþórsson, & Francis M.Tyers: Apertium-IceNLP: a rule-based Icelandic to English machine translation system. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.217-224. [PDF, 332KB]; presentation, 29 slides [PDF]

(2011) Josep M.Crego, François Yvon, & José B.Mariño: Ncode: an open source bilingual n-gram SMT toolkit. Sixth Machine Translation Marathon, 5-10 September 2011, Trento; Prague Bulletin of Mathematical Linguistics, no.96, October 2011; pp.49-58. [PDF, 190KB]; presentation, 59 slides [PDF of PPT, 878KB]

(2011) Philipp Koehn: Moses statistical machine translation system.  META-FORUM 2011: Solutions for multilingual Europe, June 27/28 2011, Hotel Marriott, Budapest, Hungary; 12 slides [PDF of PPT, 492KB]

(2011) Aaron B.Phillips & Ralf D.Brown: Training machine translation with a second-order Taylor approximation of weighted translation instances. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.40-47. [PDF, 198KB]

(2011) Philippe Lacour, Any Freitas, Aurélien Bénel, Franck Eyraud & Diana Zambon: Translation and the new digital commons. Tralogy, Paris, 3-4 March 2011; 16 pp. [PDF, 313KB]

(2011) Stelios Piperidis: META-SHARE: an open resource exchange infrastructure for stimulating research and innovation.  META-FORUM 2011: Solutions for multilingual Europe, June 27/28 2011, Hotel Marriott, Budapest, Hungary; 37 slides [PDF of PPT, 5302KB]

(2011) Maja Popović: Hjerson: an open source tool for automatic error classification of machine translation output. Sixth Machine Translation Marathon, 5-10 September 2011, Trento; Prague Bulletin of Mathematical Linguistics, no.96, October 2011; pp.59-67. [PDF, 121KB]; presentation, 20 slides [PDF of PPT, 226KB]

(2011) Felipe Sánchez-Martínez and Juan Antonio Pérez-Ortiz (eds.): Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation, 20-21 January 2011, Barcelona, Spain

(2011) Daniel Stein, David Vilar, Stephan Peitz, Markus Freitag, Matthias Huck, & Hermann Ney: A guide to Jane, an open source hierarchical translation toolkit. Prague Bulletin of Mathematical Linguistics, no.95, April 2011; pp.5-18. [PDF, 192KB]

(2011) Andrejs Vasiļjevs, Raivis Skadiņš, & Jörg Tiedemann: LetsMT!: cloud-based platform for building user tailored machine translation engines. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.507-511. [PDF, 211KB]

(2011) Jonathan Weese, Juri Ganitkevitch, Chris Callison-Burch, Matt Post, & Adam Lopez: Joshua 3.0: syntax-based machine translation with the Thrax grammar extractor. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.478-484. [PDF, 164KB]

(2011) Omar F.Zaidan: MAISE: a flexible, configurable, extensible open source package for mass AI system evaluation. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.130-134. [PDF, 108KB]

(2011) [LIHMT 2011] Introductions, and About the OpenMT-2 project; 1p. [PDF, 124KB]

(2010) proceedings of Fourth Machine Translation Marathon “Open Source Tools for Machine Translation”, 25-30 January, Dublin, Ireland; Prague Bulletin of Mathematical Linguistics, no.93, January 2010.

(2010) Loïc Barrault: MANY: open source MT system combination at WMT’10. ACL 2010: Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Proceedings of the workshop, 15-16 July 2010, Uppsala University, Uppsala, Sweden; pp. 271-275. [PDF, 101KB]

(2010) Nicola Bertoldi: IRSTLM toolkit. Presentation at Fifth Machine Translation Marathon, 13-18 September 2010, University of Le Mans; 34 slides. [PDF, 381KB]

(2010) Anton Bryl & Josef van Genabith: f-align: an open-source alignment tool for LFG f-structures. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; 8pp. [PDF, 229KB]

(2010) Dana Dannélls & John J.Camilleri: Verb morphology of Hebrew and Maltese – towards an open source type theoretical resource grammar in GF. LREC 2010: Workshop on Language Resources and Human Language Technology for Semitic Languages, Valletta, Malta, 17 May 2010; pp. 57-61. [PDF, 326KB]

(2010) Jo Drugan & Bogdan Babych: Shared resources, shared values? Ethical implications of sharing translation resources. JEC 2010: Second joint EM+/CNGL Workshop “Bringing MT to the user: research on integrating MT in the translation industry”, AMTA 2010, Denver, Colorado, November 4, 2010; pp.3-9. [PDF, 9,433KB]

(2010) Chris Dyer, Adam Lopez, Juri Ganitkevitch, Jonathan Weese, Ferhan Ture, Phil Blunsom, Hendra Setiawan, Vladimir Eidelman, & Philip Resnik: cdec: a decoder, alignment, and learning framework for finite-state and context-free translation models. Proceedings of the ACL 2010 System Demonstrations, Uppsala, Sweden, 13 July 2010; pp.7-12. [PDF, 171KB]

(2010) Christian Federmann: Appraise: an open-source toolkit for manual phrase-based evaluation of translations. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1731-1734. [PDF, 368KB]

(2010) Christian Federmann & Andreas Eisele: MT Server Land: an open-source MT architecture. Fifth Machine Translation Marathon, 13-18 September, University of Le Mans; Prague Bulletin of Mathematical Linguistics, no.94, September 2010; pp.57-66. [PDF, 226KB]; presentation [PDF, 509KB]

(2010) Mikel L.Forcada, Boyan Ivanov Bonev, Sergio Ortiz Rojas, Juan Antonio Pérez Ortiz, Gema Ramírez Sánchez, Felipe Sánchez Martínez, Carme Armentano-Oller, Marco A.Montava, & Francis M.Tyers: Documentation of the open-source shallow-transfer machine translation platform Apertium; ed. Mireia Ginestí Rosell. Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, March 10, 2010; 214pp. [PDF, 700KB]

(2010) Mikel L.Forcada: Free/open-source machine translation: the Apertium platform. Translingual Europe 2010, Hotel Maritim, Berlin, Germany, Monday June 7th 2010; 17pp. [PDF, 105KB]

(2010) Philipp Koehn & Hieu Hoang: Moses: machine translation with open source software. Tutorial at AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, November 4, 2010; 29 slides [PDF of PPT, 520KB]

(2010) Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Ann Irvine, Sanjeev Khudanpur, Lane Schwartz, Wren N.G.Thornton, Ziyuan Wang, Jonathan Weese, & Omar F.Zaidan: Joshua 2.0: a toolkit for parsing-based machine translation with syntax, semirings, discriminative training and other goodies. ACL 2010: Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Proceedings of the workshop, 15-16 July 2010, Uppsala University, Uppsala, Sweden; pp. 133-137. [PDF, 101KB]

(2010) Aaron B.Phillips: The Cunei machine translation platform for WMT’10. ACL 2010: Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Proceedings of the workshop, 15-16 July 2010, Uppsala University, Uppsala, Sweden; pp. 149-154. [PDF, 121KB]

(2010) Stelios Piperidis: META-SHARE: the open resource exchange facility. META-FORUM 2010: Challenges for multilingual Europe, November 17/18 2010, Brussels, Belgium; 32 slides [PDF of PPT, 6471KB]

(2010) Ting Qian, Kristy Hollingshead, Su-youn Yoon, Kyoung-young Kim, & Richard Sproat: A Python toolkit for universal transliteration. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.2897-2901. [PDF, 576KB]

(2010) Aarne Ranta, Krasimir Angelov, & Thomas Hallgren: Tools for multilingual grammar-based translation on the web. Proceedings of the ACL 2010 System Demonstrations, Uppsala, Sweden, 13 July 2010; pp.66-71. [PDF, 313KB]

(2010) Manny Rayner, Pierrette Bouillon, Nikos Tsourakis, Johanna Gerlach, Maria Georgescul, Yukie Nakao, & Claudia Baur: A multilingual CALL game based on speech translation. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1531-1538. [PDF, 573KB]

(2010) Antoine Rey: GlobalSight MT integration. Translingual Europe 2010, Hotel Maritim, Berlin, Germany, Monday June 7th 2010; 5pp. [PDF, 1048KB]

(2010) Achim Ruopp: The Moses for Localization open source project. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; 4pp. [PDF, 111KB]

(2010) Lane Schwartz: Reproducible results in parsing-based machine translation: the JHU shared task submission.  ACL 2010: Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Proceedings of the workshop, 15-16 July 2010, Uppsala University, Uppsala, Sweden; pp. 177-182. [PDF, 101KB]

(2010) Daniel Stein, David Vilar, Stephan Peitz, & Hermann Ney: Jane: a guide to RWTH’s hierarchical machine translation toolkit. Presentation at Fifth Machine Translation Marathon, 13-18 September, University of Le Mans; 29 slides [PDF, 175KB]

(2010) Josef van Genabith: EuroMatrixPlus – evaluation, localisation, open source. Translingual Europe 2010, Hotel Maritim, Berlin, Germany, Monday June 7th 2010; 24pp. [PDF, 2818KB]

(2010) David Vilar, Daniel Stein, Matthias Huck, & Hermann Ney: Jane: open source hierarchical translation, extended with reordering and lexicon models. ACL 2010: Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR. Proceedings of the workshop, 15-16 July 2010, Uppsala University, Uppsala, Sweden; pp. 262-270. [PDF, 136KB]

(2010) E.Yuste, M.Herranz, A.-L.Lagarda, L.Tarazón, I.Sánchez-Cortina, & F.Casacuberta: PangeaMT – putting open standards to work…well. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; 8pp. [PDF, 480KB]

Parallel text corpora see Bilingual corpora, Multilingual corpora

Pruning see Cleaning and filtering

Scarce resources [see also Language resources, Rapid development of MT]

(2014) Burak Aydın & Arzucan Özgür: Expanding machine translation training data with an out-of-domain corpus using language modeling based vocabulary saturation. AMTA 2014: proceedings of the eleventh conference of the Association for Machine Translation in the Americas, Vancouver, BC, October 22-26; pp.180-192. [PDF, 523KB]

(2014) Peter Baumann & Janet Pierrehumbert: Using resource-rich languages to improve morphological analysis of under-resourced languages. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.3355-3359. [PDF, 143KB]

(2014) Daniel Beck, Kashif Shah, & Lucia Specia: SHEF-Lite 2.0: sparse multi-task Gaussian processes for translation quality estimation. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.307-312. [PDF, 306KB]

(2014) Alex Rudnick, Taylor Skidmore, Alberto Samaniego, & Michael Gasser: Guampa: a toolkit for collaborative translation. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1659-1663. [PDF, 284KB]

(2014) Raphael Rubino, Antonio Toral, Nikola Ljubešić, & Gema Ramírez-Sánchez: Quality estimation for synthetic parallel data generation. LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1843-1849. [PDF, 248KB]

(2014) Raivis Skadiņš, Mārcis Pinnis, Andrejs Vasiļjevs, Inguna Skadiņa, & Tomas Hudik: Application of machine translation in localization into low-resourced languages.  Proceedings of the 17th annual conference of the European Association for Machine Translation, EAMT 2014, Dubrovnik, Croatia, 16th-18th June 2014, edited by Marko Tadić, Philipp Koehn, Johann Roturier, Andy Way; pp.209-216. [PDF, 871KB]

(2013) Haithem Afli, Loïc Barrault & Holger Schwenk: Multimodal comparable corpora as resources for extracting parallel data: parallel phrases extraction. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.286-292. [PDF, 1586KB]

(2013) Qing Dou & Kevin Knight: Dependency-based decipherment for resource-limited machine translation. [EMNLP 2013] Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013; pp.1668-1676. [PDF, 153KB]

(2013) Mirela-Ştefania Duma & Cristina Vertan: Integration of machine translation in on-line multilingual applications – domain adaptation. Translation: Computation, Corpora,  Cognition 3 (1), June 2013; pp.67-74. [PDF, 395KB]

(2013) Ahmed El Kholy, Nizar Habash, Gregor Leusch, Evgeny Matusov, & Hassan Sawaf: Language independent connectivity strength features for phrasal pivot statistical machine translation.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.412-418. [PDF, 200KB]; revised version.

(2013) Ankur Gandhe & Rashmi Gangadharaiah: Hypothesis refinement using agreement constraints in machine translation. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.429-437. [PDF, 170KB]

(2013) Ann Irvine & Chris Callison-Burch: Combining bilingual and comparable corpora for low resource machine translation. WMT 2013: 8th Workshop on Statistical Machine Translation, Proceedings of the Workshop, August 8-9, 2013, Sofia, Bulgaria; pp.262-270. [PDF, 225KB]

(2013) Ann Irvine: Statistical machine translation in low resource settings. [NAACL-HLT 2013] Proceedings of the NAACL HLT 2013 Student Research Workshop, 13 June 2013, Atlanta, Georgia; pp.54-61. [PDF, 185KB]

(2013) Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, & Urtza Iturraspe: The BerbaTek project for Basque: promoting a less-resourced language via language technology for translation, content management and learning. Translation: Computation, Corpora,  Cognition 3 (1), June 2013; pp.119-135. [PDF, 785KB]

(2013) Khang Nhut Lam & Jugal Kalita: Creating reverse bilingual dictionaries. [NAACL-HLT 2013] The 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 9-14 June 2013, Atlanta, Georgia; pp.524-528. [PDF, 138KB]

(2013) Lian Tze Lim, Lay-Ki Soon, Tek Yong Lim, Enya Kong Tang, & Bali Ranaivo-Malançon: Context-dependent multilingual lexical lookup for under-resourced languages.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.294-299. [PDF, 373KB]

(2013) Oscar Täckström, Ryan McDonald, & Joakim Nivre: Target language adaptation of discriminative transfer parsers. [NAACL-HLT 2013] The 2013 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 9-14 June 2013, Atlanta, Georgia; pp.1061-1071. [PDF, 258KB]

(2013) Oscar Täckström, Dipanjan Das, Slav Petrov, Ryan McDonald, & Joakim Nivre: Token and type constraints for cross-lingual part-of-speech tagging. Transactions of the Association for Computational Linguistics 1 (2013); pp.1-12 [PDF, 3217KB]

(2013) Jörg Tiedemann & Preslav Nakov: Analyzing the use of character-level translation with sparse and noisy datasets. Proceedings of Recent Advances in Natural  Language Processing, Hissar, Bulgaria, 7-13 September 2013; pp.676-684. [PDF, 133KB]

(2013) Mo Yu, Tiejun Zhao, Yalong Bai, Hao Tian, & Dianhai Yu: Cross-lingual projections between languages from different families. ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Short papers, Sofia, Bulgaria, August 4-9 2013; pp.312-317. [PDF, 187KB]

(2012) Khan Md.Anwarus Salam, Setsuo Yamada, & Tetsuro Nishino: Sublexical translations for low-resource language. COLING 2012: Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL-2012), Mumbai, December 2012; pp.39-51. [PDF, 514KB]

(2012) Damir Ćavar: Bootstrapping NLP and MT resources for under-resourced languages. In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; p.31. [PDF]

(2012) Sherri Condon, Luis Hernandez, Dan Parvaz, Mohammad S.Khan, & Hazrat Jahed: Producing data for under-resourced languages: a Dari-English parallel corpus of multi-genre text. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10p. [PDF, 602KB]; abstract, 1p. [PDF, 12KB]

(2012) Doren Singh, Thoudam: Addressing some issues of data sparsity towards improving English-Manipuri SMT using morphological information. AMTA-2012: Monolingual machine translation-2012 workshop. Proceedings, San Diego, November 1, 2012; pp.46-54. [PDF, 454KB]

(2012) Greg Durrett, Adam Pauls, & Dan Klein: Syntactic transfer using a bilingual lexicon. EMNLP-CoNLL 2012: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the conference, July 12-14, Jeju Island, Korea; pp.1-11. [PDF, 286KB]

(2012) Georgi Iliev & Angel Genov: Expanding parallel resources for medium-density languages for free.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3937-3943. [PDF, 321KB]

(2012) Anoop Kunchukuttan, Shourya Roy, Pratik Patel, Kushal Ladha, Somya Gupta, Mitesh Khapra, & Pushpak Bhattacharyya: Experiences in resource generation for machine translation through crowdsourcing.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.384-391. [PDF, 467KB]

(2012) Takanori Kusumoto & Tomoyosi Akiba: Statistical machine translation without a source-side parallel corpus using word lattice and phrase extension.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3929-3932. [PDF, 352KB]

(2012) Septina Dian Larasati: Improving word alignment by exploiting adapted word similarity. AMTA-2012: Monolingual machine translation-2012 workshop. Proceedings, San Diego, November 1, 2012; pp.41-45. [PDF, 227KB]

(2012) William D.Lewis & Phong Yang: Building MT for a severely under-resourced language: White Hmong. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 10pp. [PDF, 124KB]

(2012) Preslav Nakov & Hwee Tou Ng: Improving statistical machine translation for a resource-poor language using related resource-rich languages. Journal of Artificial Intelligence Research 44 (2012); pp.179-222. [PDF, 421KB]

(2012) Lene Offersgaard & Dorte Haltrup Hansen: SMT systems for less-resourced languages based on domain-specific data.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.75-80. [PDF, 380KB]

(2012) Matt Post, Chris Callison-Burch, & Miles Osborne: Constructing parallel corpora for six Indian languages via crowdsourcing. WMT 2012: 7th Workshop on Statistical Machine Translation. Proceedings of the workshop, June 7-8, 2012, Montréal, Canada; pp.401-409. [PDF, 388KB]

(2012) Xabier Saralegi, Iker Manterola, & Iñaki San Vicente: Building a Basque-Chinese dictionary by using English as pivot.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1443-1447. [PDF, 641KB]

(2012) Inguna Skadiņa: Analysis and evaluation of comparable corpora for under-resourced areas of machine translation.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.17-19. [PDF, 370KB]

(2012) Sokratis Sofianopoulos, Marina Vassiliou, & George Tambouratzis: Implementing a language-independent MT methodology. [ACL 2012] Proceedings of the First Workshop on Multilingual Modeling, Jeju, Republic of Korea, 8-14 July 2012; pp.1-10. [PDF, 285KB]

(2012) Fangzhong Su & Bogdan Babych: Measuring comparability of documents in non-parallel corpora for efficient extraction of (semi-)parallel translation equivalents. EACL Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra): Proceedings of the workshop, 23-24 April 2012, Avignon, France; pp.10-19. [PDF, 188KB]

(2012) Feifei Zhai, Jiajun Zhang, Yu Zhou, & Chengqing Zong: Tree-based translation without using parse trees. Proceedings of COLING 2012: Technical Papers, Mumbai, December 2012; pp.3037-3054. [PDF, 1800KB]

(2012) Jörg Tiedemann: Character-based pivot translation for under-resourced languages and domains. [EACL 2012] Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp.141-151. [PDF, 254KB]

(2012) Antonio Toral: Pivot-based machine translation between statistical and black box systems. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.321-328. [PDF, 187KB]

(2012) Andrius Utka: Multilingual resources and their application for the Lithuanian language [abstract]. In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; p.23. [PDF]

(2012) Pidong Wang, Preslav Nakov, & Hwee Tou Ng: Source language adaptation for resource-poor machine translation. EMNLP-CoNLL 2012: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the conference, July 12-14, Jeju Island, Korea; pp.286-296. [PDF, 171KB]

(2012) Ping Xu & Pascale Fung: Cross-lingual language modelling with syntactic reordering for low-resource speech recognition. EMNLP-CoNLL 2012: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the conference, July 12-14, Jeju Island, Korea; pp.766-776. [PDF, 275KB]

(2012) Xiaoning Zhu, Yiming Cui, Conghui Zhu, Tiejun Zhao, & Hailong Cao: The HIT-LTRC machine translation system for IWSLT 2012. IWSLT-2012: 9th International Workshop on Spoken Language Translation, Hong Kong, December 6th-7th, 2012; pp.77-80. [PDF, 576KB]; presentation, 14 slides [PDF of PPT, 1018KB]

(2012) ACCURAT: Analysis and evaluation of comparable corpora for under resourced areas of machine translation. [Project paper at] EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; p.205. [PDF, 72KB]

(2012) [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey

(2012) proceedings of SALTMIL 2012, “Language technology for normalisation of less-resourced languages”, Istanbul, Turkey, May 22 2012. [PDF, 14500KB]

(2011) Vamshi Ambati, Sanjika Hewavitharana, Stephan Vogel, & Jaime Carbonell: Active learning with multiple annotations for comparable data classification task. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.69-77. [PDF, 201KB]

(2011) Vamshi Ambati, Stephan Vogel, & Jaime Carbonell: Multi-strategy approaches to active learning for statistical machine translation.  MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.122-129. [PDF, 432KB]

(2011) Sankaranarayanan Ananthakrishnan, Shiv Vitaladevuni, Rohit Prasad, & Prem Natarajan: Source error-projection for sample selection in phrase-based SMT for resource-poor languages. [IJCNLP 2011] Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011; pp.819-827. [PDF, 425KB]

(2011) Khan Md. Anwarus Salam, Setsuo Yamada, & Tetsuro Nishino: Example-based machine translation for low-resource language using chunk-string templates. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.488-495. [PDF, 531KB]

(2011) Ondřej Bojar & Aleš Tamchyna: Improving translation model by monolingual data. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.330-336. [PDF, 107KB]

(2011) Alexandru Ceauşu & Dan Tufiş: Addressing SMT data sparseness when translating into morphologically-rich languages. Proceedings of the 8th international NLPSC workshop. Special theme: Human-machine interaction in translation, Copenhagen Business School, 20-21 August 2011; ed.Bernadette Sharp, Michael Zock, Michael Carl, Arnt Lykke Jakobsen (Copenhagen Studies in Language 41), Frederiksberg: Samfundslitteratur, 2011; pp.57-68. [PDF, 1886KB]

(2011) Marta R.Costa-jussà, Carlos Henríquez, & Rafael E.Banchs: Enhancing scarce-resource language translation through pivot combinations. [IJCNLP 2011] Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011; pp.1361-1365. [PDF, 153KB]

(2011) Sandipan Dandapat, Sara Morrissey, Andy Way, & Mikel L.Forcada: Using example-based MT to support statistical MT when translating homogeneous data in a resource-poor setting. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.201-208. [PDF, 413KB]; presentation, 24 slides [PDF]

(2011) Vladimir Eidelman, Kristy Hollingshead, & Philip Resnik: Noisy SMS machine translation in low-density languages. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.344-350. [PDF, 133KB]

(2011) Monica Gavrila: Constrained recombination in an example-based machine translation system. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.193-200. [PDF, 417KB]; presentation, 36 slides [PDF]

(2011) Monica Gavrila & Natalia Elita: Experiments with small-sized corpora in CBMT. [RANLP 2011] Proceedings of the Student Research Workshop associated with RANLP 2011, Hissar, Bulgaria, 13 September 2011; pp.67-72. [PDF, 113KB]

(2011) Sanjika Hewavitharana, Nguyen Bach, Qin Gao, Vamshi Ambati, & Stephan Vogel: CMU Haitian Creole-English translation system for WMT 2011. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.386-392. [PDF, 122KB]

(2011) Deirdre Hogan, Jennifer Foster, & Josef van Genabith: Decreasing lexical data sparsity in statistical syntactic parsing – experiments with named entities. Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World (MWE 2011), Portland, Oregon, USA, 23 June 2011; pp.14-19. [PDF, 97KB]

(2011) Suhel Jaber, Sara Tonelli, & Rodolfo Delmonte: Venetan to English machine translation: issues and possible solutions. Proceedings of the 8th international NLPSC workshop. Special theme: Human-machine interaction in translation, Copenhagen Business School, 20-21 August 2011; ed.Bernadette Sharp, Michael Zock, Michael Carl, Arnt Lykke Jakobsen (Copenhagen Studies in Language 41), Frederiksberg: Samfundslitteratur, 2011; pp.69-80. [PDF, 873KB]

(2011) Mitesh M.Khapra, Salil Joshi, Arindam Chatterjee, & Pushpak Bhattacharyya: Together we can: bilingual bootstrapping for WSD. ACL-HLT 2011: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19-24, 2011; pp.561-569. [PDF, 137KB]

(2011) Kiran Kumar N, Santosh GSK, & Vasudeva Varma: A language-independent approach to identify the named entities in under-resourced languages, and clustering multilingual documents. CLEF 2011: Conference on Multilingual and Multimodal Information Access Evaluation, 19-22 September 2011, Amsterdam; 29 slides [PDF of PPT, 185KB]

(2011) William D.Lewis, Robert Munro, & Stephan Vogel: Crisis MT: developing a cookbook for MT in crisis situations. [WMT 2011] Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30-31, 2011; pp.501-511. [PDF, 115KB]

(2011) Jeff Ma, Spyros Matsoukas, & Richard Schwartz: Improving low-resource statistical machine translation with a novel semantic word clustering algorithm. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.352-359. [PDF, 160KB]

(2011) Victor M.Sánchez-Cartagena, Felipe Sánchez-Martínez, & Juan Antonio Pérez-Ortiz: Enriching a statistical machine translation system trained on small parallel corpora with rule-based bilingual phrases. [RANLP 2011] Proceedings of Recent Advances in Natural Language Processing, Hissar, Bulgaria, 12-14 September 2011; pp.90-96. [PDF, 125KB]

(2011) Reshef Shilon, Nizar Habash, Alon Lavie, & Shuly Wintner: Machine translation between Hebrew and Arabic: needs, challenges and preliminary solutions. Machine Translation and Morphologically- rich Languages: Research Workshop of the Israel Science Foundation, University of Haifa, Israel, 23 January, 2011; 1p. [PDF, 54KB]

(2011) Raivis Skadiņš, Maris Puriņš, Inguna Skadiņa, & Andrejs Vasiļjevs: Evaluation of SMT in localization to under-resourced inflected language. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.35-40. [PDF, 287KB]; presentation, 17 slides [PDF, 796KB]

 (2011) Zhiyang Wang, Yajuan Lü, & Qun Liu: Multi-granularity word alignment and decoding for agglutinative language translation. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.360-367. [PDF, 169KB]

(2011) Jia Xu & Weiwei Sun: Generating virtual parallel corpus: a compatibility centric method. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.406-413. [PDF, 859KB]

(2011) ACCURAT: analysis and evaluation of comparable corpora for under resourced areas of machine translation. (European Machine Translation Projects.) [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; p.323. [PDF, 287KB]

(2011) LetsMT! Platform for online sharing of training data and building user tailored MT. (European Machine Translation Projects.) [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; p.337. [PDF, 66KB]

(2010) Proceedings of 7th SaLTMiL Workshop on Creation and use of basic lexical resources for less-resourced languages, LREC-2010: proceedings of the seventh international conference on Language Resources and Evaluation, Valletta, Malta, 23 May 2010. [PDF, 916KB]

(2010) Sisay Adugna & Andreas Eisele: English-Oromo machine translation: an experiment using a statistical approach. LREC 2010: proceedings of the seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.2196-2199. [PDF, 368KB]

(2010) Vamshi Ambati, Stephen Vogel, & Jaime Carbonell: Active learning and crowd-sourcing for machine translation. LREC 2010: proceedings of the seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.2169-2174. [PDF, 436KB]

 (2010) Sankaranarayanan Ananthakrishnan, Rohit Prasad, David Stallard, & Prem Natarajan: Discriminative sample selection for statistical machine translation.  [EMNLP 2010] Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, MIT, Massachusetts, USA, 9-11 October 2010; pp.626-635. [PDF, 329KB]

(2010) Alberto Barrón-Cedeño, Paolo Rosso, Eneko Agirre, & Gorka Labaka: Plagiarism detection across distant language pairs. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.37-45. [PDF, 274KB]

(2010) Ondřej Bojar, Pavel Straňák, & Daniel Zeman: Data issues in English-to-Hindi machine translation. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.1771-1777. [PDF, 557KB]

(2010) Ondřej Bojar, Kamil Kos, & David Mareček: Tackling sparse data issue in machine translation evaluation. ACL 2010: the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11-16, 2010: Proceedings of the Conference Short Papers; pp.86-91. [PDF, 132KB]

(2010) Bill Dolan: Building partnerships with language communities: the importance of shared technology and shared data. META-FORUM 2010: Challenges for multilingual Europe, November 17/18 2010, Brussels, Belgium; 38 slides [PDF of PPT, 1245KB]

(2010) Jinhua Du, Jie Jiang, & Andy Way: Facilitating translation using source language paraphrase lattices.  [EMNLP 2010] Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, MIT, Massachusetts, USA, 9-11 October 2010; pp.420-429. [PDF, 766KB]

(2010) Andreas Eisele & Jia Xu: Improving machine translation performance using comparable corpora.  [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.35-41. [PDF, 104KB]

(2010) Jan Hajic: Building bridges using innovative approaches in machine translation. META-FORUM 2010: Challenges for multilingual Europe, November 17/18 2010, Brussels, Belgium; 24 slides [PDF of PPT, 1109KB]

(2010) Md.Zahurul Islam, Jörg Tiedemann & Andreas Eisele: English to Bangla phrase-based machine translation.  EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 8pp. [PDF, 601KB]

(2010) Sittichai Jiampojamarn, Kenneth Dwyer, Shane Bergsma, Aditya Bhargava, Qing Dou, Mi-Young Kim, & Grzegorz Kondrak: Transliteration generation and mining with limited training resources. NEWS 2010: Proceedings of the 2010 Named Entities Workshop, ACL 2010, Uppsala, Sweden, 16 July 2010; pp.39-47. [PDF, 143KB]

(2010) Jae-Hee Lee, Seung-Wook Lee, Gumwon Hong, Young-Sook Hwang, Sang-Bum Kim, & Hae-Chang Rim: A post-processing approach to statistical word alignment reflecting alignment tendency between part-of-speeches. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.623-629. [PDF, 438KB]

(2010) William Lewis: Haitian Creole: developing MT for a low data language. Translingual Europe 2010, Hotel Maritim, Berlin, Germany, Monday June 7th 2010; 24pp. [PDF, 3425KB]

(2010) William D.Lewis: Haitian Creole: how to build and ship an MT engine from scratch in 4 days, 17 hours, & 30 minutes. EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 8pp. [PDF, 518KB]; presentation: 24 slides [PDF, 630KB]

(2010) Jan Niehues & Alex Waibel: Domain adaptation in statistical machine translation using factored translation models. EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 7pp. [PDF, 581KB]; presentation: 27 slides [PDF, 887KB]

(2010) Reinhard Rapp & Michael Zock: The noisier the better: identifying multilingual word translations using a single monolingual corpus. [Coling 2010] Proceedings of the 4th Workshop on Cross Lingual Information Access, Beijing, China, 28 August 2010; pp.16-25. [PDF, 179KB]

(2010) Reinhard Rapp & Michael Zock: Utilizing citations of foreign words in corpus-based dictionary generation. [Coling 2010] Proceedings  of the Second Workshop on NLP Challenges in the Information Explosion Era, Beijing, China, 28 August 2010; pp.50-59. [PDF, 188KB]

 (2010) Tanja Schultz & Alan W.Black: Multilingual speech processing – rapid language adaptation tools and technologies. Interspeech 2010, Makuhari, Japan, 26-30 September 2010; 2pp. [PDF, 66KB]

 (2010) Libin Shen, Bing Zhang, Spyros Matsoukas, Jinxi Xu, & Ralph Weischedel: Statistical machine translation with a factorized grammar.  [EMNLP 2010] Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, MIT, Massachusetts, USA, 9-11 October 2010; pp.616-625. [PDF, 147KB]

(2010) Reshef Shilon, Nizar Habash, Alon Lavie, & Shuly Wintner: Machine translation between Hebrew and Arabic: needs, challenges and preliminary solutions. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; 10pp. [PDF, 141KB]

(2010) Inguna Skadiņa, Andrejs Vasiļjevs, Raivis Skadiņš, Robert Gaizauskas, Dan Tufiş, & Tatiana Gornostay: Analysis and evaluation of comparable corpora for under resourced areas of machine translation. [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.6-14. [PDF, 318KB]

(2010) Daniel Stein, Christoph Schmidt, & Hermann Ney: Sign language machine translation overkill. Proceedings of the 7th International Workshop on Spoken Language Translation, 2-3 December 2010, Paris, France; pp.337-344. [PDF, 2052KB]

(2010) Yulia Tsvetkov & Shuly Wintner: Automatic acquisition of parallel corpora from websites with dynamic content.  LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.3389-3392. [PDF, 447KB]

(2010) Francis M.Tyers: Rule-based Breton to French machine translation. EAMT 2010: Proceedings of the 14th Annual conference of the European Association for Machine Translation, 27-28 May 2010, Saint-Raphaël, France. Proceedings ed.Viggo Hansen and François Yvon; 8pp. [PDF, 553KB]

 (2010) Andrejs Vasiljevs: LetsMT! – towards cloud based service for MT generation. Translingual Europe 2010, Hotel Maritim, Berlin, Germany, Monday June 7th 2010; 15pp. [PDF, 368KB]

(2010) Karthik Visweswariah, Vijil Chenthamarakshan, & Nandakishore Kambhatla: Urdu and Hindi: translation and sharing of linguistic resources. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.1283-1291. [PDF, 217KB]

(2010) Bing Xiang, Yonggang Deng, & Bowen Zhou: Diversify and combine: improving word alignment for machine translation on low-resource languages. ACL 2010: the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11-16, 2010: Proceedings of the Conference Short Papers; pp.22-26. [PDF, 86KB]

Software resources

(2013) Valeria Aliperta: Streamlining your workflow: useful desktop software and mobile applications for the interpreting and translation industry [abstract]. [Aslib 2013] Translating and the Computer 35, 28-29 November 2013, etc.venues, Paddington, London, UK; 1p.

(2011) Liang Tian, Fai Wong, & Sam Chao: Word alignment using GIZA++ on Windows. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.369-372. [PDF, 192KB]

Sparse data see Scarce resources

Spoken language resources

(2014) Eunah Cho, Sarah Fünfer, Sebastian Stüker, & Alex Waibel: A corpus of spontaneous speech in lectures: the KIT lecture corpus for spoken language processing and translation.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1554-1559. [PDF, 146KB]

(2013) Pierrette Bouillon, Johanna Gerlach, Ulrich Germann, Barry Haddow & Manny Rayner: Two approaches to correcting homophone confusion in a hybrid machine translation system. Proceedings of the Second Workshop on Hybrid Approaches to Translation, Sofia, Bulgaria, August 8, 2013; pp.109-116. [PDF, 298KB]

(2013) Matt Post, Gaurav Kumar, Adam Lopez, Damianos Karakos, Chris Callison-Burch, & Sanjeev Khudanpur: Improved speech-to-text translation with the Fisher and Callhome Spanish-English speech translation corpus. [IWSLT 2013] Proceedings of the 10th International Workshop on Spoken Language Translation, Heidelberg, Germany, Dec.5-6, 2013; 7pp. [PDF, 192KB]

(2013) Hiroaki Shimizu, Graham Neubig, Sakriani Sakti, Tomoki Toda, & Satoshi Nakamura: Constructing a speech translation system using simultaneous interpretation data. [IWSLT 2013] Proceedings of the 10th International Workshop on Spoken Language Translation, Heidelberg, Germany, Dec.5-6, 2013; 7pp. [PDF, 1041KB]

(2012) Sebastian Stüker, Florian Kraft, Christian Mohr, Teresa Herrmann, Eunah Cho, & Alex Waibel: The KIT lecture corpus for speech translation.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3409-3414. [PDF, 1105KB]

(2010) Tanja Schultz & Alan W.Black: Multilingual speech processing – rapid language adaptation tools and technologies. Interspeech 2010, Makuhari, Japan, 26-30 September 2010; 2pp. [PDF, 66KB]

Translation memory see Index of aids and tools

Treebanks (see also Semantic analysis and representation, Thesaurus method)

(2014) Ann Bies, Justin Mott, Seth Kulick, Jennifer Garland, & Colin Warner: Incorporating alternate translations into English Translation Treebank.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.1863-1868. [PDF, 339KB]

(2013) Jiří Mírovský, Kateřina Rysová, Magdaléna Rysová, & Eva Hajičová: (Pre-)annotation of topic-focus articulation in Prague Czech-English dependency treebank. International Joint Conference on Natural Language Processing, Nagoya, Japan, 14-18 October 2013; pp.55-63. [PDF, 990KB]

(2012) Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel, & Aleš Tamchyna: The joy of parallelism with CzEng 1.0.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3921-3928. [PDF, 460KB]

(2012) Cristina Bosco, Manuela Sanguinetti, & Leonardo Lesmo: The Parallel-TUT: a multilingual and multiformat treebank.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3982-3987. [PDF, 460KB]

(2012) Masood Ghayoomi: From grammar rule extraction to treebanking: a bootstrapping approach. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1912-1919. [PDF, 1203KB]

(2012) Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, & Zdeněk Žabokrtský: Announcing Prague Czech-English dependency treebank 2.0.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.3153-3160. [PDF, 544KB]

(2012) Gideon Kotzé, Vincent Vandeghinste, Scott Martens, & Jörg Tiedemann: Large aligned treebanks for syntax-based machine translation.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.467-473. [PDF, 498KB]

(2012) Xuansong Li, Stephanie Strassel, Stephen Grimes, Safa Ismael, Mohamed Maamouri, Ann Bies, & Nianwen Xue: Parallel aligned treebanks at LDC: new challenges interfacing existing infrastructures.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1848-1855. [PDF, 740KB]

(2012) Annette Rios & Anne Göhring: A tree is a Baum is an árbol is a sach’a: creating a trilingual treebank.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1874-1879. [PDF, 360KB]

(2012) Andrejs Vasiļjevs, Markus Forsberg, Tatiana Gornostay, Dorte H.Hansen, Kristin M.Jóhannsdóttir, Krister Lindén, Gunn I.Lyse, Lene Offersgaard, Ville Oksanen, Sussi Olsen, Bolette S.Pedersen, Eiríkur Rögnvaldsson, Roberts Rozis, Inguna Skadiņa, & Koenraad De Smet: Creation of an open shared language resource repository in the Nordic and Baltic countries.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1076-1083. [PDF, 1336KB]

(2012) Kateřina Veselovská, Nguy Giang Linh, & Michal Novák: Using Czech-English parallel corpora in automatic identification of it.  [BUCC 2012] The 5th Workshop on Building and Using Comparable Corpora: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”,  LREC 2012 Workshop, 26 May 2012, Istanbul, Turkey; pp.112-120. [PDF, 306KB]

(2011) Manuela Sanguinetti & Cristina Bosco: Building the multilingual TUT parallel treebank. AEPC 2011: proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora, associated with the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), 15th September 2011, Hissar, Bulgaria; pp.19-28. [PDF, 316KB]

(2011) Kiril Simov, Petya Osenova, Laska Laskova, Aleksandar Savkov, & Stanislava Kancheva: Bulgarian-English parallel treebank: word and semantic level alignment. AEPC 2011: proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora, associated with the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), 15th September 2011, Hissar, Bulgaria; pp.29-38. [PDF, 396KB]

(2011) Martin Volk, Torsten Marek, & Yvonne Samuelsson: Building and querying parallel treebanks. Translation: Computation, Corpora,  Cognition 1 (1), December 2011; pp.7-28. [PDF, 824KB]

(2010) Tagyoung Chung & Daniel Gildea: Effects of empty categories on machine translation.  [EMNLP 2010] Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, MIT, Massachusetts, USA, 9-11 October 2010; pp.636-645. [PDF, 198KB]

(2010) Stephen Grimes, Xuansong Li, Ann Bies, Seth Kulick, Xiaoyi Ma, & Stephanie Strassel: Creating Arabic-English parallel word-aligned treebank corpora at LDC. LREC 2010: Workshop on Language Resources and Human Language Technology for Semitic Languages, Valletta, Malta, 17 May 2010; pp.102-107. [PDF, 747KB]

(2010) Jun Sun, Min Zhang, & Chew Lim Tan: Exploring syntactic structural features for sub-tree alignment using bilingual tree kernels. ACL 2010: the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11-16, 2010: Conference proceedings; pp.306-315. [PDF, 286KB]

Wikis

(2012) CoSyne, a project on multilingual content synchronization with wikis. [Project paper at] EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; p.206. [PDF, 87KB]

(2011) Federico Gaspari, Antonio Toral & Sudip Kumar Naskar: User-focused task-oriented MT evaluation for wikis: a case study. Proceedings of the Third Joint EM+/CNGL Workshop “Bringing MT to the User: Research Meets Translators” (JEC ’11), Luxembourg, 14 October 2011; pp.13-22. [PDF, 961KB]

(2011) CoSyne, a project on multi-lingual content synchronization with wikis. (European Machine Translation Projects.) [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; p.329. [PDF, 46KB]

Wikis

(2015) Takashi Tsunakawa & Hiroyuki Kaji: Towards cross-lingual patent wikification. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: Sixth Workshop on Patent and Scientific Literature Translation (PSLT6); pp.89-95. [PDF, 1165KB]

Wiktionary

(2015) Zied Elloumi, Hervé Blanchon, Gilles Serasset, & Laurent Besacier: METEOR for multiple target languages using DBnary. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: vol.1: MT Researchers’ Track; pp.80-89. [PDF, 569KB]

(2015) Gerard de Melo: Wiktionary-based word embeddings. MT Summit XV, October 30 – November 3, 2015, Miami, Florida, USA. Proceedings of MT Summit XV: vol.1: MT Researchers’ Track; pp.346-359. [PDF, 709KB] 

Wordnets (see also WordNet in index of systems)

(2014) Tatiana Erekhinskaya, Meghana Satpute, & Dan Moldovan: Multilingual eXtended Word Net Knowledge Base semantic parsing and translation of glosses.  LREC 2014: Ninth International Conference on Language Resources and Evaluation, May 26-31, 2014 Harpa Concert Hall and Conference Center, Reykjavik, Iceland; ed. Nicoletta Calzolari et al.; pp.2990-2994. [PDF, 82KB]

(2014) Able-to-Include: Improving accessibility for people with intellectual disabilities. Proceedings of the 17th annual conference of the European Association for Machine Translation, EAMT 2014, Dubrovnik, Croatia, 16th-18th June 2014, edited by Marko Tadić, Philipp Koehn, Johann Roturier, Andy Way; p.134. [PDF]

(2013) Dhouha Bouamor, Nasredine Semmar, & Pierre Zweigenbaum: Using WordNet and semantic similarity for bilingual terminology mining from comparable corpora. Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.16-23. [PDF, 628KB]

(2012) Darja Fišer: Language resources and tools for semantically enhanced processing of Slovene [abstract]. In: Crosslingual Language Technology in service of an integrated multilingual Europe, 4-5 May 2012, University of Hamburg, Germany; p.22. [PDF]

(2012) Salil Joshi, Arindam Chatterjee, Arun Karthikeyan Karra, & Pushpak Bhattacharyya: Eating your own cooking: automatically linking wordnet synsets of two languages. Proceedings of COLING 2012: Demonstration Papers, Mumbai, December 2012; pp. 239-246. [PDF, 1527KB]

(2012) Gerard de Melo & Gerhard Weikum: UWN: a large multilingual lexical knowledge base. [ACL 2012] Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea, 10 July 2012, System Demonstrations; pp.151-156. [PDF, 772KB]

(2012) Jyrki Niemi & Krister Lindén: Representing the translation relation in a bilingual wordnet.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.2439-2446. [PDF, 365KB]

(2012) Andrejs Vasiļjevs, Markus Forsberg, Tatiana Gornostay, Dorte H.Hansen, Kristin M.Jóhannsdóttir, Krister Lindén, Gunn I.Lyse, Lene Offersgaard, Ville Oksanen, Sussi Olsen, Bolette S.Pedersen, Eiríkur Rögnvaldsson, Roberts Rozis, Inguna Skadiņa, & Koenraad De Smet: Creation of an open shared language resource repository in the Nordic and Baltic countries.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1076-1083. [PDF, 1336KB]

(2012) Špela Vintar, Darja Fišer, & Aljoša Vrščaj: Were the clocks striking or surprising? Using WSD to improve MT performance. EACL Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra): Proceedings of the workshop, 23-24 April 2012, Avignon, France; pp.87-92. [PDF, 283KB]

(2011) Pushpak Bhattacharyya: IndoWordNet and multilingual resource conscious word sense disambiguation. Proceedings of the 8th international NLPSC workshop. Special theme: Human-machine interaction in translation, Copenhagen Business School, 20-21 August 2011; ed.Bernadette Sharp, Michael Zock, Michael Carl, Arnt Lykke Jakobsen (Copenhagen Studies in Language 41), Frederiksberg: Samfundslitteratur, 2011; pp.29-30. [PDF, 677KB]

(2011) Špela Vintar & Darja Fišer: Enriching Slovene WordNet with domain-specific terms. Translation: Computation, Corpora,  Cognition 1 (1), December 2011; pp.29-44. [PDF, 631KB]

(2010) Pushpak Bhattacharyya: IndoWordnet. LREC 2010: proceedings of the  seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.3785-3792. [PDF, 411KB]

World Wide Web [see also Internet, Semantic Web, Wikis]

(2014) Vicent Alabau & Luis A.Leiva: Collaborative web UI localization, or how to build feature-rich multilingual datasets. Proceedings of the 17th annual conference of the European Association for Machine Translation, EAMT 2014, Dubrovnik, Croatia, 16th-18th June 2014, edited by Marko Tadić, Philipp Koehn, Johann Roturier, Andy Way; pp.151-154. [PDF, 352KB]

(2014) Yi Lu, Longyue Wang, Derek F.Wong, Lidia S.Chao, Yiming Wang, & Francisco Oliveira: Domain adaptation for medical text translation using web resources. [WMT 2014] Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, Maryland USA, June 26-27, 2014; pp.233-238. [PDF, 485KB]

(2014) Antonio Toral, Raphael Rubino, Miquel Esplà-Gomis, Tommi Pirinen, Andy Way, & Gema Ramírez-Sánchez: Extrinsic evaluation of web-crawlers in machine translation: a study on Croatian-English for the tourism domain. Proceedings of the 17th annual conference of the European Association for Machine Translation, EAMT 2014, Dubrovnik, Croatia, 16th-18th June 2014, edited by Marko Tadić, Philipp Koehn, Johann Roturier, Andy Way; pp.221-224. [PDF, 345KB]

(2013) Vassilis Papavassiliou, Prokopis Prokopidis, & Gregor Thurmair: A modular open-source focused crawler for mining monolingual and bilingual corpora from the web.  Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.43-51. [PDF, 663KB]

(2013) Felix Sasaki: Metadata for the multilingual web. Translation: Computation, Corpora,  Cognition 3 (1), June 2013; pp.19-26. [PDF, 124KB]

(2013) Felix Sasaki: Metadata for the multilingual web: introducing internationalization tag set (ITS) 2.0. Proceedings of the XIV Machine Translation Summit, Nice, September 2-6, 2013; ed. K.Sima’an, M.L.Forcada, D.Grasmick, H.Depraetere, A.Way; p.423. [PDF, 205KB]

(2013) Jason R.Smith, Herve Saint-Amand, Magdalena Plamada, Philipp Koehn, Chris Callison-Burch, & Adam Lopez: Dirt cheap web-scale parallel text from the Common Crawl.  ACL-2013: Proceedings of the 51st Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9 2013; pp.1374-1383. [PDF, 179KB]

(2013) Chengzhi Zhang, Xuchen Yao & Chunyu Kit: Finding more bilingual webpages with high credibility via link analysis. Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria, August 8, 2013; pp.138-143. [PDF, 499KB]

(2012) Ahmet Aker, Evangelos Kanoulas, & Robert Gaizauskas: A light way to collect comparable corpora from the Web. LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.15-20. [PDF, 856KB]

(2012) Anelia Belogay, Diman Karagyozov, Svetla Koeva, Cristina Vertan, Adam Przepiórkowski, Polivios Raxis, & Dan Cristea: Harnessing NLP technologies in the processes of multilingual content management.  [EACL 2012] Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, April 23-27, 2012; pp. 6-10. [PDF, 367KB]

(2012) Valeria Caruso & Anna De Meo: What else can databases do to assist translators? Illustrating a rated inventory of Web dictionaries. [Aslib 2012] Translating and the Computer 34, 29-30 November 2012, One Birdcage Walk, London, UK; 12pp. [PDF, 848KB], presentation by Martin Thomas: 50 slides [PDF, 3336KB]

(2012) Pavel Pecina, Antonio  Toral, Vassilis Papavassiliou, Prokopis Prokopidis, & Josef van Genabith: Domain adaptation of statistical machine translation using web-crawled resources: a case study. EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; pp.145-152. [PDF, 201KB]

(2012) Feiliang Ren: A practical Chinese-English ON translation method based on ON’s distribution characteristics on the web. Proceedings of COLING 2012: Demonstration Papers, Mumbai, December 2012; pp. 239-246. [PDF, 102KB]

(2012) Stephen D.Richardson: Using the Microsoft Translator Hub at the Church of Jesus Christ of Latter-day Saints. AMTA-2012: the Tenth Biennial Conference of the Association for Machine Translation in the Americas. Proceedings, San Diego, CA, October 28 – November 1, 2012; 8pp. [PDF, 555KB]

(2012) Iñaki San Vicente & Iker Manterola: PaCo2: a fully automated tool for gathering parallel corpora from the Web.  LREC 2012: Eighth international conference on Language Resources and Evaluation, 21-27 May 2012, Istanbul, Turkey; pp.1-6. [PDF, 415KB]

(2012) Embedding machine translation in ATLAS content management system. [Project paper at] EAMT 2012: Proceedings of the 16th Annual Conference of the European Association for Machine Translation, Trento, Italy, May 28-30 2012, ed. Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way; p.96. [PDF, 209KB]

(2011) Takeshi Abekawa & Kyo Kageura: Using seed terms for crawling bilingual terminology lists on the Web. Translating and the Computer 33, 17-18 November 2011, London; 12pp. [PDF, 68KB]

(2011) Sven Christian Andrä & Jörg Schütz: The semantically-enriched translation interoperability protocol. [IJCNLP 2011] Proceedings of Workshop on Language Resources, Technology and Services in the Sharing Paradigm, Chiang Mai, Thailand, November 12, 2011; pp.24-31. [PDF, 218KB]

(2011) Duo Ding: Integrate multilingual web search results using cross-lingual topic models. [IJCNLP 2011] Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011; pp.20-24. [PDF, 367KB]

(2011) Théo Hoffenberg & Christophe Brun-Franc: An innovative platform to allow full translation of internet sites. [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; pp.41-45. [PDF, 316KB]

(2011) Richard Ishida: The multilingual web: latest developments at the W3C/IETF. Translating and the Computer 33, 17-18 November 2011, London; presentation 55 slides. [PDF of PPT, 4551KB]

(2011) Miguel A.Jiménez-Crespo: To adapt or not to adapt in web localization: a contrastive genre-based study of original and localised legal sections in corporate websites. Journal of Specialised Translation 15 (January 2011); pp.2-27. [PDF, 237KB]

(2011) Wang Ling, Pável Calado, Bruno Martins, Isabel Trancoso, Alan Black, & Luísa Coheur: Named entity translation using anchor texts. IWSLT 2011: Proceedings of the International Workshop on Spoken Language Translation, San Francisco, December 8-9, 2011, ed. Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker; pp.206-213. [PDF, 442KB]

 (2011) Spencer Rarrick, Chris Quirk, & Will Lewis: MT detection in web-scraped parallel corpora. MT Summit XIII: the Thirteenth Machine Translation Summit [organized by the] Asia-Pacific Association for Machine Translation (AAMT), 19-23 September 2011, Xiamen, China; pp.422-429. [PDF, 323KB]

(2011) Simon Shi, Pascale Fung, Emmanuel Prochasson, Chi-kiu Lo, & Dekai Wu: Mining parallel documents using low bandwidth and high precision CLIR from the heterogeneous web. [IJCNLP 2011] Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8-13, 2011; pp.420-428. [PDF, 512KB]

(2011) Johanka Spoustová & Miroslav Spousta: Comparable fora. ACL 2011: Proceedings of the Fourth Workshop on Building and Using Comparable Corpora, Portland, Oregon, USA, 24 June 2011; pp.96-101. [PDF, 80KB]

(2011) Cristina Vertan & Monica Gavrila: Using manual and parallel aligned corpora for machine translation services within an on-line content management system. AEPC 2011: proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora, associated with the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), 15th September 2011, Hissar, Bulgaria; pp.53-58. [PDF, 361KB]

(2011) Arnaud Vié, Luis Villarejo Muñoz, Mireia Farrús Cabeceran, & Jimmy O’Regan: Apertium advanced web interface: a first step towards interactivity and language tools convergence. Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation, Barcelona, Spain, January 20-21, 2011, ed. F.Sánchez-Martínez and J.A.Pérez-Ortiz; pp.45-51. [PDF, 280KB]

(2011) Cesare Zanca: Developing translation strategies and cultural awareness using corpora and the  web. Tralogy, Paris, 3-4 March 2011; 14pp. [PDF, 160KB]

(2011) LIWP – EU language industry web platform. (European Machine Translation Projects.) [EAMT 2011]: proceedings of the 15th conference of the European Association for Machine Translation, 30-31 May 2011, Leuven, Belgium; eds. Mikel L.Forcada, Heidi Depraetere, Vincent Vandeghinste; p.339. [PDF, 40KB]

(2010) Ahmet Aker & Robert Gaizauskas: Model summaries for location-related images. LREC 2010: proceedings of the seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.3119-3124. [PDF, 429KB]

(2010) José João Almeida & Alberto Simões: Automatic parallel corpora and bilingual terminology extraction from parallel websites. [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.50-55. [PDF, 257KB]

(2010) Christian Boitet, Huynh Cong Phap, Nguyen Hong Thai, & Valérie Bellynck: The iMAG concept: multilingual access gateway to an elected web sites with incremental quality increase through collaborative post-edition of MT pretranslations. TALN 2010. Proceedings of Traitement Automatique du Langage Naturel, 19-23 juillet 2010. Montréal, Canada. 8pp. [PDF, 1585KB]

(2010) Sean Colbath: Terminology management for web monitoring. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; abstract

(2010) Dmitry Davidov & Ari Rappoport: Automated translation of semantic relationships. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.241-249. [PDF, 196KB]

(2010) Alain Désilets: WeBiText: multilingual concordancer built from public high quality web content. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; abstract

(2010) Miquel Esplà-Gomis & Mikel L.Forcada: Combining content-based and URL-based heuristics to harvest aligned bitexts from multilingual sites with Bitextor. Fourth Machine Translation Marathon “Open Source Tools for Machine Translation”, 25-30 January, Dublin, Ireland; Prague Bulletin of Mathematical Linguistics, no.93, January 2010; pp.77-86. [PDF, 160KB]

(2010) Yanhui Feng, Yu Hong, Zhenxiang Yan, Jianmin Yao, & Qiaoming Zhu: A novel method for bilingual web page acquisition from search engine web records. Coling 2010: 23rd International Conference on Computational Linguistics, 23-27 August 2010, Beijing International Convention Center, Beijing, China, Posters volume; pp.294-302. [PDF, 184KB]

(2010) Pascale Fung, Emmanuel Prochasson, & Simon Shi: Trillions of comparable documents. [LREC 2010] Proceedings of the 3rd Workshop on Building and Using Comparable Corpora, Malta, 22 May 2010; pp.26-34. [PDF, 199KB]

(2010) Aarne Ranta, Krasimir Angelov, & Thomas Hallgren: Tools for multilingual grammar-based translation on the web. Proceedings of the ACL 2010 System Demonstrations, Uppsala, Sweden, 13 July 2010; pp.66-71. [PDF, 313KB]

(2010) Osamuyimen Stewart, David Lubensky, Scott Macdonald, & Julie Marcotte: Using machine translation for localization of electronic support content: evaluating end-user satisfaction. AMTA 2010: the Ninth conference of the Association for Machine Translation in the Americas, Denver, Colorado, October 31 – November 4, 2010; 6pp. [PDF, 39KB]

(2010) Yulia Tsvetkov & Shuly Wintner: Automatic acquisition of parallel corpora from websites with dynamic content. LREC 2010: proceedings of the seventh international conference on Language Resources and Evaluation, 17-23 May 2010, Valletta, Malta; pp.3389-3392. [PDF, 447KB]

(2010) Jakob Uszkoreit, Jay M.Ponte, Ashok C.Popat, & Moshe Dubiner: Large scale parallel document mining for machine translation. Coling 2010: 23rd International Conference on Computational Linguistics. Proceedings of the conference, 23-27 August 2010, Beijing International Convention Center, Beijing, China; pp.1101-1109. [PDF, 241KB]