<?xml version="1.0" encoding="UTF-8" ?>
<volume id="W17">
  <paper id="5600">
    <title>Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora</title>
    <editor>Haithem Afli</editor>
    <editor>Chao-Hong Liu</editor>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <url>http://www.aclweb.org/anthology/W17-56</url>
    <bibtype>book</bibtype>
    <bibkey>Cupral:2017</bibkey>
  </paper>

  <paper id="5601">
    <title>Building a Better Bitext for Structurally Different Languages through Self-training</title>
    <author><first>Jungyeul</first><last>Park</last></author>
    <author><first>Loic</first><last>Dugast</last></author>
    <author><first>Jeen-Pyo</first><last>Hong</last></author>
    <author><first>Chang-Uk</first><last>Shin</last></author>
    <author><first>Jeong-Won</first><last>Cha</last></author>
    <booktitle>Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>1&#8211;10</pages>
    <url>http://www.aclweb.org/anthology/W17-5601</url>
    <abstract>We propose a novel method to bootstrap the construction of parallel corpora for
	new pairs of structurally different languages.
	We do so by combining the use of a pivot language and self-training. 
	A pivot language enables the use of existing translation models to bootstrap
	the alignment and a self-training procedure enables to achieve better
	alignment, both at the document and sentence level. 
	We also propose several evaluation methods for the resulting alignment.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>park-EtAl:2017:Cupral</bibkey>
  </paper>

  <paper id="5602">
    <title>MultiNews: A Web collection of an Aligned Multimodal and Multilingual Corpus</title>
    <author><first>Haithem</first><last>Afli</last></author>
    <author><first>Pintu</first><last>Lohar</last></author>
    <author><first>Andy</first><last>Way</last></author>
    <booktitle>Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>11&#8211;15</pages>
    <url>http://www.aclweb.org/anthology/W17-5602</url>
    <abstract>Integrating Natural Language Processing (NLP) and computer vision is a
	promising effort.
	However, the applicability of these methods directly depends on the
	availability 
	of a specific multimodal data that includes images and texts.
	In this paper, we present a collection of a Multimodal corpus of comparable
	texts and their images in 9 languages
	from the web news articles of Euronews website.
	This corpus has found widespread use in the NLP community in Multilingual and
	multimodal tasks.
	Here, we focus on its acquisition of the images and text data and their
	multilingual alignment.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>afli-lohar-way:2017:Cupral</bibkey>
  </paper>

  <paper id="5603">
    <title>Learning Phrase Embeddings from Paraphrases with GRUs</title>
    <author><first>Zhihao</first><last>Zhou</last></author>
    <author><first>Lifu</first><last>Huang</last></author>
    <author><first>Heng</first><last>Ji</last></author>
    <booktitle>Proceedings of the First Workshop on Curation and Applications of Parallel and Comparable Corpora</booktitle>
    <month>November</month>
    <year>2017</year>
    <address>Taipei, Taiwan</address>
    <publisher>Asian Federation of Natural Language Processing</publisher>
    <pages>16&#8211;23</pages>
    <url>http://www.aclweb.org/anthology/W17-5603</url>
    <abstract>Learning phrase representations has been widely explored in many Natural
	Language Processing tasks (e.g., Sentiment Analysis, Machine Translation) and
	has shown promising improvements. Previous studies either learn
	non-compositional phrase representations with general word embedding learning
	techniques or learn compositional phrase representations based on syntactic
	structures, which either require huge amounts of human annotations or cannot be
	easily generalized to all phrases. In this work, we propose to take advantage
	of large-scaled paraphrase database and present a pairwise-GRU framework to
	generate compositional phrase representations. Our framework can be re-used to
	generate representations for any phrases. Experimental results show that our
	framework achieves state-of-the-art results on several phrase similarity tasks.</abstract>
    <bibtype>inproceedings</bibtype>
    <bibkey>zhou-huang-ji:2017:Cupral</bibkey>
  </paper>

</volume>

