2020
Multilingual Universal Sentence Encoder for Semantic Retrieval
Yinfei Yang | Daniel Cer | Amin Ahmad | Mandy Guo | Jax Law | Noah Constant | Gustavo Hernandez Abrego | Steve Yuan | Chris Tar | Yun-hsuan Sung | Brian Strope | Ray Kurzweil
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
We present easy-to-use, retrieval-focused multilingual sentence embedding models, made available on TensorFlow Hub. The models embed text from 16 languages into a shared semantic space using a multi-task trained dual-encoder that learns tied cross-lingual representations via translation bridge tasks (Chidambaram et al., 2018). The models achieve a new state-of-the-art in performance on monolingual and cross-lingual semantic retrieval (SR). Competitive performance is obtained on the related tasks of translation pair bitext retrieval (BR) and retrieval question answering (ReQA). On transfer learning tasks, our multilingual embeddings approach, and in some cases exceed, the performance of English-only sentence embeddings.
2019
Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model
Muthu Chidambaram | Yinfei Yang | Daniel Cer | Steve Yuan | Yunhsuan Sung | Brian Strope | Ray Kurzweil
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
The scarcity of labeled training data across many languages is a significant roadblock for multilingual neural language processing. We approach the lack of in-language training data using sentence embeddings that map text written in different languages, but with similar meanings, to nearby embedding space representations. The representations are produced using a dual-encoder based model trained to maximize the representational similarity between sentence pairs drawn from parallel data. The representations are enhanced using multitask training and unsupervised monolingual corpora. The effectiveness of our multilingual sentence embeddings is assessed on a comprehensive collection of monolingual, cross-lingual, and zero-shot/few-shot learning tasks.
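As a rough illustration of training a dual encoder to maximize similarity between translation pairs, the sketch below computes an in-batch softmax ranking loss over precomputed embeddings. The abstract does not state the exact objective, so the loss form, the dot-product scoring, and the names `dot` and `in_batch_ranking_loss` are assumptions made for illustration, not the authors' implementation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_ranking_loss(src, tgt):
    """Mean negative log-likelihood of picking tgt[i] for src[i].

    Assumption: src[i] and tgt[i] are embeddings of a translation pair,
    and every other target in the batch acts as a negative.
    """
    total = 0.0
    for i, u in enumerate(src):
        scores = [dot(u, v) for v in tgt]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(src)


# Toy batch: aligned pairs should give a lower loss than shuffled pairs.
source = [[2.0, 0.0], [0.0, 2.0]]
aligned = [[2.0, 0.0], [0.0, 2.0]]
shuffled = [[0.0, 2.0], [2.0, 0.0]]
aligned_loss = in_batch_ranking_loss(source, aligned)
shuffled_loss = in_batch_ranking_loss(source, shuffled)
```

Minimizing a loss of this kind pushes each source sentence toward its own translation and away from the other targets in the batch.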
Hierarchical Document Encoder for Parallel Corpus Mining
Mandy Guo | Yinfei Yang | Keith Stevens | Daniel Cer | Heming Ge | Yun-hsuan Sung | Brian Strope | Ray Kurzweil
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)
We explore using multilingual document embeddings for nearest neighbor mining of parallel data. Three document-level representations are investigated: (i) document embeddings generated by simply averaging multilingual sentence embeddings; (ii) a neural bag-of-words (BoW) document encoding model; (iii) a hierarchical multilingual document encoder (HiDE) that builds on our sentence-level model. The results show document embeddings derived from sentence-level averaging are surprisingly effective for clean datasets, but suggest models trained hierarchically at the document-level are more effective on noisy data. Analysis experiments demonstrate our hierarchical models are very robust to variations in the underlying sentence embedding quality. Using document embeddings trained with HiDE achieves the state-of-the-art on United Nations (UN) parallel document mining, 94.9% P@1 for en-fr and 97.3% P@1 for en-es.
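The averaging baseline (i) and the P@1 metric can be sketched as follows. This is a minimal example on toy vectors, assuming cosine similarity as the nearest-neighbor score and that source document i is parallel to target document i; the function names are hypothetical and the numbers are made up for illustration.

```python
import math


def average_embedding(sentence_vectors):
    """Average equal-length sentence vectors into one document vector."""
    n = len(sentence_vectors)
    dim = len(sentence_vectors[0])
    return [sum(v[i] for v in sentence_vectors) / n for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def precision_at_1(source_docs, target_docs):
    """Fraction of source documents whose nearest target is the parallel one."""
    hits = 0
    for i, src in enumerate(source_docs):
        best = max(range(len(target_docs)),
                   key=lambda j: cosine(src, target_docs[j]))
        hits += int(best == i)
    return hits / len(source_docs)


# Toy data: two documents per language, two sentence vectors per document.
en_sentences = [[[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0], [0.2, 0.8]]]
fr_sentences = [[[0.9, 0.1], [0.9, 0.1]], [[0.1, 0.9], [0.1, 0.9]]]
en_docs = [average_embedding(d) for d in en_sentences]
fr_docs = [average_embedding(d) for d in fr_sentences]
p_at_1 = precision_at_1(en_docs, fr_docs)
```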
2018
Universal Sentence Encoder for English
Daniel Cer | Yinfei Yang | Sheng-yi Kong | Nan Hua | Nicole Limtiaco | Rhomni St. John | Noah Constant | Mario Guajardo-Cespedes | Steve Yuan | Chris Tar | Brian Strope | Ray Kurzweil
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
We present easy-to-use TensorFlow Hub sentence embedding models having good task transfer performance. Model variants allow for trade-offs between accuracy and compute resources. We report the relationship between model complexity, resources, and transfer performance. Comparisons are made with baselines without transfer learning and to baselines that incorporate word-level transfer. Transfer learning using sentence-level embeddings is shown to outperform models without transfer learning and often those that use only word-level transfer. We show good transfer task performance with minimal training data and obtain encouraging results on word embedding association tests (WEAT) of model bias.
Learning Semantic Textual Similarity from Conversations
Yinfei Yang | Steve Yuan | Daniel Cer | Sheng-yi Kong | Noah Constant | Petr Pilar | Heming Ge | Yun-Hsuan Sung | Brian Strope | Ray Kurzweil
Proceedings of the Third Workshop on Representation Learning for NLP
We present a novel approach to learn representations for sentence-level semantic similarity using conversational data. Our method trains an unsupervised model to predict conversational responses. The resulting sentence embeddings perform well on the Semantic Textual Similarity (STS) Benchmark and SemEval 2017’s Community Question Answering (CQA) question similarity subtask. Performance is further improved by introducing multitask training, combining conversational response prediction and natural language inference. Extensive experiments show the proposed model achieves the best performance among all neural models on the STS Benchmark and is competitive with the state-of-the-art feature-engineered and mixed systems for both tasks.
Effective Parallel Corpus Mining using Bilingual Sentence Embeddings
Mandy Guo | Qinlan Shen | Yinfei Yang | Heming Ge | Daniel Cer | Gustavo Hernandez Abrego | Keith Stevens | Noah Constant | Yun-Hsuan Sung | Brian Strope | Ray Kurzweil
Proceedings of the Third Conference on Machine Translation: Research Papers
This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings. Our embedding models are trained to produce similar representations exclusively for bilingual sentence pairs that are translations of each other. This is achieved using a novel training method that introduces hard negatives consisting of sentences that are not translations but have some degree of semantic similarity. The quality of the resulting embeddings is evaluated on parallel corpus reconstruction and by assessing machine translation systems trained on gold vs. mined sentence pairs. We find that the sentence embeddings can be used to reconstruct the United Nations Parallel Corpus (Ziemski et al., 2016) at the sentence-level with a precision of 48.9% for en-fr and 54.9% for en-es. When adapted to document-level matching, we achieve a parallel document matching accuracy that is comparable to the significantly more computationally intensive approach of Uszkoreit et al. (2010). Using reconstructed parallel data, we are able to train NMT models that perform nearly as well as models trained on the original data (within 1-2 BLEU).
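One way to picture hard negatives is as the non-translation targets that score highest against a given source sentence. The sketch below selects them from a small pool of precomputed embeddings. The abstract does not describe the selection procedure, so the dot-product scoring, the parameter `k`, and the name `mine_hard_negatives` are assumptions for illustration only.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mine_hard_negatives(src_vecs, tgt_vecs, k=1):
    """For each source i, return the k most similar target indices j != i.

    Assumption: tgt_vecs[i] is the true translation of src_vecs[i], so any
    other index is a non-translation and a candidate negative.
    """
    negatives = []
    for i, u in enumerate(src_vecs):
        candidates = [j for j in range(len(tgt_vecs)) if j != i]
        # Most similar non-translations first.
        candidates.sort(key=lambda j: dot(u, tgt_vecs[j]), reverse=True)
        negatives.append(candidates[:k])
    return negatives


# Toy pool: target 2 is semantically close to source 0 but is not its translation.
src_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
tgt_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
hard_negatives = mine_hard_negatives(src_vecs, tgt_vecs, k=1)
```

Negatives chosen this way are harder to separate from the true translation than randomly sampled sentences, which is the property the abstract attributes to its training method.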
2017
Generating High-Quality and Informative Conversation Responses with Sequence-to-Sequence Models
Yuanlong Shao | Stephan Gouws | Denny Britz | Anna Goldie | Brian Strope | Ray Kurzweil
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Sequence-to-sequence models have been applied to the conversation response generation problem where the source sequence is the conversation history and the target sequence is the response. Unlike translation, conversation responding is inherently creative. The generation of long, informative, coherent, and diverse responses remains a hard task. In this work, we focus on the single turn setting. We add self-attention to the decoder to maintain coherence in longer responses, and we propose a practical approach, called the glimpse-model, for scaling to large datasets. We introduce a stochastic beam-search algorithm with segment-by-segment reranking which lets us inject diversity earlier in the generation process. We trained on a combined data set of over 2.3B conversation messages mined from the web. In human evaluation studies, our method produces longer responses overall, with a higher proportion rated as acceptable and excellent as length increases, compared to baseline sequence-to-sequence models with explicit length-promotion. A back-off strategy produces better responses overall, in the full spectrum of lengths.