He Bai


pdf bib
Better Language Model with Hypernym Class Prediction
He Bai | Tong Wang | Alessandro Sordoni | Peng Shi
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Class-based language models (LMs) have been long devised to address context sparsity in n-gram LMs. In this study, we revisit this approach in the context of neural LMs. We hypothesize that class-based prediction leads to an implicit context aggregation for similar words and thus can improve generalization for rare words. We map words that have a common WordNet hypernym to the same class and train large neural LMs by gradually annealing from predicting the class to token prediction during training. Empirically, this curriculum learning strategy consistently improves perplexity over various large, highly-performant state-of-the-art Transformer-based models on two datasets, WikiText-103 and ARXIV. Our analysis shows that the performance improvement is achieved without sacrificing performance on rare words. Finally, we document other attempts that failed to yield empirical gains, and discuss future directions for the adoption of class-based LMs on a larger scale.


pdf bib
Semantics of the Unwritten: The Effect of End of Paragraph and Sequence Tokens on Text Generation with GPT2
He Bai | Peng Shi | Jimmy Lin | Luchen Tan | Kun Xiong | Wen Gao | Jie Liu | Ming Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

The semantics of a text is manifested not only by what is read but also by what is not read. In this article, we will study how those implicit “not read” information such as end-of-paragraph () and end-of-sequence () affect the quality of text generation. Specifically, we find that the pre-trained language model GPT2 can generate better continuations by learning to generate the in the fine-tuning stage. Experimental results on English story generation show that can lead to higher BLEU scores and lower perplexity. We also conduct experiments on a self-collected Chinese essay dataset with Chinese-GPT2, a character level LM without and during pre-training. Experimental results show that the Chinese GPT2 can generate better essay endings with .

pdf bib
Cross-Lingual Training of Dense Retrievers for Document Retrieval
Peng Shi | Rui Zhang | He Bai | Jimmy Lin
Proceedings of the 1st Workshop on Multilingual Representation Learning

Dense retrieval has shown great success for passage ranking in English. However, its effectiveness for non-English languages remains unexplored due to limitation in training resources. In this work, we explore different transfer techniques for document ranking from English annotations to non-English languages. Our experiments reveal that zero-shot model-based transfer using mBERT improves search quality. We find that weakly-supervised target language transfer is competitive compared to generation-based target language transfer, which requires translation models.


pdf bib
Cross-Lingual Training of Neural Models for Document Ranking
Peng Shi | He Bai | Jimmy Lin
Findings of the Association for Computational Linguistics: EMNLP 2020

We tackle the challenge of cross-lingual training of neural document ranking models for mono-lingual retrieval, specifically leveraging relevance judgments in English to improve search in non-English languages. Our work successfully applies multi-lingual BERT (mBERT) to document ranking and additionally compares against a number of alternatives: translating the training data, translating documents, multi-stage hybrids, and ensembles. Experiments on test collections in six different languages from diverse language families reveal many interesting findings: model-based relevance transfer using mBERT can significantly improve search quality in (non-English) mono-lingual retrieval, but other “low resource” approaches are competitive as well.


pdf bib
Memory Consolidation for Contextual Spoken Language Understanding with Dialogue Logistic Inference
He Bai | Yu Zhou | Jiajun Zhang | Chengqing Zong
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Dialogue contexts are proven helpful in the spoken language understanding (SLU) system and they are typically encoded with explicit memory representations. However, most of the previous models learn the context memory with only one objective to maximizing the SLU performance, leaving the context memory under-exploited. In this paper, we propose a new dialogue logistic inference (DLI) task to consolidate the context memory jointly with SLU in the multi-task framework. DLI is defined as sorting a shuffled dialogue session into its original logical order and shares the same memory encoder and retrieval mechanism as the SLU model. Our experimental results show that various popular contextual SLU models can benefit from our approach, and improvements are quite impressive, especially in slot filling.


pdf bib
Source Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language
He Bai | Yu Zhou | Jiajun Zhang | Liang Zhao | Mei-Yuh Hwang | Chengqing Zong
Proceedings of the 27th International Conference on Computational Linguistics

To deploy a spoken language understanding (SLU) model to a new language, language transferring is desired to avoid the trouble of acquiring and labeling a new big SLU corpus. An SLU corpus is a monolingual corpus with domain/intent/slot labels. Translating the original SLU corpus into the target language is an attractive strategy. However, SLU corpora consist of plenty of semantic labels (slots), which general-purpose translators cannot handle well, not to mention additional culture differences. This paper focuses on the language transferring task given a small in-domain parallel SLU corpus. The in-domain parallel corpus can be used as the first adaptation on the general translator. But more importantly, we show how to use reinforcement learning (RL) to further adapt the adapted translator, where translated sentences with more proper slot tags receive higher rewards. Our reward is derived from the source input sentence exclusively, unlike reward via actor-critical methods or computing reward with a ground truth target sentence. Hence we can adapt the translator the second time, using the big monolingual SLU corpus from the source language. We evaluate our approach on Chinese to English language transferring for SLU systems. The experimental results show that the generated English SLU corpus via adaptation and reinforcement learning gives us over 97% in the slot F1 score and over 84% accuracy in domain classification. It demonstrates the effectiveness of the proposed language transferring method. Compared with naive translation, our proposed method improves domain classification accuracy by relatively 22%, and the slot filling F1 score by relatively more than 71%.