Bo Li

May refer to several people

Other people with similar names: Bo Li (BeiHang), Bo Li (NUS, Google), Bo Li (Vanderbilt, UIUC)


Class Lifelong Learning for Intent Detection via Structure Consolidation Networks
Qingbin Liu | Yanchao Hao | Xiaolong Liu | Bo Li | Dianbo Sui | Shizhu He | Kang Liu | Jun Zhao | Xi Chen | Ningyu Zhang | Jiaoyan Chen
Findings of the Association for Computational Linguistics: ACL 2023

Intent detection, which estimates diverse intents behind user utterances, is an essential component of task-oriented dialogue systems. Previous intent detection models are usually trained offline and can only handle predefined intent classes. In the real world, new intents may keep challenging deployed models. For example, with the prevalence of the COVID-19 pandemic, users may pose various issues related to the pandemic to conversational systems, which brings many new intents. A general intent detection model should be intelligent enough to continually learn new data and recognize newly arriving intent classes. Therefore, this work explores Class Lifelong Learning for Intent Detection (CLL-ID), where the model continually learns new intent classes from new data while avoiding catastrophic performance degradation on old data. To this end, we propose a novel lifelong learning method, called Structure Consolidation Networks (SCN), which consists of structure-based retrospection and contrastive knowledge distillation to handle the problems of expression diversity and class imbalance in the CLL-ID task. In addition to formulating the new task, we construct three benchmarks based on eight intent detection datasets. Experimental results demonstrate the effectiveness of SCN, which significantly outperforms previous lifelong learning methods on the three benchmarks.

SCCS: Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment
Jielin Qiu | Jiacheng Zhu | Mengdi Xu | Franck Dernoncourt | Trung Bui | Zhaowen Wang | Bo Li | Ding Zhao | Hailin Jin
Findings of the Association for Computational Linguistics: ACL 2023

Multimedia summarization with multimodal output (MSMO) is a recently explored application in language grounding. It plays an essential role in real-world applications, e.g., automatically generating cover images and titles for news articles or providing introductions to online videos. However, existing methods extract features from the whole video and article and use fusion methods to select the representative one, thus usually ignoring the critical structure and varying semantics within the video/document. In this work, we propose a Semantics-Consistent Cross-domain Summarization (SCCS) model based on optimal transport alignment with visual and textual segmentation. Our method first decomposes both videos and articles into segments in order to capture the structural semantics, and then follows a cross-domain alignment objective with optimal transport distance, which leverages multimodal interaction to match and select the visual and textual summary. We evaluated our method on three MSMO datasets and achieved performance improvements of 8% & 6% in textual and 6.6% & 5.7% in video summarization, respectively, which demonstrates the effectiveness of our method in producing high-quality multimodal summaries.


Point, Disambiguate and Copy: Incorporating Bilingual Dictionaries for Neural Machine Translation
Tong Zhang | Long Zhang | Wei Ye | Bo Li | Jinan Sun | Xiaoyu Zhu | Wen Zhao | Shikun Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper proposes a sophisticated neural architecture to incorporate bilingual dictionaries into Neural Machine Translation (NMT) models. By introducing three novel components, Pointer, Disambiguator, and Copier, our method PDC inherently achieves the following merits compared with previous efforts: (1) Pointer leverages the semantic information from bilingual dictionaries, for the first time, to better locate source words whose translation in dictionaries can potentially be used; (2) Disambiguator synthesizes contextual information from the source view and the target view, both of which contribute to distinguishing the proper translation of a specific source word from multiple candidates in dictionaries; (3) Copier systematically connects Pointer and Disambiguator based on a hierarchical copy mechanism seamlessly integrated with Transformer, thereby building an end-to-end architecture that could avoid error propagation problems in alternative pipeline methods. The experimental results on Chinese-English and English-Japanese benchmarks demonstrate PDC’s overall superiority and the effectiveness of each component.


AutoETER: Automated Entity Type Representation for Knowledge Graph Embedding
Guanglin Niu | Bo Li | Yongfei Zhang | Shiliang Pu | Jingyang Li
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent advances in Knowledge Graph Embedding (KGE) allow for representing entities and relations in continuous vector spaces. Some traditional KGE models leverage additional type information to improve the representation of entities; however, they either rely entirely on explicit types or neglect the diverse type representations specific to various relations. Besides, none of the existing methods is capable of simultaneously inferring all the relation patterns of symmetry, inversion and composition as well as the complex properties of 1-N, N-1 and N-N relations. To explore the type information for any KG, we develop a novel KGE framework with Automated Entity TypE Representation (AutoETER), which learns the latent type embedding of each entity by regarding each relation as a translation operation between the types of two entities with a relation-aware projection mechanism. In particular, our automated type representation learning mechanism is a pluggable module which can be easily incorporated into any KGE model. Besides, our approach can model and infer all the relation patterns and complex relations. Experiments on four datasets demonstrate the superior performance of our model compared to state-of-the-art baselines on link prediction tasks, and the visualization of type clustering clearly explains the type embeddings and verifies the effectiveness of our model.

Graph Enhanced Dual Attention Network for Document-Level Relation Extraction
Bo Li | Wei Ye | Zhonghao Sheng | Rui Xie | Xiangyu Xi | Shikun Zhang
Proceedings of the 28th International Conference on Computational Linguistics

Document-level relation extraction requires inter-sentence reasoning capabilities to capture local and global contextual information for multiple relational facts. To improve inter-sentence reasoning, we propose to characterize the complex interaction between sentences and potential relation instances via a Graph Enhanced Dual Attention network (GEDA). In GEDA, the sentence representation generated by the sentence-to-relation (S2R) attention is refined and synthesized by a Heterogeneous Graph Convolutional Network before being fed into the relation-to-sentence (R2S) attention. We further design a simple yet effective regularizer based on the natural duality of the S2R and R2S attention, whose weights are also supervised by the supporting evidence of relation instances during training. An extensive set of experiments on an existing large-scale dataset shows that our model achieves competitive performance, especially for inter-sentence relation extraction, while the neural predictions are also interpretable and easily observed.


Exploiting Entity BIO Tag Embeddings and Multi-task Learning for Relation Extraction with Imbalanced Data
Wei Ye | Bo Li | Rui Xie | Zhonghao Sheng | Long Chen | Shikun Zhang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In practical scenarios, relation extraction needs to first identify entity pairs that have a relation and then assign a correct relation class. However, the number of non-relation entity pairs in context (negative instances) usually far exceeds the others (positive instances), which negatively affects a model’s performance. To mitigate this problem, we propose a multi-task architecture which jointly trains a model to perform relation identification with cross-entropy loss and relation classification with ranking loss. Meanwhile, we observe that a sentence may have multiple entities and relation mentions, and the patterns in which the entities appear in a sentence may contain useful semantic information that can be utilized to distinguish between positive and negative instances. Thus we further incorporate the embeddings of character-wise/word-wise BIO tags from the named entity recognition task into character/word embeddings to enrich the input representation. Experimental results show that our proposed approach can significantly improve the performance of a baseline model with more than 10% absolute increase in F1-score, and outperform the state-of-the-art models on the ACE 2005 Chinese and English corpora. Moreover, BIO tag embeddings are particularly effective and can be used to improve other models as well.
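
As a rough illustration of the BIO tagging mentioned in the abstract above, the following sketch converts entity spans into word-level BIO tags. The function name `spans_to_bio`, the `(start, end, label)` span format, and the example sentence are assumptions made for illustration; they are not taken from the paper, and the embedding step itself is not shown.

```python
from typing import List, Tuple


def spans_to_bio(num_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans to word-level BIO tags.

    Each span is (start, end, label) with an exclusive end index.
    This span format is a hypothetical convention, not the paper's.
    """
    tags = ["O"] * num_tokens  # tokens outside any entity
    for start, end, label in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"invalid span: {(start, end, label)}")
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the entity
    return tags


# Hypothetical example sentence with two entities.
tokens = ["Steve", "Jobs", "founded", "Apple"]
tags = spans_to_bio(len(tokens), [(0, 2, "PER"), (3, 4, "ORG")])
print(list(zip(tokens, tags)))
```

In a model along the lines the abstract describes, each tag would presumably be mapped to a learned vector and combined with the corresponding word or character embedding.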


Alibaba Submission for WMT18 Quality Estimation Task
Jiayi Wang | Kai Fan | Bo Li | Fengming Zhou | Boxing Chen | Yangbin Shi | Luo Si
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

The goal of the WMT 2018 Shared Task on Translation Quality Estimation is to investigate automatic methods for estimating the quality of machine translation results without reference translations. This paper presents the QE Brain system, which proposes the neural Bilingual Expert model as a feature extractor based on a conditional target language model with a bidirectional transformer and then processes the semantic representations of the source and the translation output with a Bi-LSTM predictive model for automatic quality estimation. The system has been applied to the sentence-level scoring and ranking tasks as well as the word-level tasks for finding errors for each word in translations. An extensive set of experimental results has shown that our system outperformed the best results in WMT 2017 Quality Estimation tasks and obtained top results in WMT 2018.

Learning Neural Representation for CLIR with Adversarial Framework
Bo Li | Ping Cheng
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The existing studies in cross-language information retrieval (CLIR) mostly rely on general text representation models (e.g., vector space model or latent semantic analysis). These models are not optimized for the target retrieval task. In this paper, we follow the success of neural representation in natural language processing (NLP) and develop a novel text representation model based on adversarial learning, which seeks a task-specific embedding space for CLIR. Adversarial learning is implemented as an interplay between the generator process and the discriminator process. In order to adapt adversarial learning to CLIR, we design three constraints to direct representation learning, which are (1) a matching constraint capturing essential characteristics of cross-language ranking, (2) a translation constraint bridging language gaps, and (3) an adversarial constraint forcing both language and media invariant to be reached more efficiently and effectively. Through the joint exploitation of these constraints in an adversarial manner, the underlying cross-language semantics relevant to retrieval tasks are better preserved in the embedding space. Standard CLIR experiments show that our model significantly outperforms state-of-the-art continuous space models and is better than the strong machine translation baseline.

Joint Learning from Labeled and Unlabeled Data for Information Retrieval
Bo Li | Ping Cheng | Le Jia
Proceedings of the 27th International Conference on Computational Linguistics

Recently, a significant number of studies have focused on neural information retrieval (IR) models. One category of work uses unlabeled data to train general word embeddings based on term proximity, which can be integrated into traditional IR models. The other category employs labeled data (e.g., click-through data) to train end-to-end neural IR models consisting of layers for target-specific representation learning. The latter idea accounts better for the IR task and is favored by recent research, and it is the one we follow in this paper. We hypothesize that general semantics learned from unlabeled data can complement task-specific representation learned from labeled data of limited quality, and that a combination of the two is favorable. To this end, we propose a learning framework which can benefit from both labeled and more abundant unlabeled data for representation learning in the context of IR. Through joint learning in a single neural framework, the learned representation is optimized to minimize both the supervised loss on query-document matching and the unsupervised loss on text reconstruction. Standard retrieval experiments on TREC collections indicate that the joint learning methodology leads to significantly better retrieval performance than several strong baselines for IR.
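
The joint objective described in the abstract above can be sketched as a weighted sum of a supervised matching loss and an unsupervised reconstruction loss. This is only a minimal sketch: the abstract does not specify the loss functions, so the binary cross-entropy, the mean squared error, the `weight` parameter, and all function names here are assumptions for illustration.

```python
import math
from typing import Sequence


def matching_loss(score: float, label: int) -> float:
    """Binary cross-entropy on a query-document relevance logit.

    Uses the numerically stable form max(s, 0) - s*y + log(1 + exp(-|s|)).
    The choice of cross-entropy is an assumption, not the paper's loss.
    """
    return max(score, 0.0) - score * label + math.log1p(math.exp(-abs(score)))


def reconstruction_loss(original: Sequence[float], rebuilt: Sequence[float]) -> float:
    """Mean squared error between a text vector and its reconstruction (assumed form)."""
    return sum((o - r) ** 2 for o, r in zip(original, rebuilt)) / len(original)


def joint_loss(
    score: float,
    label: int,
    original: Sequence[float],
    rebuilt: Sequence[float],
    weight: float = 0.5,  # hypothetical trade-off between the two terms
) -> float:
    """Supervised matching loss plus weighted unsupervised reconstruction loss."""
    return matching_loss(score, label) + weight * reconstruction_loss(original, rebuilt)


# Hypothetical values: one relevant query-document pair and a 2-d text vector.
loss = joint_loss(score=0.0, label=1, original=[1.0, 0.0], rebuilt=[0.0, 0.0])
print(loss)
```

Minimizing a single combined value like this is one way a shared representation could be trained on both labeled and unlabeled data at once.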

Alibaba Speech Translation Systems for IWSLT 2018
Nguyen Bach | Hongjie Chen | Kai Fan | Cheung-Chi Leung | Bo Li | Chongjia Ni | Rong Tong | Pei Zhang | Boxing Chen | Bin Ma | Fei Huang
Proceedings of the 15th International Conference on Spoken Language Translation

This work describes the En→De Alibaba speech translation system developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2018. In order to improve ASR performance, multiple ASR models, including conventional and end-to-end models, are built, and model fusion is applied in the final step. ASR pre- and post-processing techniques such as speech segmentation, punctuation insertion, and sentence splitting are found to be very useful for MT. We also employed most techniques that proved effective during the WMT 2018 evaluation, such as BPE, back translation, data selection, model ensembling and reranking. Combined, these ASR and MT techniques improve the speech translation quality significantly.


NLPTEA 2017 Shared Task – Chinese Spelling Check
Gabriel Fung | Maxime Debosschere | Dingmin Wang | Bo Li | Jia Zhu | Kam-Fai Wong
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)

This paper provides an overview along with our findings of the Chinese Spelling Check shared task at NLPTEA 2017. The goal of this task is to develop a computer-assisted system to automatically diagnose typing errors in traditional Chinese sentences written by students. We defined six types of errors which belong to two categories. Given a sentence, the system should detect where the errors are, and for each detected error determine its type and provide correction suggestions. We designed, constructed, and released a benchmark dataset for this task.


Dependency parsing for Chinese long sentence: A second-stage main structure parsing method
Bo Li | Yunfei Long | Weiguang Qu
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters


Clustering Comparable Corpora For Bilingual Lexicon Extraction
Bo Li | Eric Gaussier | Akiko Aizawa
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Degré de comparabilité, extraction lexicale bilingue et recherche d’information interlingue (Degree of comparability, bilingual lexical extraction and cross-language information retrieval)
Bo Li | Eric Gaussier | Emmanuel Morin | Amir Hazem
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

In this article, we study the problem of the comparability of the documents that make up a comparable corpus, with the aim of improving the quality of the extracted bilingual lexicons and the performance of cross-language information retrieval systems. We propose a new approach that guarantees a certain degree of comparability and homogeneity of the corpus while preserving a large part of the vocabulary of the original corpus. Our experiments show that the bilingual lexicons we obtain are of better quality than those obtained with previous approaches, and that they can be used to significantly improve cross-language information retrieval systems.


Improving Corpus Comparability for Bilingual Lexicon Extraction from Comparable Corpora
Bo Li | Eric Gaussier
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)


Mining Chinese-English Parallel Corpora from the Web
Bo Li | Juan Liu
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II


Mining Parallel Text from the Web based on Sentence Alignment
Bo Li | Juan Liu | Huili Zhu
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation