Hao Liu


pdf bib
UNIMO-2: End-to-End Unified Vision-Language Grounded Learning
Wei Li | Can Gao | Guocheng Niu | Xinyan Xiao | Hao Liu | Jiachen Liu | Hua Wu | Haifeng Wang
Findings of the Association for Computational Linguistics: ACL 2022

Vision-Language Pre-training (VLP) has achieved impressive performance on various cross-modal downstream tasks. However, most existing methods can only learn from aligned image-caption data and rely heavily on expensive regional features, which greatly limits their scalability and performance. In this paper, we propose an end-to-end unified-modal pre-training framework, namely UNIMO-2, for joint learning on both aligned image-caption data and unaligned image-only and text-only corpus. We build a unified Transformer model to jointly learn visual representations, textual representations and semantic alignment between images and texts. In particular, we propose to conduct grounded learning on both images and texts via a sharing grounded space, which helps bridge unaligned images and texts, and align the visual and textual semantic spaces on different types of corpora. The experiments show that our grounded learning method can improve textual and visual semantic alignment for improving performance on various cross-modal tasks. Moreover, benefiting from effective joint modeling of different types of corpora, our model also achieves impressive performance on single-modal visual and textual tasks. Our code and models are public at the UNIMO project page https://unimo-ptm.github.io/.


pdf bib
Improving Abstractive Dialogue Summarization with Hierarchical Pretraining and Topic Segment
MengNan Qi | Hao Liu | YuZhuo Fu | Ting Liu
Findings of the Association for Computational Linguistics: EMNLP 2021

With the increasing abundance of meeting transcripts, meeting summary has attracted more and more attention from researchers. The unsupervised pre-training method based on transformer structure combined with fine-tuning of downstream tasks has achieved great success in the field of text summarization. However, the semantic structure and style of meeting transcripts are quite different from that of articles. In this work, we propose a hierarchical transformer encoder-decoder network with multi-task pre-training. Specifically, we mask key sentences at the word-level encoder and generate them at the decoder. Besides, we randomly mask some of the role alignments in the input text and force the model to recover the original role tags to complete the alignments. In addition, we introduce a topic segmentation mechanism to further improve the quality of the generated summaries. The experimental results show that our model is superior to the previous methods in meeting summary datasets AMI and ICSI.

pdf bib
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
Wei Li | Can Gao | Guocheng Niu | Xinyan Xiao | Hao Liu | Jiachen Liu | Hua Wu | Haifeng Wang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Existed pre-training methods either focus on single-modal tasks or multi-modal tasks, and cannot effectively adapt to each other. They can only utilize single-modal data (i.e., text or image) or limited multi-modal data (i.e., image-text pairs). In this work, we propose a UNIfied-MOdal pre-training architecture, namely UNIMO, which can effectively adapt to both single-modal and multi-modal understanding and generation tasks. Large scale of free text corpus and image collections are utilized to improve the capability of visual and textual understanding, and cross-modal contrastive learning (CMCL) is leveraged to align the textual and visual information into a unified semantic space, over a corpus of image-text pairs augmented with related images and texts. With the help of rich non-paired single-modal data, our model is able to learn more generalizable representations, by allowing textual knowledge and visual knowledge to enhance each other in the unified semantic space. The experimental results show that UNIMO greatly improves the performance of several single-modal and multi-modal downstream tasks. Our code and pre-trained models are public at https://github.com/PaddlePaddle/Research/tree/master/NLP/UNIMO.


pdf bib
SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis
Hao Tian | Can Gao | Xinyan Xiao | Hao Liu | Bolei He | Hua Wu | Haifeng Wang | Feng Wu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recently, sentiment analysis has seen remarkable advance with the help of pre-training approaches. However, sentiment knowledge, such as sentiment words and aspect-sentiment pairs, is ignored in the process of pre-training, despite the fact that they are widely used in traditional sentiment analysis approaches. In this paper, we introduce Sentiment Knowledge Enhanced Pre-training (SKEP) in order to learn a unified sentiment representation for multiple sentiment analysis tasks. With the help of automatically-mined knowledge, SKEP conducts sentiment masking and constructs three sentiment knowledge prediction objectives, so as to embed sentiment information at the word, polarity and aspect level into pre-trained sentiment representation. In particular, the prediction of aspect-sentiment pairs is converted into multi-label classification, aiming to capture the dependency between words in a pair. Experiments on three kinds of sentiment tasks show that SKEP significantly outperforms strong pre-training baseline, and achieves new state-of-the-art results on most of the test datasets. We release our code at https://github.com/baidu/Senta.

pdf bib
Diversified Multiple Instance Learning for Document-Level Multi-Aspect Sentiment Classification
Yunjie Ji | Hao Liu | Bolei He | Xinyan Xiao | Hua Wu | Yanhua Yu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Neural Document-level Multi-aspect Sentiment Classification (DMSC) usually requires a lot of manual aspect-level sentiment annotations, which is time-consuming and laborious. As document-level sentiment labeled data are widely available from online service, it is valuable to perform DMSC with such free document-level annotations. To this end, we propose a novel Diversified Multiple Instance Learning Network (D-MILN), which is able to achieve aspect-level sentiment classification with only document-level weak supervision. Specifically, we connect aspect-level and document-level sentiment by formulating this problem as multiple instance learning, providing a way to learn aspect-level classifier from the back propagation of document-level supervision. Two diversified regularizations are further introduced in order to avoid the overfitting on document-level signals during training. Diversified textual regularization encourages the classifier to select aspect-relevant snippets, and diversified sentimental regularization prevents the aspect-level sentiments from being overly consistent with document-level sentiment. Experimental results on TripAdvisor and BeerAdvocate datasets show that D-MILN remarkably outperforms recent weakly-supervised baselines, and is also comparable to the supervised method.


pdf bib
Pattern-revising Enhanced Simple Question Answering over Knowledge Bases
Yanchao Hao | Hao Liu | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 27th International Conference on Computational Linguistics

Question Answering over Knowledge Bases (KB-QA), which automatically answer natural language questions based on the facts contained by a knowledge base, is one of the most important natural language processing (NLP) tasks. Simple questions constitute a large part of questions queried on the web, still being a challenge to QA systems. In this work, we propose to conduct pattern extraction and entity linking first, and put forward pattern revising procedure to mitigate the error propagation problem. In order to learn to rank candidate subject-predicate pairs to enable the relevant facts retrieval given a question, we propose to do joint fact selection enhanced by relation detection. Multi-level encodings and multi-dimension information are leveraged to strengthen the whole procedure. The experimental results demonstrate that our approach sets a new record in this task, outperforming the current state-of-the-art by an absolute large margin.


pdf bib
N-gram Model for Chinese Grammatical Error Diagnosis
Jianbo Zhao | Hao Liu | Zuyi Bao | Xiaopeng Bai | Si Li | Zhiqing Lin
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)

Detection and correction of Chinese grammatical errors have been two of major challenges for Chinese automatic grammatical error diagnosis.This paper presents an N-gram model for automatic detection and correction of Chinese grammatical errors in NLPTEA 2017 task. The experiment results show that the proposed method is good at correction of Chinese grammatical errors.


pdf bib
Effective Selection of Translation Model Training Data
Le Liu | Yu Hong | Hao Liu | Xing Wang | Jianmin Yao
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)


pdf bib
A Three-Step Deterministic Parser for Chinese Dependency Parsing
Kun Yu | Sadao Kurohashi | Hao Liu
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers


pdf bib
Chinese Word Segmentation and Named Entity Recognition by Character Tagging
Kun Yu | Sadao Kurohashi | Hao Liu | Toshiaki Nakazawa
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing