Xiaojun Wan


2021

ParaSCI: A Large Scientific Paraphrase Dataset for Longer Paraphrase Generation
Qingxiu Dong | Xiaojun Wan | Yue Cao
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We propose ParaSCI, the first large-scale paraphrase dataset in the scientific field, including 33,981 paraphrase pairs from ACL (ParaSCI-ACL) and 316,063 pairs from arXiv (ParaSCI-arXiv). Digging into characteristics and common patterns of scientific papers, we construct this dataset through intra-paper and inter-paper methods, such as collecting citations to the same paper or aggregating definitions by scientific terms. To take advantage of partially paraphrased sentences, we propose PDBERT as a general paraphrase discovery method. The major advantages of paraphrases in ParaSCI lie in their prominent length and textual diversity, which are complementary to existing paraphrase datasets. ParaSCI obtains satisfactory results on human evaluation and downstream tasks, especially long paraphrase generation.

Continual Learning for Neural Machine Translation
Yue Cao | Hao-Ran Wei | Boxing Chen | Xiaojun Wan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Neural machine translation (NMT) models are data-driven and require large-scale training corpora. In practical applications, NMT models are usually trained on a general-domain corpus and then fine-tuned by continuing training on an in-domain corpus. However, this bears the risk of catastrophic forgetting, in which performance on the general domain decreases drastically. In this work, we propose a new continual learning framework for NMT models. We consider a scenario where the training comprises multiple stages and propose a dynamic knowledge distillation technique to alleviate the problem of catastrophic forgetting systematically. We also find that a bias exists in the output linear projection when fine-tuning on the in-domain corpus, and propose a bias-correction module to eliminate the bias. We conduct experiments on three representative settings of NMT application. Experimental results show that the proposed method achieves superior performance compared to baseline models in all settings.
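
As a rough illustration of the knowledge distillation idea mentioned in this abstract, the sketch below interpolates a gold-label cross-entropy with a cross-entropy against a teacher distribution. It is a generic, textbook-style distillation loss and not the paper's dynamic technique, whose details the abstract does not give; the function names and the fixed `kd_weight` are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, gold_index, kd_weight):
    """Generic distillation loss for one target token (illustrative only).

    kd_weight is a hypothetical fixed interpolation weight in [0, 1];
    a dynamic scheme would vary it during training.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    # Cross-entropy against the gold token.
    ce_gold = -math.log(p_student[gold_index])
    # Cross-entropy against the teacher's soft distribution.
    ce_teacher = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    return (1.0 - kd_weight) * ce_gold + kd_weight * ce_teacher


if __name__ == "__main__":
    # Teacher = model from the previous stage, student = model being fine-tuned.
    print(distillation_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], gold_index=0, kd_weight=0.5))
```

Setting `kd_weight` to 0 recovers ordinary fine-tuning, while larger values keep the student closer to the earlier-stage teacher.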

Video Paragraph Captioning as a Text Summarization Task
Hui Liu | Xiaojun Wan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Video paragraph captioning aims to generate a set of coherent sentences to describe a video that contains several events. Most previous methods simplify this task by using ground-truth event segments. In this work, we propose a novel framework by taking this task as a text summarization task. We first generate many sentence-level captions focusing on different video clips and then summarize these captions to obtain the final paragraph caption. Our method does not depend on ground-truth event segments. Experiments on two popular datasets, ActivityNet Captions and YouCookII, demonstrate the advantages of our new framework. On the ActivityNet dataset, our method even outperforms some previous methods using ground-truth event segment labels.

TransSum: Translating Aspect and Sentiment Embeddings for Self-Supervised Opinion Summarization
Ke Wang | Xiaojun Wan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Making Better Use of Bilingual Information for Cross-Lingual AMR Parsing
Yitao Cai | Zhe Lin | Xiaojun Wan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Pushing Paraphrase Away from Original Sentence: A Multi-Round Paraphrase Generation Approach
Zhe Lin | Xiaojun Wan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Structure-Aware Pre-Training for Table-to-Text Generation
Xinyu Xing | Xiaojun Wan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

WIND: Weighting Instances Differentially for Model-Agnostic Domain Adaptation
Xiang Chen | Yue Cao | Xiaojun Wan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

On the Helpfulness of Document Context to Sentence Simplification
Renliang Sun | Zhe Lin | Xiaojun Wan
Proceedings of the 28th International Conference on Computational Linguistics

Most research on text simplification is currently limited to the sentence level. In this paper, we are the first to investigate the helpfulness of document context on sentence simplification and apply it to the sequence-to-sequence model. We first construct a sentence simplification dataset in which the contexts for the original sentence are provided by the Wikipedia corpus. The new dataset contains approximately 116K sentence pairs with context. We then propose a new model that makes full use of the context information. Our model uses neural networks to learn the different effects of the preceding sentences and the following sentences on the current sentence and applies them to the improved transformer model. Evaluated on the newly constructed dataset, our model achieves a SARI score of 36.52, which outperforms the best-performing baseline model by 2.46 (7.22%), indicating that context indeed helps improve sentence simplification. In the ablation experiment, we show that using either the preceding sentences or the following sentences as context can significantly improve simplification.
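
The reported figures can be cross-checked with a little arithmetic. The short sketch below assumes the 7.22% is the gain relative to the best baseline's SARI, which is then implied to be 36.52 - 2.46 = 34.06; the variable names are chosen for illustration.

```python
# Figures quoted in the abstract.
model_sari = 36.52
absolute_gain = 2.46

# Implied best-baseline score and the gain relative to it.
baseline_sari = model_sari - absolute_gain
relative_gain_pct = 100.0 * absolute_gain / baseline_sari

print(f"baseline SARI: {baseline_sari:.2f}")      # baseline SARI: 34.06
print(f"relative gain: {relative_gain_pct:.2f}%")  # relative gain: 7.22%
```

Under that reading, the absolute and relative improvements in the abstract are consistent with each other.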

Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation
Zhaohong Wan | Xiaojun Wan | Wenguang Wang
Proceedings of the 28th International Conference on Computational Linguistics

The incorporation of data augmentation methods in the grammatical error correction task has attracted much attention. However, existing data augmentation methods mainly apply noise to tokens, which leads to a lack of diversity in the generated errors. In view of this, we propose a new data augmentation method that can apply noise to the latent representation of a sentence. By editing the latent representations of grammatical sentences, we can generate synthetic samples with various error types. Combined with some pre-defined rules, our method can greatly improve the performance and robustness of existing grammatical error correction models. We evaluate our method on public benchmarks of the GEC task, and it achieves state-of-the-art performance on the CoNLL-2014 and FCE benchmarks.

Adversarial Text Generation via Sequence Contrast Discrimination
Ke Wang | Xiaojun Wan
Findings of the Association for Computational Linguistics: EMNLP 2020

In this paper, we propose a sequence contrast loss driven text generation framework, which learns the difference between real texts and generated texts and uses that difference. Specifically, our discriminator contains a discriminative sequence generator instead of a binary classifier, and measures the ‘relative realism’ of generated texts against real texts by making use of them simultaneously. Moreover, our generator uses discriminative sequences to directly improve itself, which not only replaces the gradient propagation process from the discriminator to the generator, but also avoids the time-consuming sampling process of estimating rewards in some previous methods. We conduct extensive experiments with various metrics, substantiating that our framework brings improvements in terms of training stability and the quality of generated texts.

DivGAN: Towards Diverse Paraphrase Generation via Diversified Generative Adversarial Network
Yue Cao | Xiaojun Wan
Findings of the Association for Computational Linguistics: EMNLP 2020

Paraphrases refer to texts that convey the same meaning with different expression forms. Traditional seq2seq-based models on paraphrase generation mainly focus on the fidelity while ignoring the diversity of outputs. In this paper, we propose a deep generative model to generate diverse paraphrases. We build our model based on the conditional generative adversarial network, and propose to incorporate a simple yet effective diversity loss term into the model in order to improve the diversity of outputs. The proposed diversity loss maximizes the ratio of pairwise distance between the generated texts and their corresponding latent codes, forcing the generator to focus more on the latent codes and produce diverse samples. Experimental results on benchmarks of paraphrase generation show that our proposed model can generate more diverse paraphrases compared with baselines.
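
The diversity term described in this abstract can be pictured with a small sketch: for each pair of samples, divide the distance between the generated outputs by the distance between their latent codes, then average. The L1 distance, the vector representation of outputs, the `eps` smoothing constant, and the function names are all assumptions for illustration; the abstract does not specify these choices.

```python
from itertools import combinations


def l1_distance(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def diversity_ratio(outputs, latents, eps=1e-8):
    """Mean over sample pairs of d(output_i, output_j) / d(z_i, z_j).

    Requires at least two samples; eps guards against identical latent codes.
    """
    ratios = [
        l1_distance(outputs[i], outputs[j]) / (l1_distance(latents[i], latents[j]) + eps)
        for i, j in combinations(range(len(outputs)), 2)
    ]
    return sum(ratios) / len(ratios)


def diversity_loss(outputs, latents):
    """Negated ratio, so that minimizing the loss maximizes the ratio."""
    return -diversity_ratio(outputs, latents)


if __name__ == "__main__":
    latents = [[0.0], [1.0]]
    collapsed = [[2.0], [2.0]]  # generator ignores the latent code
    diverse = [[0.0], [3.0]]    # outputs change with the latent code
    print(diversity_loss(collapsed, latents), diversity_loss(diverse, latents))
```

A generator that ignores its latent code yields a ratio near zero, so a term of this kind penalizes collapsed outputs and rewards outputs that vary with the code.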

Abstractive Multi-Document Summarization via Joint Learning with Single-Document Summarization
Hanqi Jin | Xiaojun Wan
Findings of the Association for Computational Linguistics: EMNLP 2020

Single-document and multi-document summarization are very closely related in both task definition and solution method. In this work, we propose to improve neural abstractive multi-document summarization by jointly learning an abstractive single-document summarizer. We build a unified model for single-document and multi-document summarization by fully sharing the encoder and decoder and utilizing a decoding controller to aggregate the decoder’s outputs for multiple input documents. We evaluate our model on two multi-document summarization datasets: Multi-News and DUC-04. Experimental results show the efficacy of our approach, and it can substantially outperform several strong baselines. We also verify the helpfulness of single-document summarization to the abstractive multi-document summarization task.

AMR-To-Text Generation with Graph Transformer
Tianming Wang | Xiaojun Wan | Hanqi Jin
Transactions of the Association for Computational Linguistics, Volume 8

Abstract meaning representation (AMR)-to-text generation is the challenging task of generating natural language texts from AMR graphs, where nodes represent concepts and edges denote relations. The current state-of-the-art methods use graph-to-sequence models; however, they still cannot significantly outperform the previous sequence-to-sequence models or statistical approaches. In this paper, we propose a novel graph-to-sequence model (Graph Transformer) to address this task. The model directly encodes the AMR graphs and learns the node representations. A pairwise interaction function is used for computing the semantic relations between the concepts. Moreover, attention mechanisms are used for aggregating the information from the incoming and outgoing neighbors, which helps the model to capture the semantic information effectively. Our model outperforms the state-of-the-art neural approach by 1.5 BLEU points on LDC2015E86 and 4.8 BLEU points on LDC2017T10 and achieves new state-of-the-art performance.

Learning to Ask More: Semi-Autoregressive Sequential Question Generation under Dual-Graph Interaction
Zi Chai | Xiaojun Wan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Traditional Question Generation (TQG) aims to generate a question given an input passage and an answer. When there is a sequence of answers, we can perform Sequential Question Generation (SQG) to produce a series of interconnected questions. Because information omission and coreference frequently occur between questions, SQG is rather challenging. Prior works regarded SQG as a dialog generation task and recurrently produced each question. However, they suffered from problems caused by error cascades and could only capture limited context dependencies. To this end, we generate questions in a semi-autoregressive way. Our model divides questions into different groups and generates each group in parallel. During this process, it builds two graphs focusing on information from passages and answers, respectively, and performs dual-graph interaction to get information for generation. In addition, we design an answer-aware attention mechanism and a coarse-to-fine generation scenario. Experiments on our new dataset containing 81.9K questions show that our model substantially outperforms prior works.

Multimodal Transformer for Multimodal Machine Translation
Shaowei Yao | Xiaojun Wan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Multimodal Machine Translation (MMT) aims to introduce information from other modalities, generally static images, to improve the translation quality. Previous works propose various incorporation methods, but most of them do not consider the relative importance of multiple modalities. Treating all modalities equally may encode too much useless information from less important modalities. In this paper, we introduce multimodal self-attention in the Transformer to solve the issues above in MMT. The proposed method learns the representation of images based on the text, which avoids encoding irrelevant information in images. Experiments and visualization analysis demonstrate that our model benefits from visual information and substantially outperforms previous works and competitive baselines in terms of various metrics.

Automatic Generation of Citation Texts in Scholarly Papers: A Pilot Study
Xinyu Xing | Xiaosheng Fan | Xiaojun Wan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we study the challenging problem of automatic generation of citation texts in scholarly papers. Given the context of a citing paper A and a cited paper B, the task aims to generate a short text to describe B in the given context of A. One big challenge for addressing this task is the lack of training data. Usually, explicit citation texts are easy to extract, but it is not easy to extract implicit citation texts from scholarly papers. We thus first train an implicit citation extraction model based on BERT and leverage the model to construct a large training dataset for the citation text generation task. Then we propose and train a multi-source pointer-generator network with a cross-attention mechanism for citation text generation. Empirical evaluation results on a manually labeled test dataset verify the efficacy of our model. This pilot study confirms the feasibility of automatically generating citation texts in scholarly papers, and the technique has great potential to help researchers prepare their scientific papers.

Jointly Learning to Align and Summarize for Neural Cross-Lingual Summarization
Yue Cao | Hui Liu | Xiaojun Wan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Cross-lingual summarization is the task of generating a summary in one language given a text in a different language. Previous works on cross-lingual summarization mainly focus on using pipeline methods or training an end-to-end model using the translated parallel data. However, it is a big challenge for the model to directly learn cross-lingual summarization as it requires learning to understand different languages and learning how to summarize at the same time. In this paper, we propose to ease the cross-lingual summarization training by jointly learning to align and summarize. We design relevant loss functions to train this framework and propose several methods to enhance the isomorphism and cross-lingual transfer between languages. Experimental results show that our model can outperform competitive models in most cases. In addition, we show that our model even has the ability to generate cross-lingual summaries without access to any cross-lingual corpus.

Multi-Granularity Interaction Network for Extractive and Abstractive Multi-Document Summarization
Hanqi Jin | Tianming Wang | Xiaojun Wan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we propose a multi-granularity interaction network for extractive and abstractive multi-document summarization, which jointly learns semantic representations for words, sentences, and documents. The word representations are used to generate an abstractive summary while the sentence representations are used to produce an extractive summary. We employ attention mechanisms to interact between different granularities of semantic representations, which helps to capture multi-granularity key information and improves the performance of both abstractive and extractive summarization. Experimental results show that our proposed model substantially outperforms all strong baseline methods and achieves the best results on the Multi-News dataset.

Semantic Parsing for English as a Second Language
Yuanyuan Zhao | Weiwei Sun | Junjie Cao | Xiaojun Wan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper is concerned with semantic parsing for English as a second language (ESL). Motivated by the theoretical emphasis on the learning challenges that occur at the syntax-semantics interface during second language acquisition, we formulate the task based on the divergence between literal and intended meanings. We combine the complementary strengths of English Resource Grammar, a linguistically-precise hand-crafted deep grammar, and TLE, an existing manually annotated ESL UD-TreeBank with a novel reranking model. Experiments demonstrate that in comparison to human annotations, our method can obtain a very promising SemBanking quality. By means of the newly created corpus, we evaluate state-of-the-art semantic parsing as well as grammatical error correction models. The evaluation profiles the performance of neural NLP techniques for handling ESL data and suggests some research directions.

Heterogeneous Graph Transformer for Graph-to-Sequence Learning
Shaowei Yao | Tianming Wang | Xiaojun Wan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Graph-to-sequence (Graph2Seq) learning aims to transduce graph-structured representations to word sequences for text generation. Recent studies propose various models to encode graph structure. However, most previous works ignore the indirect relations between distant nodes, or treat indirect relations and direct relations in the same way. In this paper, we propose the Heterogeneous Graph Transformer to independently model the different relations in the individual subgraphs of the original graph, including direct relations, indirect relations and multiple possible relations between nodes. Experimental results show that our model strongly outperforms the state of the art on all four standard benchmarks of AMR-to-text generation and syntax-based neural machine translation.

Homophonic Pun Generation with Lexically Constrained Rewriting
Zhiwei Yu | Hongyu Zang | Xiaojun Wan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Punning is a creative way to make conversation enjoyable and literary writing elegant. In this paper, we focus on the task of generating a pun sentence given a pair of homophones. We first find the constraint words supporting the semantic incongruity for a sentence. Then we rewrite the sentence with explicit positive and negative constraints. Our model achieves state-of-the-art results in both automatic and human evaluations. We further perform an error analysis and discuss the challenges for computational pun models.

Routing Enforced Generative Model for Recipe Generation
Zhiwei Yu | Hongyu Zang | Xiaojun Wan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

One of the most challenging parts of recipe generation is dealing with the complex restrictions among the input ingredients. Previous research simplifies the problem by treating the inputs independently and generating recipes containing as much information as possible. In this work, we propose a routing method to dive into the content selection under the internal restrictions. The routing enforced generative model (RGM) can generate appropriate recipes according to the given ingredients and user preferences. Our model yields new state-of-the-art results on the recipe generation task with significant improvements on BLEU, F1 and human evaluation.

IGSQL: Database Schema Interaction Graph Based Neural Model for Context-Dependent Text-to-SQL Generation
Yitao Cai | Xiaojun Wan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The context-dependent text-to-SQL task has drawn much attention in recent years. Previous models for the context-dependent text-to-SQL task only concentrate on utilizing historic user inputs. In this work, in addition to using encoders to capture historic information of user inputs, we propose a database schema interaction graph encoder to utilize historic information of database schema items. In the decoding phase, we introduce a gate mechanism to weigh the importance of different vocabularies and then make the prediction of SQL tokens. We evaluate our model on the benchmark SParC and CoSQL datasets, which are two large complex context-dependent cross-domain text-to-SQL datasets. Our model outperforms the previous state-of-the-art model by a large margin and achieves new state-of-the-art results on the two datasets. The comparison and ablation results demonstrate the efficacy of our model and the usefulness of the database schema interaction graph encoder.

2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Kentaro Inui | Jing Jiang | Vincent Ng | Xiaojun Wan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Learning Bilingual Sentiment-Specific Word Embeddings without Cross-lingual Supervision
Yanlin Feng | Xiaojun Wan
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Word embeddings learned in two languages can be mapped to a common space to produce Bilingual Word Embeddings (BWE). Unsupervised BWE methods learn such a mapping without any parallel data. However, these methods are mainly evaluated on tasks of word translation or word similarity. We show that these methods fail to capture the sentiment information and do not perform well enough on cross-lingual sentiment analysis. In this work, we propose UBiSE (Unsupervised Bilingual Sentiment Embeddings), which learns sentiment-specific word representations for two languages in a common space without any cross-lingual supervision. Our method only requires a sentiment corpus in the source language and pretrained monolingual word embeddings of both languages. We evaluate our method on three language pairs for cross-lingual sentiment analysis. Experimental results show that our method outperforms previous unsupervised BWE methods and even supervised BWE methods. Our method succeeds for a distant language pair English-Basque.

How to Avoid Sentences Spelling Boring? Towards a Neural Approach to Unsupervised Metaphor Generation
Zhiwei Yu | Xiaojun Wan
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Metaphor generation attempts to replicate human creativity with language, which is an attractive but challenging text generation task. Previous efforts mainly focus on template-based or rule-based methods and result in a lack of linguistic subtlety. In order to create novel metaphors, we propose a neural approach to metaphor generation and explore the shared inferential structure of a metaphorical usage and a literal usage of a verb. Our approach does not require any manually annotated metaphors for training. We extract the metaphorically used verbs with their metaphorical senses in an unsupervised way and train a neural language model on a wiki corpus. Then we generate metaphors conveying the assigned metaphorical senses with an improved decoding algorithm. Automatic metrics and human evaluations demonstrate that our approach can generate metaphors with good readability and creativity.

INS: An Interactive Chinese News Synthesis System
Hui Liu | Wentao Qin | Xiaojun Wan
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

Nowadays, we are surrounded by more and more online news articles. Tens or hundreds of news articles need to be read if we wish to explore a hot news event or topic. So it is of vital importance to automatically synthesize a batch of news articles related to the event or topic into a new synthesis article (or overview article) for readers’ convenience. Making news synthesis fully automatic is so challenging that there is no successful solution so far. In this paper, we put forward a novel Interactive News Synthesis system (i.e. INS), which can help generate news overview articles automatically or by interacting with users. More importantly, INS can serve as a tool for editors to help them finish their jobs. In our experiments, INS performs well on both topic representation and synthesis article generation. A user study also demonstrates the usefulness of the INS tool and users’ satisfaction with it. A demo video is available at https://youtu.be/7ItteKW3GEk.

Parsing Chinese Sentences with Grammatical Relations
Weiwei Sun | Yufei Chen | Xiaojun Wan | Meichun Liu
Computational Linguistics, Volume 45, Issue 1 - March 2019

We report our work on building linguistic resources and data-driven parsers in the grammatical relation (GR) analysis for Mandarin Chinese. Chinese, as an analytic language, encodes grammatical information in a highly configurational rather than morphological way. Accordingly, it is possible and reasonable to represent almost all grammatical relations as bilexical dependencies. In this work, we propose to represent grammatical information using general directed dependency graphs. Both only-local and rich long-distance dependencies are explicitly represented. To create high-quality annotations, we take advantage of an existing TreeBank, namely, Chinese TreeBank (CTB), which is grounded on the Government and Binding theory. We define a set of linguistic rules to explore CTB’s implicit phrase structural information and build deep dependency graphs. The reliability of this linguistically motivated GR extraction procedure is highlighted by manual evaluation. Based on the converted corpus, data-driven, including graph- and transition-based, models are explored for Chinese GR parsing. For graph-based parsing, a new perspective, graph merging, is proposed for building flexible dependency graphs: constructing complex graphs via constructing simple subgraphs. Two key problems are discussed in this perspective: (1) how to decompose a complex graph into simple subgraphs, and (2) how to combine subgraphs into a coherent complex graph. For transition-based parsing, we introduce a neural parser based on a list-based transition system. We also discuss several other key problems, including dynamic oracle and beam search for neural transition-based parsing. Evaluation gauges how successful GR parsing for Chinese can be by applying data-driven models. The empirical analysis suggests several directions for future study.

Towards a Unified End-to-End Approach for Fully Unsupervised Cross-Lingual Sentiment Analysis
Yanlin Feng | Xiaojun Wan
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Sentiment analysis in low-resource languages suffers from the lack of training data. Cross-lingual sentiment analysis (CLSA) aims to improve the performance on these languages by leveraging annotated data from other languages. Recent studies have shown that CLSA can be performed in a fully unsupervised manner, without exploiting either target language supervision or cross-lingual supervision. However, these methods rely heavily on unsupervised cross-lingual word embeddings (CLWE), which have been shown to have serious drawbacks on distant language pairs (e.g. English-Japanese). In this paper, we propose an end-to-end CLSA model that leverages unlabeled data in multiple languages and multiple domains and eliminates the need for unsupervised CLWE. Our model applies to two CLSA settings: the traditional cross-lingual in-domain setting and the more challenging cross-lingual cross-domain setting. We empirically evaluate our approach on the multilingual multi-domain Amazon review dataset. Experimental results show that our model outperforms the baselines by a large margin despite its minimal resource requirement.

Multi-Modal Sarcasm Detection in Twitter with Hierarchical Fusion Model
Yitao Cai | Huiyu Cai | Xiaojun Wan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Sarcasm is a subtle form of language in which people express the opposite of what is implied. Previous work on sarcasm detection focused on texts. However, more and more social media platforms like Twitter allow users to create multi-modal messages, including texts, images, and videos. It is insufficient to detect sarcasm from multi-modal messages based only on texts. In this paper, we focus on multi-modal sarcasm detection for tweets consisting of texts and images on Twitter. We treat text features, image features and image attributes as three modalities and propose a multi-modal hierarchical fusion model to address this task. Our model first extracts image features and attribute features, and then leverages attribute features and a bidirectional LSTM network to extract text features. Features of the three modalities are then reconstructed and fused into one feature vector for prediction. We create a multi-modal sarcasm detection dataset based on Twitter. Evaluation results on the dataset demonstrate the efficacy of our proposed model and the usefulness of the three modalities.

Asking the Crowd: Question Analysis, Evaluation and Generation for Open Discussion on Online Forums
Zi Chai | Xinyu Xing | Xiaojun Wan | Bo Huang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Teaching machines to ask questions is an important yet challenging task. Most prior work focused on generating questions with fixed answers. As contents are highly limited by given answers, these questions are often not worth discussing. In this paper, we take the first step toward teaching machines to ask open-answered questions from real-world news for open discussion (openQG). To generate high-quality questions, effective ways for question evaluation are required. We take the perspective that the more answers a question receives, the better it is for open discussion, and analyze how language use affects the number of answers. Compared with other factors, e.g. topic and post time, linguistic factors keep our evaluation from being domain-specific. We carefully perform variable control on 11.5M questions from online forums to get a dataset, OQRanD, and further perform question analysis. Based on these conclusions, several models are built for question evaluation. For the openQG task, we construct OQGenD, the first dataset as far as we know, and propose a model based on conditional generative adversarial networks and our question evaluation model. Experiments show that our model can generate questions with higher quality compared with commonly used text generation methods.

Automated Chess Commentator Powered by Neural Chess Engine
Hongyu Zang | Zhiwei Yu | Xiaojun Wan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we explore a new approach for automated chess commentary generation, which aims to generate chess commentary texts in different categories (e.g., description, comparison, planning, etc.). We introduce a neural chess engine into text generation models to help with encoding boards, predicting moves, and analyzing situations. By jointly training the neural chess engine and the generation models for different categories, the models become more effective. We conduct experiments on 5 categories in a benchmark Chess Commentary dataset and achieve inspiring results in both automatic and human evaluations.

2018

pdf bib
Point Precisely: Towards Ensuring the Precision of Data in Generated Texts Using Delayed Copy Mechanism
Liunian Li | Xiaojun Wan
Proceedings of the 27th International Conference on Computational Linguistics

The task of data-to-text generation aims to generate descriptive texts conditioned on a number of database records, and recent neural models have shown significant progress on this task. The attention based encoder-decoder models with copy mechanism have achieved state-of-the-art results on a few data-to-text datasets. However, such models still face the problem of putting incorrect data records in the generated texts, especially on some more challenging datasets like RotoWire. In this paper, we propose a two-stage approach with a delayed copy mechanism to improve the precision of data records in the generated texts. Our approach first adopts an encoder-decoder model to generate a template text with data slots to be filled and then leverages a proposed delayed copy mechanism to fill in the slots with proper data records. Our delayed copy mechanism can take into account all the information of the input data records and the full generated template text by using double attention, position-aware attention and a pairwise ranking loss. The two models in the two stages are trained separately. Evaluation results on the RotoWire dataset verify the efficacy of our proposed approach to generate better templates and copy data records more precisely.

pdf bib
Book Review: Automatic Text Simplification by Horacio Saggion
Xiaojun Wan
Computational Linguistics, Volume 44, Issue 4 - December 2018

pdf bib
Neural Maximum Subgraph Parsing for Cross-Domain Semantic Dependency Analysis
Yufei Chen | Sheng Huang | Fang Wang | Junjie Cao | Weiwei Sun | Xiaojun Wan
Proceedings of the 22nd Conference on Computational Natural Language Learning

We present experiments for cross-domain semantic dependency analysis with a neural Maximum Subgraph parser. Our parser targets 1-endpoint-crossing, pagenumber-2 graphs which are a good fit to semantic dependency graphs, and utilizes an efficient dynamic programming algorithm for decoding. For disambiguation, the parser associates words with BiLSTM vectors and utilizes these vectors to assign scores to candidate dependencies. We conduct experiments on the data sets from SemEval 2015 as well as Chinese CCGBank. Our parser achieves very competitive results for both English and Chinese. To improve the parsing performance on cross-domain texts, we propose a data-oriented method to explore the linguistic generality encoded in English Resource Grammar, which is a precision-oriented, hand-crafted HPSG grammar, in an implicit way. Experiments demonstrate the effectiveness of our data-oriented method across a wide range of conditions.

pdf bib
Accurate SHRG-Based Semantic Parsing
Yufei Chen | Weiwei Sun | Xiaojun Wan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We demonstrate that an SHRG-based parser can produce semantic graphs much more accurately than previously shown, by relating synchronous production rules to the syntacto-semantic composition process. Our parser achieves an accuracy of 90.35 for EDS (89.51 for DMRS) in terms of elementary dependency match, which is a 4.87 (5.45) point improvement over the best existing data-driven model, indicating, in our view, the importance of linguistically-informed derivation for data-driven semantic parsing. This accuracy is equivalent to that of English Resource Grammar guided models, suggesting that (recurrent) neural network models are able to effectively learn deep linguistic knowledge from annotations.

pdf bib
A Neural Approach to Pun Generation
Zhiwei Yu | Jiwei Tan | Xiaojun Wan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automatic pun generation is an interesting and challenging text generation task. Previous efforts rely on templates or laboriously manually annotated pun datasets, which heavily constrains the quality and diversity of generated puns. Since sequence-to-sequence models provide an effective technique for text generation, it is promising to investigate these models on the pun generation task. In this paper, we propose neural network models for homographic pun generation, and they can generate puns without requiring any pun data for training. We first train a conditional neural language model from a general text corpus, and then generate puns from the language model with an elaborately designed decoding algorithm. Automatic and human evaluations show that our models are able to generate homographic puns of good readability and quality.

pdf bib
Language Generation via DAG Transduction
Yajie Ye | Weiwei Sun | Xiaojun Wan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A DAG automaton is a formal device for manipulating graphs. By augmenting a DAG automaton with transduction rules, a DAG transducer has potential applications in fundamental NLP tasks. In this paper, we propose a novel DAG transducer to perform graph-to-program transformation. The target structure of our transducer is a program licensed by a declarative programming language rather than linguistic structures. By executing such a program, we can easily get a surface string. Our transducer is designed especially for natural language generation (NLG) from type-logical semantic graphs. Taking Elementary Dependency Structures, a format of English Resource Semantics, as input, our NLG system achieves a BLEU-4 score of 68.07. This remarkable result demonstrates the feasibility of applying a DAG transducer to resolve NLG, as well as the effectiveness of our design.

pdf bib
Pre- and In-Parsing Models for Neural Empty Category Detection
Yufei Chen | Yuanyuan Zhao | Weiwei Sun | Xiaojun Wan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Motivated by the positive impact of empty category on syntactic parsing, we study neural models for pre- and in-parsing detection of empty category, which has not previously been investigated. We find several non-obvious facts: (a) BiLSTM can capture non-local contextual information which is essential for detecting empty categories, (b) even with a BiLSTM, syntactic information is still able to enhance the detection, and (c) automatic detection of empty categories improves parsing quality for overt words. Our neural ECD models outperform the prior state-of-the-art by significant margins.

pdf bib
Sense-Aware Neural Models for Pun Location in Texts
Yitao Cai | Yin Li | Xiaojun Wan
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

A homographic pun is a form of wordplay in which one signifier (usually a word) suggests two or more meanings by exploiting polysemy for an intended humorous or rhetorical effect. In this paper, we focus on the task of pun location, which aims to identify the pun word in a given short text. We propose a sense-aware neural model to address this challenging task. Our model first obtains several WSD results for the text, and then leverages a bidirectional LSTM network to model each sequence of word senses. The outputs at each time step for different LSTM networks are then concatenated for prediction. Evaluation results on the benchmark SemEval 2017 dataset demonstrate the efficacy of our proposed model.

pdf bib
Adapting Neural Single-Document Summarization Model for Abstractive Multi-Document Summarization: A Pilot Study
Jianmin Zhang | Jiwei Tan | Xiaojun Wan
Proceedings of the 11th International Conference on Natural Language Generation

Until now, neural abstractive summarization methods have achieved great success for single document summarization (SDS). However, due to the lack of large scale multi-document summaries, such methods can hardly be applied to multi-document summarization (MDS). In this paper, we investigate neural abstractive methods for MDS by adapting a state-of-the-art neural abstractive summarization model for SDS. We propose an approach to extend the neural abstractive model trained on large scale SDS data to the MDS task. Our approach only makes use of a small number of multi-document summaries for fine tuning. Experimental results on two benchmark DUC datasets demonstrate that our approach can outperform a variety of baseline neural models.

pdf bib
Semantic Role Labeling for Learner Chinese: the Importance of Syntactic Parsing and L2-L1 Parallel Data
Zi Lin | Yuguang Duan | Yuanyuan Zhao | Weiwei Sun | Xiaojun Wan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper studies semantic parsing for interlanguage (L2), taking semantic role labeling (SRL) as a case task and learner Chinese as a case language. We first manually annotate the semantic roles for a set of learner texts to derive a gold standard for automatic SRL. Based on the new data, we then evaluate three off-the-shelf SRL systems, i.e., the PCFGLA-parser-based, neural-parser-based and neural-syntax-agnostic systems, to gauge how successful SRL for learner Chinese can be. We find two non-obvious facts: 1) the L1-sentence-trained systems perform rather badly on the L2 data; 2) the performance drop from the L1 data to the L2 data of the two parser-based systems is much smaller, indicating the importance of syntactic parsing in SRL for interlanguages. Finally, the paper introduces a new agreement-based model to explore the semantic coherency information in the large-scale L2-L1 parallel data. We then show such information is very effective in enhancing SRL for learner texts. Our model achieves an F-score of 72.06, which is a 2.02 point improvement over the best baseline.

2017

pdf bib
Content Selection for Real-time Sports News Construction from Commentary Texts
Jin-ge Yao | Jianmin Zhang | Xiaojun Wan | Jianguo Xiao
Proceedings of the 10th International Conference on Natural Language Generation

We study the task of constructing sports news reports automatically from live commentary and focus on content selection. Rather than receiving every piece of text of a sports match before news construction, as in previous related work, we verify the feasibility of a new, more challenging but more useful setting: generating news reports on the fly by treating live text input as a stream. Specifically, we design various scoring functions to address different requirements of the task. The near submodularity of scoring functions makes it possible to adapt efficient greedy algorithms even in stream data settings. Experiments suggest that our proposed framework can already produce results comparable to previous work that relies on a supervised learning-to-rank model with heavy feature engineering.

pdf bib
Towards Automatic Generation of Product Reviews from Aspect-Sentiment Scores
Hongyu Zang | Xiaojun Wan
Proceedings of the 10th International Conference on Natural Language Generation

Data-to-text generation is essential in machine writing applications. Recent deep learning models, like Recurrent Neural Networks (RNNs), have shown a bright future for relevant text generation tasks. However, little work has been done on automatic generation of long reviews from user opinions. In this paper, we introduce a deep neural network model to generate long Chinese reviews from aspect-sentiment scores representing users’ opinions. We conduct our study within the framework of encoder-decoder networks, and we propose a hierarchical structure with aligned attention in the Long Short-Term Memory (LSTM) decoder. Experiments show that our model outperforms retrieval based baseline methods, and also beats the sequential generation models in qualitative evaluations.

pdf bib
Parsing for Grammatical Relations via Graph Merging
Weiwei Sun | Yantao Du | Xiaojun Wan
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper is concerned with building deep grammatical relation (GR) analysis using a data-driven approach. To deal with this problem, we propose graph merging, a new perspective, for building flexible dependency graphs: constructing complex graphs via constructing simple subgraphs. We discuss two key problems in this perspective: (1) how to decompose a complex graph into simple subgraphs, and (2) how to combine subgraphs into a coherent complex graph. Experiments demonstrate the effectiveness of graph merging. Our parser reaches state-of-the-art performance and is significantly better than two transition-based parsers.

pdf bib
The Covert Helps Parse the Overt
Xun Zhang | Weiwei Sun | Xiaojun Wan
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper is concerned with whether deep syntactic information can help surface parsing, with a particular focus on empty categories. We design new algorithms to produce dependency trees in which empty elements are allowed, and evaluate the impact of information about empty category on parsing overt elements. Such information is helpful to reduce the approximation error in a structured parsing model, but increases the search space for inference and accordingly the estimation error. To deal with structure-based overfitting, we propose to integrate disambiguation models with and without empty elements, and perform structure regularization via joint decoding. Experiments on English and Chinese TreeBanks with different parsing models indicate that incorporating empty elements consistently improves surface parsing.

pdf bib
Semantic Dependency Parsing via Book Embedding
Weiwei Sun | Junjie Cao | Xiaojun Wan
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We model a dependency graph as a book, a particular kind of topological space, for semantic dependency parsing. The spine of the book is made up of a sequence of words, and each page contains a subset of noncrossing arcs. To build a semantic graph for a given sentence, we design new Maximum Subgraph algorithms to generate noncrossing graphs on each page, and a Lagrangian Relaxation-based algorithm to combine pages into a book. Experiments demonstrate the effectiveness of the book embedding framework across a wide range of conditions. Our parser obtains comparable results with a state-of-the-art transition-based parser.

pdf bib
Abstractive Document Summarization with a Graph-Based Attentional Neural Model
Jiwei Tan | Xiaojun Wan | Jianguo Xiao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Abstractive summarization is the ultimate goal of document summarization research, but it has previously been less investigated due to the immaturity of text generation techniques. Recently, impressive progress has been made in abstractive sentence summarization using neural models. Unfortunately, attempts at abstractive document summarization are still in a primitive stage, and the evaluation results are worse than those of extractive methods on benchmark datasets. In this paper, we review the difficulties of neural abstractive document summarization, and propose a novel graph-based attention mechanism in the sequence-to-sequence framework. The intuition is to address the saliency factor of summarization, which has been overlooked by prior works. Experimental results demonstrate our model is able to achieve considerable improvement over previous neural abstractive models. The data-driven neural abstractive method is also competitive with state-of-the-art extractive methods.

pdf bib
Parsing to 1-Endpoint-Crossing, Pagenumber-2 Graphs
Junjie Cao | Sheng Huang | Weiwei Sun | Xiaojun Wan
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study the Maximum Subgraph problem in deep dependency parsing. We consider two restrictions to deep dependency graphs: (a) 1-endpoint-crossing and (b) pagenumber-2. Our main contribution is an exact algorithm that obtains maximum subgraphs satisfying both restrictions simultaneously in time O(n^5). Moreover, ignoring one linguistically-rare structure decreases the complexity to O(n^4). We also extend our quartic-time algorithm into a practical parser with a discriminative disambiguation model and evaluate its performance on four linguistic data sets used in semantic dependency parsing.

pdf bib
Quasi-Second-Order Parsing for 1-Endpoint-Crossing, Pagenumber-2 Graphs
Junjie Cao | Sheng Huang | Weiwei Sun | Xiaojun Wan
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a new Maximum Subgraph algorithm for first-order parsing to 1-endpoint-crossing, pagenumber-2 graphs. Our algorithm has two characteristics: (1) it separates the construction for noncrossing edges and crossing edges; (2) in a single construction step, whether to create a new arc is deterministic. These two characteristics make our algorithm relatively easy to extend to incorporate crossing-sensitive second-order features. We then introduce a new algorithm for quasi-second-order parsing. Experiments demonstrate that second-order features are helpful for Maximum Subgraph parsing.

pdf bib
Towards a Universal Sentiment Classifier in Multiple languages
Kui Xu | Xiaojun Wan
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Existing sentiment classifiers usually work for only one specific language, and different classification models are used in different languages. In this paper we aim to build a universal sentiment classifier with a single classification model in multiple different languages. In order to achieve this goal, we propose to learn multilingual sentiment-aware word embeddings simultaneously based only on the labeled reviews in English and unlabeled parallel data available in a few language pairs. It is not required that the parallel data exist between English and any other language, because the sentiment information can be transferred into any language via pivot languages. We present the evaluation results of our universal sentiment classifier in five languages, and the results are very promising even when the parallel data between English and the target languages are not used. Furthermore, the universal single classifier is compared with a few cross-language sentiment classifiers relying on direct parallel data between the source and target languages, and the results show that the performance of our universal sentiment classifier is very promising compared to that of different cross-language classifiers in multiple target languages.

pdf bib
Towards Automatic Construction of News Overview Articles by News Synthesis
Jianmin Zhang | Xiaojun Wan
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper we investigate a new task of automatically constructing an overview article from a given set of news articles about a news event. We propose a news synthesis approach to address this task based on passage segmentation, ranking, selection and merging. Our proposed approach is compared with several typical multi-document summarization methods on the Wikinews dataset, and achieves the best performance on both automatic evaluation and manual evaluation.

pdf bib
Leveraging Diverse Lexical Chains to Construct Essays for Chinese College Entrance Examination
Liunian Li | Xiaojun Wan | Jin-ge Yao | Siming Yan
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

In this work we study the challenging task of automatically constructing essays for the Chinese college entrance examination where the topic is specified in advance. We explore a sentence extraction framework based on diversified lexical chains to capture coherence and richness. Experimental analysis shows the effectiveness of our approach and reveals the importance of information richness in essay writing.

2016

pdf bib
Attention-based LSTM Network for Cross-Lingual Sentiment Classification
Xinjie Zhou | Xiaojun Wan | Jianguo Xiao
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Towards Constructing Sports News from Live Text Commentary
Jianmin Zhang | Jin-ge Yao | Xiaojun Wan
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Cross-Lingual Sentiment Classification with Bilingual Document Representation Learning
Xinjie Zhou | Xiaojun Wan | Jianguo Xiao
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Automatic Labeling of Topic Models Using Text Summaries
Xiaojun Wan | Tianming Wang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
User Embedding for Scholarly Microblog Recommendation
Yang Yu | Xiaojun Wan | Xinjie Zhou
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Transition-Based Parsing for Deep Dependency Structures
Xun Zhang | Yantao Du | Weiwei Sun | Xiaojun Wan
Computational Linguistics, Volume 42, Issue 3 - September 2016

pdf bib
Towards Accurate and Efficient Chinese Part-of-Speech Tagging
Weiwei Sun | Xiaojun Wan
Computational Linguistics, Volume 42, Issue 3 - September 2016

pdf bib
PKUSUMSUM : A Java Platform for Multilingual Document Summarization
Jianmin Zhang | Tianming Wang | Xiaojun Wan
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

PKUSUMSUM is a Java platform for multilingual document summarization, and it supports multiple languages, integrates 10 automatic summarization methods, and tackles three typical summarization tasks. The summarization platform has been released and users can easily use and update it. In this paper, we make a brief description of the characteristics, the summarization methods, and the evaluation results of the platform, and also compare PKUSUMSUM with other summarization toolkits.

2015

pdf bib
Phrase-based Compressive Cross-Language Summarization
Jin-ge Yao | Xiaojun Wan | Jianguo Xiao
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Peking: Building Semantic Dependency Graphs with a Hybrid Parser
Yantao Du | Fan Zhang | Xun Zhang | Weiwei Sun | Xiaojun Wan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
A Data-Driven, Factorization Parser for CCG Dependency Structures
Yantao Du | Weiwei Sun | Xiaojun Wan
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
BrailleSUM: A News Summarization System for the Blind and Visually Impaired People
Xiaojun Wan | Yue Hu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

pdf bib
Peking: Profiling Syntactic Tree Parsing Techniques for Semantic Graph Parsing
Yantao Du | Fan Zhang | Weiwei Sun | Xiaojun Wan
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Automatic Generation of Related Work Sections in Scientific Papers: An Optimization Approach
Yue Hu | Xiaojun Wan
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Joint Decoding of Tree Transduction Models for Sentence Compression
Jin-ge Yao | Xiaojun Wan | Jianguo Xiao
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Grammatical Relations in Chinese: GB-Ground Extraction and Data-Driven Parsing
Weiwei Sun | Yantao Du | Xin Kou | Shuoyang Ding | Xiaojun Wan
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf bib
Collective Opinion Target Extraction in Chinese Microblogs
Xinjie Zhou | Xiaojun Wan | Jianguo Xiao
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Capturing Long-distance Dependencies in Sequence Models: A Case Study of Chinese Part-of-speech Tagging
Weiwei Sun | Xiaochang Peng | Xiaojun Wan
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Data-driven, PCFG-based and Pseudo-PCFG-based Models for Chinese Dependency Parsing
Weiwei Sun | Xiaojun Wan
Transactions of the Association for Computational Linguistics, Volume 1

We present a comparative study of transition-, graph- and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs in improving Chinese dependency parsing accuracy, especially by combining heterogeneous models. Inspired by the impact of a constituency grammar on dependency parsing, we propose several strategies to acquire pseudo CFGs only from dependency annotations. Compared to linguistic grammars learned from rich phrase-structure treebanks, well-designed pseudo grammars achieve similar parsing accuracy and have equivalent contributions to parser ensemble. Moreover, pseudo grammars increase the diversity of base models and therefore, together with all other models, further improve system combination. Based on automatic POS tagging, our final model achieves a UAS of 87.23%, resulting in a significant improvement over the state of the art.

pdf bib
Learning to Order Natural Language Texts
Jiwei Tan | Xiaojun Wan | Jianguo Xiao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Co-Regression for Cross-Language Review Rating Prediction
Xiaojun Wan
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
Proceedings of the First Workshop on Multilingual Modeling
Jagadeesh Jagarlamudi | Sujith Ravi | Xiaojun Wan | Hal Daume III
Proceedings of the First Workshop on Multilingual Modeling

pdf bib
Update Summarization Based on Co-Ranking with Constraints
Xiaojun Wan
Proceedings of COLING 2012: Posters

pdf bib
Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
Weiwei Sun | Xiaojun Wan
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf bib
Bilingual Co-Training for Sentiment Classification of Chinese Product Reviews
Xiaojun Wan
Computational Linguistics, Volume 37, Issue 3 - September 2011

pdf bib
Timeline Generation through Evolutionary Trans-Temporal Summarization
Rui Yan | Liang Kong | Congrui Huang | Xiaojun Wan | Xiaoming Li | Yan Zhang
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Named Entity Recognition in Chinese News Comments on the Web
Xiaojun Wan | Liang Zong | Xiaojiang Huang | Tengfei Ma | Houping Jia | Yuqian Wu | Jianguo Xiao
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Using Bilingual Information for Cross-Language Document Summarization
Xiaojun Wan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Comparative News Summarization Using Linear Programming
Xiaojiang Huang | Xiaojun Wan | Jianguo Xiao
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
CRF-based Experiments for Cross-Domain Chinese Word Segmentation at CIPS-SIGHAN-2010
Xiao Qin | Liang Zong | Yuqian Wu | Xiaojun Wan | Jianwu Yang
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Cross-Language Document Summarization Based on Machine Translation Quality Prediction
Xiaojun Wan | Huiying Li | Jianguo Xiao
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Towards a Unified Approach to Simultaneous Single-Document and Multi-Document Summarizations
Xiaojun Wan
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Opinion Target Extraction in Chinese News Comments
Tengfei Ma | Xiaojun Wan
Coling 2010: Posters

2009

pdf bib
Co-Training for Cross-Lingual Sentiment Classification
Xiaojun Wan
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

pdf bib
CollabRank: Towards a Collaborative Approach to Single-Document Keyphrase Extraction
Xiaojun Wan | Jianguo Xiao
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
Using Bilingual Knowledge and Ensemble Techniques for Unsupervised Chinese Sentiment Analysis
Xiaojun Wan
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
An Exploration of Document Impact on Graph-Based Multi-Document Summarization
Xiaojun Wan
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf bib
Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction
Xiaojun Wan | Jianwu Yang | Jianguo Xiao
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf bib
Improved Affinity Graph Based Multi-Document Summarization
Xiaojun Wan | Jianwu Yang
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers