Bowei Zou - ACL Anthology

Bowei Zou

Also published as: 博伟邹

2025

Enhancing Event-centric News Cluster Summarization via Data Sharpening and Localization Insights
Longyin Zhang | Bowei Zou | AiTi Aw
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper tackles the challenges of clustering news articles by main events (MEs) and summarizing these clusters, focusing on diverse languages and localized contexts. Our approach consists of four key contributions. First, we investigate the role of dynamic clustering and the integration of various ME references, including event attributions extracted by language models (LMs), in enhancing event-centric clustering. Second, we propose a data-sharpening framework that optimizes the balance between information volume and entropy in input texts, thereby optimizing generated summaries on multiple indicators. Third, we fine-tune LMs with local news articles for cross-lingual temporal question-answering and text summarization, achieving notable improvements in capturing localized contexts. Lastly, we present the first cross-lingual dataset and comprehensive evaluation metrics tailored for the event-centric news cluster summarization pipeline. Our findings enhance the understanding of news summarization across N-gram, event-level coverage, and faithfulness, providing new insights into leveraging LMs for large-scale cross-lingual and localized news analysis.

Improving Explainable Fact-Checking with Claim-Evidence Correlations
Xin Tan | Bowei Zou | Ai Ti Aw
Proceedings of the 31st International Conference on Computational Linguistics

Automatic fact-checking systems that employ large language models (LLMs) have achieved human-level performance in combating widespread misinformation. However, current LLM-based fact-checking systems fail to reveal the reasoning principles behind their decision-making for the claim verdict. In this work, we propose Correlation-Enhanced Explainable Fact-Checking (CorXFact), an LLM-based fact-checking system that simulates the reasoning principle of human fact-checkers for evidence-based claim verification: assessing and weighing the correlations between the claim and each piece of evidence. Following this principle, CorXFact enables efficient claim verification and transparent explanation generation. Furthermore, we contribute the CorFEVER test set to comprehensively evaluate the CorXFact system in claim-evidence correlation identification and claim verification in both closed-domain and real-world fact-checking scenarios. Experimental results show that our proposed CorXFact significantly outperforms four strong fact-checking baselines in claim authenticity prediction and verdict explanation.

CCL-XCoT: An Efficient Cross-Lingual Knowledge Transfer Method for Mitigating Hallucination Generation
Zheng Weihua | Roy Ka-Wei Lee | Zhengyuan Liu | Wu Kui | AiTi Aw | Bowei Zou
Findings of the Association for Computational Linguistics: EMNLP 2025

Multilingual Large Language Models (MLLMs) demonstrate strong generalization across languages, yet they remain prone to hallucinations, especially in low-resource languages, due to training data imbalances. These hallucinations, which include inaccurate or fabricated outputs, are particularly problematic in domain-specific generation tasks (Chataigner et al., 2024). To address this challenge, we propose CCL-XCoT (Curriculum-based Contrastive Learning-based Cross-lingual Chain-of-Thought), a two-stage fine-tuning framework for mitigating hallucination in MLLMs. Our approach first enhances cross-lingual semantic alignment through curriculum-based contrastive learning combined with next-token prediction during continued pre-training. Building on this foundation, we then introduce a cross-lingual Chain-of-Thought (XCoT) prompting strategy during instruction fine-tuning, which guides the model to reason in a high-resource language before generating answers in the target low-resource language. Experimental results show that CCL-XCoT reduces hallucination rates by up to 62% and substantially improves factual knowledge transfer across language pairs, without relying on external retrieval or multi-model ensembles.

A Benchmark for Translations Across Styles and Language Variants
Xin Tan | Bowei Zou | AiTi Aw
Findings of the Association for Computational Linguistics: EMNLP 2025

As machine translation (MT) rapidly advances in bridging global communication gaps, there is growing interest in variety-targeted translation for fine-grained language variants and specific translation styles. This translation variant aims to generate target outputs that are not only contextually accurate but also culturally sensitive. However, the lack of comprehensive evaluation benchmarks has hindered progress in this field. To bridge this gap, this work focuses on the translation across styles and language variants, aiming to establish a robust foundation for the automatic evaluation of fine-grained cultural and stylistic nuances, thereby fostering innovation in culturally sensitive translations. Specifically, we evaluate translations across four key dimensions: semantic preservation, cultural and regional specificity, expression style, and fluency at both the word and sentence levels. Through detailed human evaluations, we validate the high reliability of the proposed evaluation framework. On this basis, we thoroughly assess translations of state-of-the-art large language models (LLMs) for this task, highlighting their strengths and identifying areas for future improvement.

Enhancing Attributed Question Answering using Tailored Progressive Curriculum Learning
Yuhan Chen | Bowei Zou | Yifan Fan | Yuchong Chen | Shujun Cao | Yu Hong
Findings of the Association for Computational Linguistics: EMNLP 2025

We study Attributed Question Answering (abbr., AQA), a newly-released long-form answer generation task. The tailored and efficient training programmes haven’t yet been leveraged to strengthen AQA models. This hinders the simultaneous enhancement of their essential capabilities, including evidence identification, cross-source relation recognition and anti-distraction reasoning. To address the issue, we propose a tailored progressive curriculum learning approach, and use it to optimize both encoder-decoder and decoder-only AQA models. Experiments on the benchmark QuoteSum show that our approach yields substantial improvements and enables the AQA performance to reach 73.9% Sem-F1 score.

2024

Comprehensive Abstractive Comment Summarization with Dynamic Clustering and Chain of Thought
Longyin Zhang | Bowei Zou | Jacintha Yi | AiTi Aw
Findings of the Association for Computational Linguistics: ACL 2024

Real-world news comments pose a significant challenge due to their noisy and ambiguous nature, which complicates their modeling for clustering and summarization tasks. Most previous research has predominantly focused on extractive summarization methods within specific constraints. This paper concentrates on Clustering and Abstractive Summarization of online news Comments (CASC). First, we introduce an enhanced fast clustering algorithm that maintains a dynamic similarity threshold to ensure the high density of each comment cluster being built. Moreover, we pioneer the exploration of tuning Large Language Models (LLMs) through a chain-of-thought strategy to generate summaries for each comment cluster. On the other hand, a notable challenge in CASC research is the scarcity of evaluation data. To address this problem, we design an annotation scheme and contribute a manual test suite tailored for CASC. Experimental results on the test suite demonstrate the effectiveness of our improvements to the baseline methods. In addition, the quantitative and qualitative analyses illustrate the adaptability of our approach to real-world news comment scenarios.

CLFFRD: Curriculum Learning and Fine-grained Fusion for Multimodal Rumor Detection
Fan Xu | Lei Zeng | Bowei Zou | Ai Ti Aw | Huan Rong
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In an era where rumors can propagate rapidly across social media platforms such as Twitter and Weibo, automatic rumor detection has garnered considerable attention from both academia and industry. Existing multimodal rumor detection models often overlook the intricacies of sample difficulty, e.g., text-level difficulty, image-level difficulty, and multimodal-level difficulty, as well as their order when training. Inspired by the concept of curriculum learning, we propose the Curriculum Learning and Fine-grained Fusion-driven multimodal Rumor Detection (CLFFRD) framework, which employs curriculum learning to automatically select and train samples according to their difficulty at different training stages. Furthermore, we introduce a fine-grained fusion strategy that unifies entities from text and objects from images, enhancing their semantic cohesion. We also propose a novel data augmentation method that utilizes linear interpolation between textual and visual modalities to generate diverse data. Additionally, our approach incorporates deep fusion for both intra-modality (e.g., text entities and image objects) and inter-modality (e.g., CLIP and social graph) features. Extensive experimental results demonstrate that CLFFRD outperforms state-of-the-art models on both English and Chinese benchmark datasets for rumor detection in social media.

Empowering Tree-structured Entailment Reasoning: Rhetorical Perception and LLM-driven Interpretability
Longyin Zhang | Bowei Zou | Ai Ti Aw
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The study delves into the construction of entailment trees for science question answering (SQA), employing a novel framework termed Tree-structured Entailment Reasoning (TER). Current research on entailment tree construction presents significant challenges, primarily due to the ambiguities and similarities among candidate science facts, which considerably complicate the fact retrieval process. Moreover, the existing models exhibit limitations in effectively modeling the sequence of reasoning states, understanding the intricate relations between neighboring entailment tree nodes, and generating intermediate conclusions. To this end, we explore enhancing the TER performance from three aspects: First, improving retrieval capabilities by modeling and referring to the chained reasoning states; Second, enhancing TER by infusing knowledge that bridges the gap between reasoning types and rhetorical relations. Third, exploring a task-specific large language model tuning scheme to mitigate deficiencies in intermediate conclusion generation. Experiments on the English EntailmentBank demonstrate the effectiveness of the proposed methods in augmenting the quality of tree-structured entailment reasoning to a certain extent.

An NLP-Focused Pilot Training Agent for Safe and Efficient Aviation Communication
Xiaochen Liu | Bowei Zou | AiTi Aw
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)

Aviation communication significantly influences the success of flight operations, ensuring safety of lives and efficient air transportation. In day-to-day flight operations, air traffic controllers (ATCos) would timely communicate instructions to pilots using specific phraseology for aircraft manipulation . However, pilots, originating from diverse backgrounds and understanding of English language, have struggled with conforming to strict phraseology for readback and communication in the live operation, this problem had not been effectively addressed over the past decades. Traditionally, aviation communication training involved expensive setups and resources, often relying on human-in-the-loop (HIL) air traffic simulations that demand allocating a specific environment, domain experts for participation, and substantial amount of annotated data for simulation. Therefore, we would like to propose an NLP-oriented training agent and address these challenges. Our approach involves leveraging only natural language capabilities and fine-tuning on communication data to generate instructions based on input scenarios (keywords). Given the absence of prior references for this business problem, we investigated the feasibility of our proposed solution by 1) generating all instructions at once and 2) generating one instruction while incorporating conversational history in each input. Our findings affirm the feasibility of this approach, highlighting the effectiveness of fine-tuning pre-trained models and large language models in advancing aviation communication training.

2023

Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation
Xuan Long Do | Bowei Zou | Shafiq Joty | Tran Tai | Liangming Pan | Nancy Chen | Ai Ti Aw
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Conversational Question Generation (CQG) is a critical task for machines to assist humans in fulfilling their information needs through conversations. The task is generally cast into two different settings: answer-aware and answer-unaware. While the former facilitates the models by exposing the expected answer, the latter is more realistic and receiving growing attentions recently. What-to-ask and how-to-ask are the two main challenges in the answer-unaware setting. To address the first challenge, existing methods mainly select sequential sentences in context as the rationales. We argue that the conversation generated using such naive heuristics may not be natural enough as in reality, the interlocutors often talk about the relevant contents that are not necessarily sequential in context. Additionally, previous methods decide the type of question to be generated (boolean/span-based) implicitly. Modeling the question type explicitly is crucial as the answer, which hints the models to generate a boolean or span-based question, is unavailable. To this end, we present SG-CQG, a two-stage CQG framework. For the what-to-ask stage, a sentence is selected as the rationale from a semantic graph that we construct, and extract the answer span from it. For the how-to-ask stage, a classifier determines the target answer type of the question via two explicit control signals before generating and filtering. In addition, we propose Conv-Distinct, a novel evaluation metric for CQG, to evaluate the diversity of the generated conversation from a context. Compared with the existing answer-unaware CQG models, the proposed SG-CQG achieves state-of-the-art performance.

Ensemble Method via Ranking Model for Conversational Modeling with Subjective Knowledge
Xin Huang | Kye Min Tan | Richeng Duan | Bowei Zou
Proceedings of the Eleventh Dialog System Technology Challenge

This paper describes our submission to the fifth track of the 11th Dialog System Technology Challenge (DSTC-11), which focuses on “Task-oriented Conversational Modeling with Subjective Knowledge”. We focus on response generation and leverage a ranking strategy to ensemble individual models of BART, Long-T5, and a fine-tuned large language model based on LLaMA. The strategy is supplemented by other techniques like low rank adaptation to maintain efficient utilization of these large models while still achieving optimal performance. The experiments show that the ensemble method outperforms individual models and the baseline method. Our model was ranked 1st place in ROUGE_1, 2nd place in ROUGE_L score and 4th place in human evaluation among a total of 14 participating teams.

Interview Evaluation: A Novel Approach for Automatic Evaluation of Conversational Question Answering Models
Xibo Li | Bowei Zou | Yifan Fan | Yanling Li | Ai Ti Aw | Yu Hong
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Conversational Question Answering (CQA) aims to provide natural language answers to users in information-seeking dialogues. Existing CQA benchmarks often evaluate models using pre-collected human-human conversations. However, replacing the model-predicted dialogue history with ground truth compromises the naturalness and sustainability of CQA evaluation. While previous studies proposed using predicted history and rewriting techniques to address unresolved coreferences and incoherencies, this approach renders the question self-contained from the conversation. In this paper, we propose a novel automatic evaluation approach, interview evaluation. Specifically, ChatGPT acts as the interviewer (Q agent) with a set of carefully designed prompts, and the CQA model under test serves as the interviewee (A agent). During the interview evaluation, questions are dynamically generated by the Q agent to guide the A agent in predicting the correct answer through an interactive process. We evaluated four different models on QuAC and two models on CoQA in our experiments. The experiment results demonstrate that our interview evaluation has advantages over previous CQA evaluation approaches, particularly in terms of naturalness and coherence. The source code is made publicly available.

DSPM-NLG: A Dual Supervised Pre-trained Model for Few-shot Natural Language Generation in Task-oriented Dialogue System
Yufan Wang | Bowei Zou | Rui Fan | Ai Ti Aw | Tingting He
Findings of the Association for Computational Linguistics: ACL 2023

In few-shot settings, fully conveying the semantic information of the dialogue act is a crucial challenge for Natural Language Generation (NLG) in the task-oriented dialogue system. An interesting fact is that NLG and Spoken Language Understanding (SLU) are a natural dual problem pair. Suppose the response generated by the NLG module can be restored to the corresponding dialogue act by the SLU module, which reflects that the generated response fully conveys the semantic information of the dialogue act. Based on this idea, a novel Dual Supervised Pre-trained Model for a few-shot Natural Language Generation (DSPM-NLG) is proposed to regularize the pre-training process. We adopt a joint model with a dual supervised framework to learn the dual correlation between NLG and SLU from the perspective of probability. In addition, a slot-masked strategy is designed to enable the model to focus better on the key slot-value pairs. DSPM-NLG is continuously trained on existing public large-scale annotated data, which thoroughly learns the duality between two tasks to enhance the semantically controlling and generalization abilities of the pre-trained model. Experiments demonstrate that our proposed model performs outstandingly on the few-shot benchmark dataset and outperforms the previous SOTA results.

Making Pre-trained Language Models Better Learn Few-Shot Spoken Language Understanding in More Practical Scenarios
Yufan Wang | Jie Mei | Bowei Zou | Rui Fan | Tingting He | Ai Ti Aw
Findings of the Association for Computational Linguistics: ACL 2023

Most previous few-shot Spoken Language Understanding (SLU) models typically need to be trained on a set of data-rich source domains and adapt to the target domain with a few examples. In this paper, we explore a more practical scenario for few-shot SLU, in which we only assume access to a pre-trained language model and a few labeled examples without any other source domain data. We concentrate on understanding how far the few-shot SLU could be pushed in this setting. To this end, we develop a prompt-based intent detection model in few-shot settings, which leverages the BERT original pre-training next sentence prediction task and the prompt template to detect the user’s intent. For slot filling, we propose an approach of reconstructing slot labels, which reduces the training complexity by reducing the number of slot labels in few-shot settings. To evaluate the few-shot SLU for a more practical scenario, we present two benchmarks, FewShotATIS and FewShotSNIPS. And a dynamic sampling strategy is designed to construct the two datasets according to the learning difficulty of each intent and slot. Experiments on FewShotATIS and FewShotSNIPS demonstrate that our proposed model achieves state-of-the-art performance.

GLGR: Question-aware Global-to-Local Graph Reasoning for Multi-party Dialogue Reading Comprehension
Yanling Li | Bowei Zou | Yifan Fan | Xibo Li | Ai Ti Aw | Yu Hong
Findings of the Association for Computational Linguistics: EMNLP 2023

Graph reasoning contributes to the integration of discretely-distributed attentive information (clues) for Multi-party Dialogue Reading Comprehension (MDRC). This is attributed primarily to multi-hop reasoning over global conversational structures. However, existing approaches barely apply questions for anti-noise graph reasoning. More seriously, the local semantic structures in utterances are neglected, although they are beneficial for bridging across semantically-related clues. In this paper, we propose a question-aware global-to-local graph reasoning approach. It expands the canonical Interlocutor-Utterance graph by introducing a question node, enabling comprehensive global graph reasoning. More importantly, it constructs a semantic-role graph for each utterance, and accordingly performs local graph reasoning conditioned on the semantic relations. We design a two-stage encoder network to implement the progressive reasoning from the global graph to local. The experiments on the benchmark datasets Molweni and FriendsQA show that our approach yields significant improvements, compared to BERT and ELECTRA baselines. It achieves 73.6% and 77.2% F1-scores on Molweni and FriendsQA, respectively, outperforming state-of-the-art methods that employ different pretrained language models as backbones.

Leveraging Contrastive Learning and Knowledge Distillation for Incomplete Modality Rumor Detection
Fan Xu | Pinyun Fu | Qi Huang | Bowei Zou | AiTi Aw | Mingwen Wang
Findings of the Association for Computational Linguistics: EMNLP 2023

Rumors spread rapidly through online social microblogs at a relatively low cost, causing substantial economic losses and negative consequences in our daily lives. Existing rumor detection models often neglect the underlying semantic coherence between text and image components in multimodal posts, as well as the challenges posed by incomplete modalities in single modal posts, such as missing text or images. This paper presents CLKD-IMRD, a novel framework for Incomplete Modality Rumor Detection. CLKD-IMRD employs Contrastive Learning and Knowledge Distillation to capture the semantic consistency between text and image pairs, while also enhancing model generalization to incomplete modalities within individual posts. Extensive experimental results demonstrate that our CLKD-IMRD outperforms state-of-the-art methods on two English and two Chinese benchmark datasets for rumor detection in social media.

2022

Automatic True/False Question Generation for Educational Purpose
Bowei Zou | Pengfei Li | Liangming Pan | Ai Ti Aw
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)

In field of teaching, true/false questioning is an important educational method for assessing students’ general understanding of learning materials. Manually creating such questions requires extensive human effort and expert knowledge. Question Generation (QG) technique offers the possibility to automatically generate a large number of questions. However, there is limited work on automatic true/false question generation due to the lack of training data and difficulty finding question-worthy content. In this paper, we propose an unsupervised True/False Question Generation approach (TF-QG) that automatically generates true/false questions from a given passage for reading comprehension test. TF-QG consists of a template-based framework that aims to test the specific knowledge in the passage by leveraging various NLP techniques, and a generative framework to generate more flexible and complicated questions by using a novel masking-and-infilling strategy. Human evaluation shows that our approach can generate high-quality and valuable true/false questions. In addition, simulated testing on the generated questions challenges the state-of-the-art inference models from NLI, QA, and fact verification tasks.

CoHS-CQG: Context and History Selection for Conversational Question Generation
Xuan Long Do | Bowei Zou | Liangming Pan | Nancy F. Chen | Shafiq Joty | Ai Ti Aw
Proceedings of the 29th International Conference on Computational Linguistics

Conversational question generation (CQG) serves as a vital task for machines to assist humans, such as interactive reading comprehension, through conversations. Compared to traditional single-turn question generation (SQG), CQG is more challenging in the sense that the generated question is required not only to be meaningful, but also to align with the provided conversation. Previous studies mainly focus on how to model the flow and alignment of the conversation, but do not thoroughly study which parts of the context and history are necessary for the model. We believe that shortening the context and history is crucial as it can help the model to optimise more on the conversational alignment property. To this end, we propose CoHS-CQG, a two-stage CQG framework, which adopts a novel CoHS module to shorten the context and history of the input. In particular, it selects the top-p sentences and history turns by calculating the relevance scores of them. Our model achieves state-of-the-art performances on CoQA in both the answer-aware and answer-unaware settings.

Capturing Conversational Interaction for Question Answering via Global History Reasoning
Jin Qian | Bowei Zou | Mengxing Dong | Xiao Li | AiTi Aw | Yu Hong
Findings of the Association for Computational Linguistics: NAACL 2022

Conversational Question Answering (ConvQA) is required to answer the current question, conditioned on the observable paragraph-level context and conversation history. Previous works have intensively studied history-dependent reasoning. They perceive and absorb topic-related information of prior utterances in the interactive encoding stage. It yielded significant improvement compared to history-independent reasoning. This paper further strengthens the ConvQA encoder by establishing long-distance dependency among global utterances in multi-turn conversation. We use multi-layer transformers to resolve long-distance relationships, which potentially contribute to the reweighting of attentive information in historical utterances. Experiments on QuAC show that our method obtains a substantial improvement (1%), yielding the F1 score of 73.7%. All source codes are available at https://github.com/jaytsien/GHR.

2021

Unseen Entity Handling in Complex Question Answering over Knowledge Base via Language Generation
Xin Huang | Jung-Jae Kim | Bowei Zou
Findings of the Association for Computational Linguistics: EMNLP 2021

Complex question answering over knowledge base remains as a challenging task because it involves reasoning over multiple pieces of information, including intermediate entities/relations and other constraints. Previous methods simplify the SPARQL query of a question into such forms as a list or a graph, missing such constraints as “filter” and “order_by”, and present models specialized for generating those simplified forms from a given question. We instead introduce a novel approach that directly generates an executable SPARQL query without simplification, addressing the issue of generating unseen entities. We adapt large scale pre-trained encoder-decoder models and show that our method significantly outperforms the previous methods and also that our method has higher interpretability and computational efficiency than the previous methods.

Winnowing Knowledge for Multi-choice Question Answering
Yeqiu Li | Bowei Zou | Zhifeng Li | Ai Ti Aw | Yu Hong | Qiaoming Zhu
Findings of the Association for Computational Linguistics: EMNLP 2021

We tackle multi-choice question answering. Acquiring related commonsense knowledge to the question and options facilitates the recognition of the correct answer. However, the current reasoning models suffer from the noises in the retrieved knowledge. In this paper, we propose a novel encoding method which is able to conduct interception and soft filtering. This contributes to the harvesting and absorption of representative information with less interference from noises. We experiment on CommonsenseQA. Experimental results illustrate that our method yields substantial and consistent improvements compared to the strong Bert, RoBERTa and Albert-based baselines.

2020

Don’t Eclipse Your Arts Due to Small Discrepancies: Boundary Repositioning with a Pointer Network for Aspect Extraction
Zhenkai Wei | Yu Hong | Bowei Zou | Meng Cheng | Jianmin Yao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The current aspect extraction methods suffer from boundary errors. In general, these errors lead to a relatively minor difference between the extracted aspects and the ground-truth. However, they hurt the performance severely. In this paper, we propose to utilize a pointer network for repositioning the boundaries. Recycling mechanism is used, which enables the training data to be collected without manual intervention. We conduct the experiments on the benchmark datasets SE14 of laptop and SE14-16 of restaurant. Experimental results show that our method achieves substantial improvements over the baseline, and outperforms state-of-the-art methods.

基于多任务学习的生成式阅读理解(Generative Reading Comprehension via Multi-task Learning)
Jin Qian (钱锦) | Rongtao Huang (黄荣涛) | Bowei Zou (邹博伟) | Yu Hong (洪宇)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

生成式阅读理解是机器阅读理解领域一项新颖且极具挑战性的研究。与主流的抽取式阅读理解相比,生成式阅读理解模型不再局限于从段落中抽取答案,而是能结合问题和段落生成自然和完整的表述作为答案。然而,现有的生成式阅读理解模型缺乏对答案在段落中的边界信息以及对问题类型信息的理解。为解决上述问题,本文提出一种基于多任务学习的生成式阅读理解模型。该模型在训练阶段将答案生成任务作为主任务,答案抽取和问题分类任务作为辅助任务进行多任务学习,同时学习和优化模型编码层参数;在测试阶段加载模型编码层进行解码生成答案。实验结果表明,答案抽取模型和问题分类模型能够有效提升生成式阅读理解模型的性能。

汉语否定焦点识别研究:数据集与基线系统(Research on Chinese Negative Focus Identification: Dataset and Baseline)
Jiaxuan Sheng (盛佳璇) | Bowei Zou (邹博伟) | Longxiang Shen (沈龙骧) | Jing Ye (叶静) | Yu Hong (洪宇)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

自然语言文本中存在大量否定语义表达,否定焦点识别任务作为更细粒度的否定语义分析,近年来开始受到自然语言处理学者的关注。该任务旨在识别句子中被否定词修饰和强调的文本片段,其对自然语言处理的下游任务,如情感分析、观点挖掘等具有重要意义。与英语相比,目前面向汉语的否定焦点识别研究彶展缓慢,其主要原因是尚未有中文数据集为模型提供训练和测试数据。为解决上述问题,本文在汉语否定与不确定语料库上进行了否定焦点的标注工作,初步探索了否定焦点在汉语上的语言现象,并构建了一个包含5,762个样本的数据集。同时,本文还提出了一个基于神经网络模型的基线系统,为后续相关研究提供参照。

Multi-grained Chinese Word Segmentation with Weakly Labeled Data
Chen Gong | Zhenghua Li | Bowei Zou | Min Zhang
Proceedings of the 28th International Conference on Computational Linguistics

In contrast with the traditional single-grained word segmentation (SWS), where a sentence corresponds to a single word sequence, multi-grained Chinese word segmentation (MWS) aims to segment a sentence into multiple word sequences to preserve all words of different granularities. Due to the lack of manually annotated MWS data, previous work train and tune MWS models only on automatically generated pseudo MWS data. In this work, we further take advantage of the rich word boundary information in existing SWS data and naturally annotated data from dictionary example (DictEx) sentences, to advance the state-of-the-art MWS model based on the idea of weak supervision. Particularly, we propose to accommodate two types of weakly labeled data for MWS, i.e., SWS data and DictEx data by employing a simple yet competitive graph-based parser with local loss. Besides, we manually annotate a high-quality MWS dataset according to our newly compiled annotation guideline, consisting of over 9,000 sentences from two types of texts, i.e., canonical newswire (NEWS) and non-canonical web (BAIKE) data for better evaluation. Detailed evaluation shows that our proposed model with weakly labeled data significantly outperforms the state-of-the-art MWS model by 1.12 and 5.97 on NEWS and BAIKE data in F1.

NUT-RC: Noisy User-generated Text-oriented Reading Comprehension
Rongtao Huang | Bowei Zou | Yu Hong | Wei Zhang | AiTi Aw | Guodong Zhou
Proceedings of the 28th International Conference on Computational Linguistics

Reading comprehension (RC) on social media such as Twitter is a critical and challenging task due to its noisy, informal, but informative nature. Most existing RC models are developed on formal datasets such as news articles and Wikipedia documents, which severely limit their performances when directly applied to the noisy and informal texts in social media. Moreover, these models only focus on a certain type of RC, extractive or generative, but ignore the integration of them. To well address these challenges, we come up with a noisy user-generated text-oriented RC model. In particular, we first introduce a set of text normalizers to transform the noisy and informal texts to the formal ones. Then, we integrate the extractive and the generative RC model by a multi-task learning mechanism and an answer selection module. Experimental results on TweetQA demonstrate that our NUT-RC model significantly outperforms the state-of-the-art social media-oriented RC models.

GCDST: A Graph-based and Copy-augmented Multi-domain Dialogue State Tracking
Peng Wu | Bowei Zou | Ridong Jiang | AiTi Aw
Findings of the Association for Computational Linguistics: EMNLP 2020

As an essential component of task-oriented dialogue systems, Dialogue State Tracking (DST) takes charge of estimating user intentions and requests in dialogue contexts and extracting substantial goals (states) from user utterances to help the downstream modules to determine the next actions of dialogue systems. For practical usages, a major challenge to constructing a robust DST model is to process a conversation with multi-domain states. However, most existing approaches trained DST on a single domain independently, ignoring the information across domains. To tackle the multi-domain DST task, we first construct a dialogue state graph to transfer structured features among related domain-slot pairs across domains. Then, we encode the graph information of dialogue states by graph convolutional networks and utilize a hard copy mechanism to directly copy historical states from the previous conversation. Experimental results show that our model improves the performances of the multi-domain DST baseline (TRADE) with the absolute joint accuracy of 2.0% and 1.0% on the MultiWOZ 2.0 and 2.1 dialogue datasets, respectively.

2019

Negative Focus Detection via Contextual Attention Mechanism
Longxiang Shen | Bowei Zou | Yu Hong | Guodong Zhou | Qiaoming Zhu | AiTi Aw
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Negation is a universal but complicated linguistic phenomenon, which has received considerable attention from the NLP community over the last decade, since a negated statement often carries both an explicit negative focus and implicit positive meanings. For the sake of understanding a negated statement, it is critical to precisely detect the negative focus in context. However, how to capture contextual information for negative focus detection is still an open challenge. To well address this, we come up with an attention-based neural network to model contextual information. In particular, we introduce a framework which consists of a Bidirectional Long Short-Term Memory (BiLSTM) neural network and a Conditional Random Fields (CRF) layer to effectively encode the order information and the long-range context dependency in a sentence. Moreover, we design two types of attention mechanisms, word-level contextual attention and topic-level contextual attention, to take advantage of contextual information across sentences from both the word perspective and the topic perspective, respectively. Experimental results on the SEM’12 shared task corpus show that our approach achieves the best performance on negative focus detection, yielding an absolute improvement of 2.11% over the state-of-the-art. This demonstrates the great effectiveness of the two types of contextual attention mechanisms.

2018

Incorporating Image Matching Into Knowledge Acquisition for Event-Oriented Relation Recognition
Yu Hong | Yang Xu | Huibin Ruan | Bowei Zou | Jianmin Yao | Guodong Zhou
Proceedings of the 27th International Conference on Computational Linguistics

Event relation recognition is a challenging language processing task. It is required to determine the relation class of a pair of query events, such as causality, under the condition that there isn’t any reliable clue for use. We follow the traditional statistical approach in this paper, speculating the relation class of the target events based on the relation-class distributions on the similar events. There is minimal supervision used during the speculation process. In particular, we incorporate image processing into the acquisition of similar event instances, including the utilization of images for visually representing event scenes, and the use of the neural network based image matching for approximate calculation between events. We test our method on the ACE-R2 corpus and compared our model with the fully-supervised neural network models. Experimental results show that we achieve a comparable performance to CNN while slightly better than LSTM.

Adversarial Feature Adaptation for Cross-lingual Relation Classification
Bowei Zou | Zengzhuang Xu | Yu Hong | Guodong Zhou
Proceedings of the 27th International Conference on Computational Linguistics

Relation Classification aims to classify the semantic relationship between two marked entities in a given sentence. It plays a vital role in a variety of natural language processing applications. Most existing methods focus on exploiting mono-lingual data, e.g., in English, due to the lack of annotated data in other languages. In this paper, we come up with a feature adaptation approach for cross-lingual relation classification, which employs a generative adversarial network (GAN) to transfer feature representations from one language with rich annotated data to another language with scarce annotated data. Such a feature adaptation approach enables feature imitation via the competition between a relation classification network and a rival discriminator. Experimental results on the ACE 2005 multilingual training corpus, treating English as the source language and Chinese the target, demonstrate the effectiveness of our proposed approach, yielding an improvement of 5.7% over the state-of-the-art.

2015

Unsupervised Negation Focus Identification with Word-Topic Graph Model
Bowei Zou | Guodong Zhou | Qiaoming Zhu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Negation and Speculation Identification in Chinese Language
Bowei Zou | Qiaoming Zhu | Guodong Zhou
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Negation Focus Identification with Contextual Discourse Information
Bowei Zou | Guodong Zhou | Qiaoming Zhu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

Tree Kernel-based Negation and Speculation Scope Detection with Structured Syntactic Parse Features
Bowei Zou | Guodong Zhou | Qiaoming Zhu
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Co-authors

Liangming Pan 3

Longyin Zhang 3

Rongtao Huang 2

Longxiang Shen 2

Mengxing Dong 1

Chen Gong (龚晨) 1

Roy Ka-Wei Lee 1

Zhenghua Li (李正华) 1

Zhengyuan Liu 1

Jiaxuan Sheng 1

Mingwen Wang (王明文) 1

Zengzhuang Xu 1

Venues