Wenhan Chao

Also published as: Wen-Han Chao, WenHan Chao


pdf bib
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extracters
Shuyue Stella Li | Beining Xu | Xiangyu Zhang | Hexin Liu | Wenhan Chao | Paola Garcia
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)

pdf bib
Characterizing and Verifying Scientific Claims: Qualitative Causal Structure is All You Need
Jinxuan Wu | Wenhan Chao | Xian Zhou | Zhunchen Luo
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

A scientific claim typically begins with the formulation of a research question or hypothesis, which is a tentative statement or proposition about a phenomenon or relationship between variables. Within the realm of scientific claim verification, considerable research efforts have been dedicated to attention architectures and leveraging the text comprehension capabilities of Pre-trained Language Models (PLMs), yielding promising performances. However, these models overlook the causal structure information inherent in scientific claims, thereby failing to establish a comprehensive chain of causal inference. This paper delves into the exploration to highlight the crucial role of qualitative causal structure in characterizing and verifying scientific claims based on evidence. We organize the qualitative causal structure into a heterogeneous graph and propose a novel attention-based graph neural network model to facilitate causal reasoning across relevant causally-potent factors. Our experiments demonstrate that by solely utilizing the qualitative causal structure, the proposed model achieves comparable performance to PLM-based models. Furthermore, by incorporating semantic features, our model outperforms state-of-the-art approaches comprehensively.

pdf bib
TP-Detector: Detecting Turning Points in the Engineering Process of Large-scale Projects
Qi Wu | WenHan Chao | Xian Zhou | Zhunchen Luo
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

This paper introduces a novel task of detecting turning points in the engineering process of large-scale projects, wherein the turning points signify significant transitions occurring between phases. Given the complexities involving diverse critical events and limited comprehension in individual news reports, we approach the problem by treating the sequence of related news streams as a window with multiple instances. To capture the evolution of changes effectively, we adopt a deep Multiple Instance Learning (MIL) framework and employ the multiple instance ranking loss to discern the transition patterns exhibited in the turning point window. Extensive experiments consistently demonstrate the effectiveness of our proposed approach on the constructed dataset compared to baseline methods. We deployed the proposed mode and provided a demonstration video to illustrate its functionality. The code and dataset are available on GitHub.


pdf bib
Argumentation-Driven Evidence Association in Criminal Cases
Yefei Teng | WenHan Chao
Findings of the Association for Computational Linguistics: EMNLP 2021

Evidence association in criminal cases is dividing a set of judicial evidence into several non-overlapping subsets, improving the interpretability and legality of conviction. Observably, evidence divided into the same subset usually supports the same claim. Therefore, we propose an argumentation-driven supervised learning method to calculate the distance between evidence pairs for the following evidence association step in this paper. Experimental results on a real-world dataset demonstrate the effectiveness of our method.


pdf bib
Identifying Principals and Accessories in a Complex Case based on the Comprehension of Fact Description
Yakun Hu | Zhunchen Luo | Wenhan Chao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we study the problem of identifying the principals and accessories from the fact description with multiple defendants in a criminal case. We treat the fact descriptions as narrative texts and the defendants as roles over the narrative story. We propose to model the defendants with behavioral semantic information and statistical characteristics, then learning the importances of defendants within a learning-to-rank framework. Experimental results on a real-world dataset demonstrate the behavior analysis can effectively model the defendants’ impacts in a complex case.

pdf bib
Formality Style Transfer with Shared Latent Space
Yunli Wang | Yu Wu | Lili Mou | Zhoujun Li | WenHan Chao
Proceedings of the 28th International Conference on Computational Linguistics

Conventional approaches for formality style transfer borrow models from neural machine translation, which typically requires massive parallel data for training. However, the dataset for formality style transfer is considerably smaller than translation corpora. Moreover, we observe that informal and formal sentences closely resemble each other, which is different from the translation task where two languages have different vocabularies and grammars. In this paper, we present a new approach, Sequence-to-Sequence with Shared Latent Space (S2S-SLS), for formality style transfer, where we propose two auxiliary losses and adopt joint training of bi-directional transfer and auto-encoding. Experimental results show that S2S-SLS (with either RNN or Transformer architectures) consistently outperforms baselines in various settings, especially when we have limited data.


pdf bib
Harnessing Pre-Trained Neural Networks with Rules for Formality Style Transfer
Yunli Wang | Yu Wu | Lili Mou | Zhoujun Li | Wenhan Chao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Formality text style transfer plays an important role in various NLP applications, such as non-native speaker assistants and child education. Early studies normalize informal sentences with rules, before statistical and neural models become a prevailing method in the field. While a rule-based system is still a common preprocessing step for formality style transfer in the neural era, it could introduce noise if we use the rules in a naive way such as data preprocessing. To mitigate this problem, we study how to harness rules into a state-of-the-art neural network that is typically pretrained on massive corpora. We propose three fine-tuning methods in this paper and achieve a new state-of-the-art on benchmark datasets


pdf bib
Interpretable Charge Predictions for Criminal Cases: Learning to Generate Court Views from Fact Descriptions
Hai Ye | Xin Jiang | Zhunchen Luo | Wenhan Chao
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

In this paper, we propose to study the problem of court view generation from the fact description in a criminal case. The task aims to improve the interpretability of charge prediction systems and help automatic legal document generation. We formulate this task as a text-to-text natural language generation (NLG) problem. Sequence-to-sequence model has achieved cutting-edge performances in many NLG tasks. However, due to the non-distinctions of fact descriptions, it is hard for Seq2Seq model to generate charge-discriminative court views. In this work, we explore charge labels to tackle this issue. We propose a label-conditioned Seq2Seq model with attention for this problem, to decode court views conditioned on encoded charge labels. Experimental results show the effectiveness of our method.

pdf bib
CRST: a Claim Retrieval System in Twitter
Wenjia Ma | WenHan Chao | Zhunchen Luo | Xin Jiang
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

For controversial topics, collecting argumentation-containing tweets which tend to be more convincing will help researchers analyze public opinions. Meanwhile, claim is the heart of argumentation. Hence, we present the first real-time claim retrieval system CRST that retrieves tweets containing claims for a given topic from Twitter. We propose a claim-oriented ranking module which can be divided into the offline topic-independent learning to rank model and the online topic-dependent lexicon model. Our system outperforms previous claim retrieval system and argument mining system. Moreover, the claim-oriented ranking module can be easily adapted to new topics without any manual process or external information, guaranteeing the practicability of our system.

pdf bib
Interpretable Rationale Augmented Charge Prediction System
Xin Jiang | Hai Ye | Zhunchen Luo | WenHan Chao | Wenjia Ma
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

This paper proposes a neural based system to solve the essential interpretability problem existing in text classification, especially in charge prediction task. First, we use a deep reinforcement learning method to extract rationales which mean short, readable and decisive snippets from input text. Then a rationale augmented classification model is proposed to elevate the prediction accuracy. Naturally, the extracted rationales serve as the introspection explanation for the prediction result of the model, enhancing the transparency of the model. Experimental results demonstrate that our system is able to extract readable rationales in a high consistency with manual annotation and is comparable with the attention model in prediction accuracy.

pdf bib
Joker at SemEval-2018 Task 12: The Argument Reasoning Comprehension with Neural Attention
Guobin Sui | Wenhan Chao | Zhunchen Luo
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes a classification system that participated in the SemEval-2018 Task 12: The Argument Reasoning Comprehension Task. Briefly the task can be described as that a natural language “argument” is what we have, with reason, claim, and correct and incorrect warrants, and we need to choose the correct warrant. In order to make fully understand of the semantic information of the sentences, we proposed a neural network architecture with attention mechanism to achieve this goal. Besides we try to introduce keywords into the model to improve accuracy. Finally the proposed system achieved 5th place among 22 participating systems


pdf bib
Jointly Extracting Relations with Class Ties via Effective Deep Ranking
Hai Ye | Wenhan Chao | Zhunchen Luo | Zhoujun Li
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Connections between relations in relation extraction, which we call class ties, are common. In distantly supervised scenario, one entity tuple may have multiple relation facts. Exploiting class ties between relations of one entity tuple will be promising for distantly supervised relation extraction. However, previous models are not effective or ignore to model this property. In this work, to effectively leverage class ties, we propose to make joint relation extraction with a unified model that integrates convolutional neural network (CNN) with a general pairwise ranking framework, in which three novel ranking loss functions are introduced. Additionally, an effective method is presented to relieve the severe class imbalance problem from NR (not relation) for model training. Experiments on a widely used dataset show that leveraging class ties will enhance extraction and demonstrate the effectiveness of our model to learn class ties. Our model outperforms the baselines significantly, achieving state-of-the-art performance.


pdf bib
A Graph-based Bilingual Corpus Selection Approach for SMT
Wenhan Chao | Zhoujun Li
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation


pdf bib
An Example-based Decoder for Spoken Language Machine Translation
Zhou-Jun Li | Wen-Han Chao | Yue-Xin Chen
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing


pdf bib
NUDT machine translation system for IWSLT2007
Wen-Han Chao | Zhou-Jun Li
Proceedings of the Fourth International Workshop on Spoken Language Translation

In this paper, we describe our machine translation system which was used for the Chinese-to-English task in the IWSLT2007 evaluation campaign. The system is a statistical machine translation (SMT) system, while containing an example-based decoder. In this way, it will help to solve the re-ordering problem and other problems for spoken language MT, such as lots of omissions, idioms etc. We report the results of the system for the provided evaluation sets.

pdf bib
Incorporating constituent structure constraint into discriminative word alignment
Wen-Han Chao | Zhou-Jun Li
Proceedings of Machine Translation Summit XI: Papers