Wei-Fan Chen

2025

Explainable Hallucination through Natural Language Inference Mapping
Wei-Fan Chen | Zhixue Zhao | Akbar Karimi | Lucie Flek
Findings of the Association for Computational Linguistics: ACL 2025

Large language models (LLMs) often generate hallucinated content, making it crucial to identify and quantify inconsistencies in their outputs. We introduce HaluMap, a post-hoc framework that detects hallucinations by mapping entailment and contradiction relations between source inputs and generated outputs using a natural language inference (NLI) model. To improve reliability, we propose a calibration step leveraging intra-text relations to refine predictions. HaluMap outperforms state-of-the-art NLI-based methods by five percentage points compared to other training-free approaches, while providing clear, interpretable explanations. As a training-free and model-agnostic approach, HaluMap offers a practical solution for verifying LLM outputs across diverse NLP tasks. The resources of this paper are available at https://github.com/caisa-lab/acl25-halumap.

2024

pdf bib abs

Reference-guided Style-Consistent Content Transfer
Wei-Fan Chen | Milad Alshomary | Maja Stahl | Khalid Al-Khatib | Benno Stein | Henning Wachsmuth
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we introduce the task of style-consistent content transfer, which concerns modifying a text’s content based on a provided reference statement while preserving its original style. We approach the task by employing multi-task learning to ensure that the modified text meets three important conditions: reference faithfulness, style adherence, and coherence. In particular, we train three independent classifiers for each condition. During inference, these classifiers are used to determine the best modified text variant. Our evaluation, conducted on hotel reviews and news articles, compares our approach with sequence-to-sequence and error correction baselines. The results demonstrate that our approach reasonably generates text satisfying all three conditions. In subsequent analyses, we highlight the strengths and limitations of our approach, providing valuable insights for future research directions.

2022

pdf bib abs

Analyzing Culture-Specific Argument Structures in Learner Essays
Wei-Fan Chen | Mei-Hua Chen | Garima Mudgal | Henning Wachsmuth
Proceedings of the 9th Workshop on Argument Mining

Language education has been shown to benefit from computational argumentation, for example, from methods that assess quality dimensions of language learners’ argumentative essays, such as their organization and argument strength. So far, however, little attention has been paid to cultural differences in learners’ argument structures originating from different origins and language capabilities. This paper extends prior studies of learner argumentation by analyzing differences in the argument structure of essays from culturally diverse learners. Based on the ICLE corpus containing essays written by English learners of 16 different mother tongues, we train natural language processing models to mine argumentative discourse units (ADUs) as well as to assess the essays’ quality in terms of organization and argument strength. The extracted ADUs and the predicted quality scores enable us to look into the similarities and differences of essay argumentation across different English learners. In particular, we analyze the ADUs from learners with different mother tongues, different levels of arguing proficiency, and different context cultures.

2021

pdf bib abs

Controlled Neural Sentence-Level Reframing of News Articles
Wei-Fan Chen | Khalid Al Khatib | Benno Stein | Henning Wachsmuth
Findings of the Association for Computational Linguistics: EMNLP 2021

Framing a news article means to portray the reported event from a specific perspective, e.g., from an economic or a health perspective. Reframing means to change this perspective. Depending on the audience or the submessage, reframing can become necessary to achieve the desired effect on the readers. Reframing is related to adapting style and sentiment, which can be tackled with neural text generation techniques. However, it is more challenging since changing a frame requires rewriting entire sentences rather than single phrases. In this paper, we study how to computationally reframe sentences in news articles while maintaining their coherence to the context. We treat reframing as a sentence-level fill-in-the-blank task for which we train neural models on an existing media frame corpus. To guide the training, we propose three strategies: framed-language pretraining, named-entity preservation, and adversarial learning. We evaluate respective models automatically and manually for topic consistency, coherence, and successful reframing. Our results indicate that generating properly-framed text works well but with tradeoffs.

pdf bib abs

Belief-based Generation of Argumentative Claims
Milad Alshomary | Wei-Fan Chen | Timon Gurcke | Henning Wachsmuth
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

In this work, we argue that augmenting argument generation technology with the ability to encode beliefs is of twofold. First, it gives more control on the generated arguments leading to better reach for audience. Second, it is one way of modeling the human process of synthesizing arguments. Therefore, we propose the task of belief-based claim generation, and study the research question of how to model and encode a user’s beliefs into a generated argumentative text. To this end, we model users’ beliefs via their stances on big issues, and extend state of the art text generation models with extra input reflecting user’s beliefs. Through an automatic evaluation we show empirical evidence of the applicability to encode beliefs into argumentative text. In our manual evaluation, we highlight that the low effectiveness of our approach stems from the noise produced by the automatic collection of bag-of-words, which was mitigated by removing this noise. The finding of this paper lays the ground work to further investigate the role of beliefs in generating better reaching arguments.

2020

pdf bib abs

Analyzing Political Bias and Unfairness in News Articles at Different Levels of Granularity
Wei-Fan Chen | Khalid Al Khatib | Henning Wachsmuth | Benno Stein
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Media is an indispensable source of information and opinion, shaping the beliefs and attitudes of our society. Obviously, media portals can also provide overly biased content, e.g., by reporting on political events in a selective or incomplete manner. A relevant question hence is whether and how such a form of unfair news coverage can be exposed. This paper addresses the automatic detection of bias, but it goes one step further in that it explores how political bias and unfairness are manifested linguistically. We utilize a new corpus of 6964 news articles with labels derived from adfontesmedia.com to develop a neural model for bias assessment. Analyzing the model on article excerpts, we find insightful bias patterns at different levels of text granularity, from single words to the whole article discourse.

pdf bib abs

We propose a shared task on abstractive snippet generation for web pages, a novel task of generating query-biased abstractive summaries for documents that are to be shown on a search results page. Conventional snippets are extractive in nature, which recently gave rise to copyright claims from news publishers as well as a new copyright legislation being passed in the European Union, limiting the fair use of web page contents for snippets. At the same time, abstractive summarization has matured considerably in recent years, potentially allowing for more personalization of snippets in the future. Taken together, these facts render further research into generating abstractive snippets both timely and promising.

pdf bib abs

Detecting Media Bias in News Articles using Gaussian Bias Distributions
Wei-Fan Chen | Khalid Al Khatib | Benno Stein | Henning Wachsmuth
Findings of the Association for Computational Linguistics: EMNLP 2020

Media plays an important role in shaping public opinion. Biased media can influence people in undesirable directions and hence should be unmasked as such. We observe that feature-based and neural text classification approaches which rely only on the distribution of low-level lexical information fail to detect media bias. This weakness becomes most noticeable for articles on new events, where words appear in new contexts and hence their “bias predictiveness” is unclear. In this paper, we therefore study how second-order information about biased statements in an article helps to improve detection effectiveness. In particular, we utilize the probability distributions of the frequency, positions, and sequential order of lexical and informational sentence-level bias in a Gaussian Mixture Model. On an existing media bias dataset, we find that the frequency and positions of biased statements strongly impact article-level bias, whereas their exact sequential order is secondary. Using a standard model for sentence-level bias detection, we provide empirical evidence that article-level bias detectors that use second-order information clearly outperform those without.

2019

pdf bib abs

Unraveling the Search Space of Abusive Language in Wikipedia with Dynamic Lexicon Acquisition
Wei-Fan Chen | Khalid Al Khatib | Matthias Hagen | Henning Wachsmuth | Benno Stein
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

Many discussions on online platforms suffer from users offending others by using abusive terminology, threatening each other, or being sarcastic. Since an automatic detection of abusive language can support human moderators of online discussion platforms, detecting abusiveness has recently received increased attention. However, the existing approaches simply train one classifier for the whole variety of abusiveness. In contrast, our approach is to distinguish explicitly abusive cases from the more “shadowed” ones. By dynamically extending a lexicon of abusive terms (e.g., including new obfuscations of abusive terms), our approach can support a moderator with explicit unraveled explanations for why something was flagged as abusive: due to known explicitly abusive terms, due to newly detected (obfuscated) terms, or due to shadowed cases.

2018

pdf bib abs

Learning to Flip the Bias of News Headlines
Wei-Fan Chen | Henning Wachsmuth | Khalid Al-Khatib | Benno Stein
Proceedings of the 11th International Conference on Natural Language Generation

This paper introduces the task of “flipping” the bias of news articles: Given an article with a political bias (left or right), generate an article with the same topic but opposite bias. To study this task, we create a corpus with bias-labeled articles from all-sides.com. As a first step, we analyze the corpus and discuss intrinsic characteristics of bias. They point to the main challenges of bias flipping, which in turn lead to a specific setting in the generation process. The paper in hand narrows down the general bias flipping task to focus on bias flipping for news article headlines. A manual annotation of headlines from each side reveals that they are self-informative in general and often convey bias. We apply an autoencoder incorporating information from an article’s content to learn how to automatically flip the bias. From 200 generated headlines, 73 are classified as understandable by annotators, and 83 maintain the topic while having opposite bias. Insights from our analysis shed light on how to solve the main challenges of bias flipping.

2017

pdf bib abs

Unit Segmentation of Argumentative Texts
Yamen Ajjour | Wei-Fan Chen | Johannes Kiesel | Henning Wachsmuth | Benno Stein
Proceedings of the 4th Workshop on Argument Mining

The segmentation of an argumentative text into argument units and their non-argumentative counterparts is the first step in identifying the argumentative structure of the text. Despite its importance for argument mining, unit segmentation has been approached only sporadically so far. This paper studies the major parameters of unit segmentation systematically. We explore the effectiveness of various features, when capturing words separately, along with their neighbors, or even along with the entire text. Each such context is reflected by one machine learning model that we evaluate within and across three domains of texts. Among the models, our new deep learning approach capturing the entire text turns out best within all domains, with an F-score of up to 88.54. While structural features generalize best across domains, the domain transfer remains hard, which points to major challenges of unit segmentation.

2016

pdf bib abs

UTCNN: a Deep Learning Model of Stance Classification on Social Media Text
Wei-Fan Chen | Lun-Wei Ku
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Most neural network models for document classification on social media focus on text information to the neglect of other information on these platforms. In this paper, we classify post stance on social media channels and develop UTCNN, a neural network model that incorporates user tastes, topic tastes, and user comments on posts. UTCNN not only works on social media texts, but also analyzes texts in forums and message boards. Experiments performed on Chinese Facebook data and English online debate forum data show that UTCNN achieves a 0.755 macro average f-score for supportive, neutral, and unsupportive stance classes on Facebook data, which is significantly better than models in which either user, topic, or comment information is withheld. This model design greatly mitigates the lack of data for the minor class. In addition, UTCNN yields a 0.842 accuracy on English online debate forum data, which also significantly outperforms results from previous work, showing that UTCNN performs well regardless of language or platform.

pdf bib abs

Chinese Textual Sentiment Analysis: Datasets, Resources and Tools
Lun-Wei Ku | Wei-Fan Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Tutorial Abstracts

The rapid accumulation of data in social media (in million and billion scales) has imposed great challenges in information extraction, knowledge discovery, and data mining, and texts bearing sentiment and opinions are one of the major categories of user generated data in social media. Sentiment analysis is the main technology to quickly capture what people think from these text data, and is a research direction with immediate practical value in ‘big data’ era. Learning such techniques will allow data miners to perform advanced mining tasks considering real sentiment and opinions expressed by users in additional to the statistics calculated from the physical actions (such as viewing or purchasing records) user perform, which facilitates the development of real-world applications. However, the situation that most tools are limited to the English language might stop academic or industrial people from doing research or products which cover a wider scope of data, retrieving information from people who speak different languages, or developing applications for worldwide users. More specifically, sentiment analysis determines the polarities and strength of the sentiment-bearing expressions, and it has been an important and attractive research area. In the past decade, resources and tools have been developed for sentiment analysis in order to provide subsequent vital applications, such as product reviews, reputation management, call center robots, automatic public survey, etc. However, most of these resources are for the English language. Being the key to the understanding of business and government issues, sentiment analysis resources and tools are required for other major languages, e.g., Chinese. In this tutorial, audience can learn the skills for retrieving sentiment from texts in another major language, Chinese, to overcome this obstacle. The goal of this tutorial is to introduce the proposed sentiment analysis technologies and datasets in the literature, and give the audience the opportunities to use resources and tools to process Chinese texts from the very basic preprocessing, i.e., word segmentation and part of speech tagging, to sentiment analysis, i.e., applying sentiment dictionaries and obtaining sentiment scores, through step-by-step instructions and a hand-on practice. The basic processing tools are from CKIP Participants can download these resources, use them and solve the problems they encounter in this tutorial. This tutorial will begin from some background knowledge of sentiment analysis, such as how sentiment are categorized, where to find available corpora and which models are commonly applied, especially for the Chinese language. Then a set of basic Chinese text processing tools for word segmentation, tagging and parsing will be introduced for the preparation of mining sentiment and opinions. After bringing the idea of how to pre-process the Chinese language to the audience, I will describe our work on compositional Chinese sentiment analysis from words to sentences, and an application on social media text (Facebook) as an example. All our involved and recently developed related resources, including Chinese Morphological Dataset, Augmented NTU Sentiment Dictionary (aug-NTUSD), E-hownet with sentiment information, Chinese Opinion Treebank, and the CopeOpi Sentiment Scorer, will also be introduced and distributed in this tutorial. The tutorial will end by a hands-on session of how to use these materials and tools to process Chinese sentiment. Content Details, Materials, and Program please refer to the tutorial URL: http://www.lunweiku.com/

pdf bib abs

WordForce: Visualizing Controversial Words in Debates
Wei-Fan Chen | Fang-Yu Lin | Lun-Wei Ku
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

This paper presents WordForce, a system powered by the state of the art neural network model to visualize the learned user-dependent word embeddings from each post according to the post content and its engaged users. It generates the scatter plots to show the force of a word, i.e., whether the semantics of word embeddings from posts of different stances are clearly separated from the aspect of this controversial word. In addition, WordForce provides the dispersion and the distance of word embeddings from posts of different stance groups, and proposes the most controversial words accordingly to show clues to what people argue about in a debate.

Wei-Fan Chen

2025

2024

2022

2021

2020

2019

2018

2017

2016

2015

Co-authors

Venues