Yi-Shin Chen

2025

Beyond Binary: Enhancing Misinformation Detection with Nuance-Controlled Event Context
Elijah Frederick Albertson | Retnani Latifah | Yi-Shin Chen
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

Misinformation rarely presents itself as entirely true or entirely false. Instead, it often embeds partial truths within misleading contexts, creating narratives that blur the boundary between fact and falsehood. Traditional binary fact-checking frameworks fail to capture this nuance, forcing complex claims into oversimplified categories. To address this gap, we introduce MEGA, a multidimensional graph framework designed to classify ambiguous claims, with a particular focus on those labelled Somewhat True. MEGA integrates event evidence, spatio-temporal metadata, and a quantifiable nuance score. Its Event Candidate Extraction (ECE) module identifies supporting or contradicting evidence, while the Nuance Control Module (NCM) injects or removes nuance to assess its effect on classification. Experiments show that nuance is both detectable and learnable: adding nuance improves borderline discrimination, while stripping it pushes decisions toward false extremes and conceals partial truth. Our top model, nuance-injected without score weighting, improves accuracy and F1 score by 15 and 16 points over the claims-only baseline, and by 6 and 9 points over the ECE-only variant. These results show that explicitly modeling nuance alongside context is crucial for classifying mixed-truth claims and advancing fact-checking beyond binary judgments.

Language Modeling Using Entanglement Enhanced Tensor Trains
Ellis Reyes | Yi-Shin Chen
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

Tensor Train Language Models (TTLMs) offer significant memory savings by representing text sequences as tensor networks, but naive implementations struggle with long-range dependencies and limited flexibility. We introduce a modular TTLM framework that combines local and non-local context modules to achieve scalable language modeling. Our non-local modules, inspired by entanglement in quantum information theory, enable efficient modeling of long-range interactions between hidden states. Experiments on Penn Treebank and Wikitext datasets show that our modular TTLM, including entanglement-augmented variants, outperforms naive baselines. These results highlight TTLMs as a promising, memory-efficient alternative for modern language modeling.

2024

Leveraging Conflicts in Social Media Posts: Unintended Offense Dataset
Che-Wei Tsai | Yen-Hao Huang | Tsu-Keng Liao | Didier Fernando Salazar Estrada | Retnani Latifah | Yi-Shin Chen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In multi-person communications, conflicts often arise. Each individual may have their own perspective, which can differ. Additionally, commonly referenced offensive datasets frequently neglect contextual information and are primarily constructed with a focus on intended offenses. This study suggests that conflicts are pivotal in revealing a broader range of human interactions, including instances of unintended offensive language. This paper proposes a conflict-based data collection method to utilize inter-conflict cues in multi-person communications. By focusing on specific cue posts within conversation threads, our proposed approach effectively identifies relevant instances for analysis. Detailed analyses are provided to show that the proposed approach efficiently gathers data on subtly offensive content. The experimental results indicate that incorporating elements of conflict into data collection not only significantly enhances the comprehensiveness and accuracy of detecting offensive language but also enriches our understanding of conflict dynamics in digital communication.

2022

ConTextING: Granting Document-Wise Contextual Embeddings to Graph Neural Networks for Inductive Text Classification
Yen-Hao Huang | Yi-Hsin Chen | Yi-Shin Chen
Proceedings of the 29th International Conference on Computational Linguistics

Graph neural networks (GNNs) have recently been applied in natural language processing. Various GNN studies have been proposed to learn node interactions within the local graph of each document that contains words, sentences, or topics for inductive text classification. However, most inductive GNNs that are built on a word graph generally take global word embeddings as node features, without referring to document-wise contextual information. Consequently, we find that BERT models can perform better than inductive GNNs. An intuitive follow-up approach is to enrich GNNs with contextual embeddings from BERT, yet there is a lack of related research. In this work, we propose a simple yet effective unified model, coined ConTextING, with a joint training mechanism to learn from both document embeddings and contextual word interactions simultaneously. Our experiments show that ConTextING outperforms pure inductive GNNs and BERT-style models. The analyses also highlight the benefits of the sub-word graph and joint training with separated classifiers.

Unsupervised Text Summarization of Long Documents using Dependency-based Noun Phrases and Contextual Order Arrangement
Yen-Hao Huang | Hsiao-Yen Lan | Yi-Shin Chen
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

Unsupervised extractive summarization has recently gained importance since it does not require labeled data. Among unsupervised methods, graph-based approaches have achieved outstanding results. These methods represent each document by a graph, with sentences as nodes and word-level similarity among sentences as edges. Common words can easily lead to a strong connection between sentence nodes. Thus, sentences with many common words can be misinterpreted as salient sentences for a summary. This work addresses the common word issue with a phrase-level graph that (1) focuses on the noun phrases of a document based on grammar dependencies and (2) initializes edge weights by term-frequency within the target document and inverse document frequency over the entire corpus. The importance scores of noun phrases extracted from the graph are then used to select the most salient sentences. To preserve summary coherence, the order of the selected sentences is re-arranged by a flow-aware orderBERT. The results reveal that our unsupervised framework outperformed other extractive methods on ROUGE as well as two human evaluations for semantic similarity and summary coherence.

2021

Unsupervised Multi-document Summarization for News Corpus with Key Synonyms and Contextual Embeddings
Yen-Hao Huang | Ratana Pornvattanavichai | Fernando Henrique Calderon Alvarado | Yi-Shin Chen
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

Information overload has been one of the challenges regarding information from the Internet. It is no longer a matter of information access; instead, the focus has shifted towards the quality of the retrieved data. Particularly in the news domain, multiple outlets report on the same news events but may differ in details. This work considers that different news outlets are more likely to differ in their writing styles and choice of words, and proposes a method to extract sentences based on their key information by focusing on the shared synonyms in each sentence. Our method also attempts to reduce redundancy through hierarchical clustering and arranges the selected sentences with the proposed orderBERT. The results show that the proposed unsupervised framework improves coverage and coherence while reducing redundancy in the generated summary. Moreover, because the dataset was obtained through automatic scraping, we also propose a data refinement method to alleviate the problem of undesirable texts that result from that process.

2019

Discovering the Latent Writing Style from Articles: A Contextualized Feature Extraction Approach
Yen-Hao Huang | Ting-Wei Liu | Ssu-Rui Lee | Ya-Wen Yu | Wan-Hsuan Lee | Fernando Henrique Calderon Alvarado | Yi-Shin Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 24, Number 1, June 2019

2018

CARER: Contextualized Affect Representations for Emotion Recognition
Elvis Saravia | Hsien-Chi Toby Liu | Yen-Hao Huang | Junlin Wu | Yi-Shin Chen
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Emotions are expressed in nuanced ways, which vary by collective or individual experiences, knowledge, and beliefs. Therefore, to understand emotion, as conveyed through text, a robust mechanism capable of capturing and modeling different linguistic nuances and phenomena is needed. We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. The pattern-based representations are further enriched with word embeddings and evaluated through several emotion recognition tasks. Our experimental results demonstrate that the proposed method outperforms state-of-the-art techniques on emotion recognition tasks.

2014

Collaborative Ranking between Supervised and Unsupervised Approaches for Keyphrase Extraction
Gerardo Figueroa | Yi-Shin Chen
Proceedings of the 26th Conference on Computational Linguistics and Speech Processing (ROCLING 2014)

Multi-Lingual Sentiment Analysis of Social Data Based on Emotion-Bearing Patterns
Carlos Argueta | Yi-Shin Chen
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)
