Yu-Yin Hsu - ACL Anthology

Yu-Yin Hsu

Also published as: Yu-yin Hsu

2026

LST at MWE-2026 AdMIRe 2: Advancing Multimodal Idiomaticity Representation
Le Qiu | Yu-Yin Hsu | Emmanuele Chersoni
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)

This paper presents our methods for the AdMIRe 2.0 shared task, which addresses multilingual and multimodal idiom understanding. Our submission focuses on the text-only track. Specifically, we employ an ensemble of three large language models (LLMs) to directly perform the presented image ranking task. Each model independently produces a ranking of the candidate images, and we aggregate their outputs using a hard voting strategy to determine the final prediction. This ensemble learning framework leverages the complementary strengths of different LLMs, improving robustness and reducing the variance of individual model predictions.

Large Language Models Put to the Test on Chinese Noun Compounds: Experiments on Natural Language Inference and Compound Semantics
Le Qiu | Emmanuele Chersoni | He Zhou | Yu-Yin Hsu
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)

Noun compounds are generally considered an open challenge for NLP systems, given to the difficulty of interpreting the implicit semantic relation between modifier and head, although the advent of Large Language Models (LLMs) recently led to remarkable performance leaps. However, most evaluations have been carried out on English benchmarks.In our work, we test LLMs on compound semantics understanding in Chinese, adopting two different evaluation scenarios: an extrinsic evaluation in a Natural Language Inference task, and an intrinsic evaluation in which models are directly asked to predict the semantic relation linking the two constituents.Our results show that the bigger and more recent LLMs are able to surpass supervised baselines in the inference task, especially when tested under the few-shot setting. In the more challenging task of selecting the correct interpretation of the compounds out of a fine-grained typology of semantic relations between head and modifier, the best Chinese LLM (Qwen-plus) manages to select the correct option in about one third of the cases.

2025

Branching Out: Exploration of Chinese Dependency Parsing with Fine-tuned Large Language Models
He Zhou | Emmanuele Chersoni | Yu-Yin Hsu
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

In this paper, we investigate the effectiveness of large language models (LLMs) for Chinese dependency parsing through fine-tuning. We explore how different dependency representations impact parsing performance when fine-tuning the Chinese Llama-3 model. Our results demonstrate that while the Stanford typed dependency tuple representation yields the highest number of valid dependency trees, converting dependency structure into a lexical centered tree produces parses of significantly higher quality despite generating fewer valid structures. The results further show that fine-tuning enhances LLMs’ capability to handle longer dependencies to some extent, though challenges remain. Additionally, we evaluate the effectiveness of DeepSeek in correcting LLM-generated dependency structures, finding that it is effective for fixing index errors and cyclicity issues but still suffers from tokenization mismatches. Our analysis across dependency distances and relations reveals that fine-tuned LLMs outperform traditional parsers in specific syntactic structures while struggling with others. These findings contribute to the research on leveraging LLMs for syntactic analysis tasks.

Sensory and Affective Dimensions in Mandarin Monosyllabic Adjectives
Yimei Shao | Yu-Yin Hsu | Chu-Ren Huang
Proceedings of the 39th Pacific Asia Conference on Language, Information and Computation

‘But this one was so . . . male.’ A Corpus-Based and LLM-Augmented Analysis of Language and Gender Bias in Barbie
Xin Luo | Wing Hei Lok | Yu-Yin Hsu
Proceedings of the 39th Pacific Asia Conference on Language, Information and Computation

Facilitating Cross-lingual Transfer of Empathy through Language-independent Latent Diffusion: A Case Study in Chinese
Junlin Li | Peng Bo | Yu-Yin Hsu
Findings of the Association for Computational Linguistics: EMNLP 2025

Human empathy builds on the shared pragmatic common ground among different languages. However, existing human empathy data is limited to English. Inspired by multilingual coactivation as the neurocognitive underpinning of human bilingual proficiency, which predicts empathy, we integrate language-independent diffusion processes to facilitate the cross-lingual transfer of empathy. Taking Chinese language varieties as the target domain, automatic and human evaluations demonstrate successful transfers of source empathy into target contexts without compromising linguistic naturalness. The results of this work offer empirical clues on the importance of pragmatic transferability of empathy and its cross-lingual effects in conversation.

Not Every Metric is Equal: Cognitive Models for Predicting N400 and P600 Components During Reading Comprehension
Lavinia Salicchi | Yu-Yin Hsu
Proceedings of the 31st International Conference on Computational Linguistics

In recent years, numerous studies have sought to understand the cognitive dynamics underlying language processing by modeling reading times and ERP amplitudes using computational metrics like surprisal. In the present paper, we examine the predictive power of surprisal, entropy, and a novel metric based on semantic similarity for N400 and P600. Our experiments, conducted with Mandarin Chinese materials, revealed three key findings: 1) expectancy plays a primary role for N400; 2) P600 also reflects the cognitive effort required to evaluate linguistic input semantically; and 3) during the time window of interest, information uncertainty influences the language processing the most. Our findings show how computational metrics that capture distinct cognitive dimensions can effectively address psycholinguistic questions.

Towards LLM-powered Attentive Listener: A Pragmatic Approach through Quantity Self-Repair
Junlin Li | Peng Bo | Yu-Yin Hsu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Grice’s Quantity Maxims dictate that human speakers aim for the optimal quantity of information during conversation. To empower LLMs to self-repair their responses toward optimal quantity and improve their attentive listening skills, we propose Q-Tuning and Q-Traveling, which draw on heuristic path-finding to enable decoder-only LLMs to travel among multiple “Q-alternatives” (Quantity Alternatives) and search for the optimal quantity in coordination with a conversation goal. Automatic and human evaluations demonstrate the effectiveness of Q-Tuning and Q-Traveling in constructing human-like, user-centered conversation agents.

2024

Investigating Aspect Features in Contextualized Embeddings with Semantic Scales and Distributional Similarity
Yuxi Li | Emmanuele Chersoni | Yu-Yin Hsu
Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)

Aspect, a linguistic category describing how actions and events unfold over time, is traditionally characterized by three semantic properties: stativity, durativity and telicity. In this study, we investigate whether and to what extent these properties are encoded in the verb token embeddings of the contextualized spaces of two English language models – BERT and GPT-2. First, we propose an experiment using semantic projections to examine whether the values of the vector dimensions of annotated verbs for stativity, durativity and telicity reflect human linguistic distinctions. Second, we use distributional similarity to replicate the notorious Imperfective Paradox described by Dowty (1977), and assess whether the embedding models are sensitive to capture contextual nuances of the verb telicity. Our results show that both models encode the semantic distinctions for the aspect properties of stativity and telicity in most of their layers, while durativity is the most challenging feature. As for the Imperfective Paradox, only the embedding similarities computed with the vectors from the early layers of the BERT model align with the expected pattern.

Predicting Mandarin and Cantonese Adult Speakers’ Eye-Movement Patterns in Natural Reading
Li Junlin | Yu-Yin Hsu | Emmanuele Chersoni | Bo Peng
Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

Please find the attached PDF file for the extended abstract of our study.

Emstremo: Adapting Emotional Support Response with Enhanced Emotion-Strategy Integrated Selection
Junlin Li | Bo Peng | Yu-Yin Hsu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

To provide effective support, it is essential for a skilled supporter to emotionally resonate with the help-seeker’s current emotional state. In conversational interactions, this emotional alignment is further influenced by the comforting strategies employed by the supporter. Different strategies guide the interlocutors to align their emotions in nuanced patterns. However, the incorporation of strategy into emotional alignment in the context of emotional support agents remains underexplored. To address this limitation, we propose an improved emotional support agent called Emstremo. Emstremo aims to achieve strategic control of emotional alignment by perceiving and responding to the user’s emotions. Our system’s state-of-the-art performance emphasizes the importance of integrating emotions and strategies in modeling conversations that provide emotional support.

Comparing Static and Contextual Distributional Semantic Models on Intrinsic Tasks: An Evaluation on Mandarin Chinese Datasets
Pranav A | Yan Cong | Emmanuele Chersoni | Yu-Yin Hsu | Alessandro Lenci
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The field of Distributional Semantics has recently undergone important changes, with the contextual representations produced by Transformers taking the place of static word embeddings models. Noticeably, previous studies comparing the two types of vectors have only focused on the English language and a limited number of models. In our study, we present a comparative evaluation of static and contextualized distributional models for Mandarin Chinese, focusing on a range of intrinsic tasks. Our results reveal that static models remain stronger for some of the classical tasks that consider word meaning independent of context, while contextualized models excel in identifying semantic relations between word pairs and in the categorization of words into abstract semantic classes.

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information
Yu Xi Li | Bo Peng | Yu-Yin Hsu | Chu-Ren Huang
Findings of the Association for Computational Linguistics: EMNLP 2024

The identification of metaphor is a crucial prerequisite for many downstream language tasks, such as sentiment analysis, opinion mining, and textual entailment. State-of-the-art systems of metaphor detection implement heuristic principles such as Metaphor Identification Procedure (MIP) and Selection Preference Violation (SPV). We propose an innovative approach that leverages the cognitive information of embodiment that can be derived from word embeddings, and explicitly models the process of sensorimotor change that has been demonstrated as essential for human metaphor processing. We showed that this cognitively motivated module is effective and can improve metaphor detection, compared with the heuristic MIP that has been applied previously.

Be Helpful but Don’t Talk too Much - Enhancing Helpfulness in Conversations through Relevance in Multi-Turn Emotional Support
Junlin Li | Bo Peng | Yu-Yin Hsu | Chu-Ren Huang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

For a conversation to help and support, speakers should maintain an “effect-effort” tradeoff. As outlined in the gist of “Cognitive Relevance Principle”, helpful speakers should optimize the “cognitive relevance” through maximizing the “cognitive effects” and minimizing the “processing effort” imposed on listeners. Although preference learning methods have given rise a boon of studies in pursuit of“effect-optimization”, none have delved into the critical “effort-optimiazation” to fully cultivate the awareness of “optimal relevance” into thecognition of conversation agents. To address this gap, we integrate the “Cognitive Relevance Principle” into emotional support agents in the environment of multi-turn conversation. The results demonstrate a significant and robust improvement against the baseline systems with respect to response quality, human-likedness and supportivenss. This study offers compelling evidence for the effectiveness of the “Relevance Principle” in generating human-like, helpful, and harmless emotional support conversations. The source code will be available at https://github.com/CN-Eyetk/VLESA-ORL.git

What’s in a Name? Electrophysiological Differences in Processing Proper Nouns in Mandarin Chinese
Bernard A. J. Jap | Yu-Yin Hsu | Lavinia Salicchi | Yu Xi Li
Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024

The current study examines how proper names and common nouns in Chinese are cognitively processed during sentence comprehension. EEG data was recorded when participants were presented with neutral contexts followed by either a proper name or a common noun. Proper names in Chinese often consist of characters that can function independently as words or be combined with other characters to form words, potentially benefiting from the semantic features carried by each character. Using cluster-based permutation tests, we found a larger N400 for common nouns when compared to proper names. Our results suggest that the semantics of characters do play a role in facilitating the processing of proper names. This is consistent with previous behavioral findings on noun processing in Chinese, indicating that common nouns require more cognitive resources to process than proper names. Moreover, our results suggest that proper names are processed differently between alphabetic languages and Chinese language.

Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024
Michael Zock | Emmanuele Chersoni | Yu-Yin Hsu | Simon de Deyne
Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024

2023

Comparing and Predicting Eye-tracking Data of Mandarin and Cantonese
Junlin Li | Bo Peng | Yu-yin Hsu | Emmanuele Chersoni
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)

Eye-tracking data in Chinese languages present unique challenges due to the non-alphabetic and unspaced nature of the Chinese writing systems. This paper introduces the first deeply-annotated joint Mandarin-Cantonese eye-tracking dataset, from which we achieve a unified eye-tracking prediction system for both language varieties. In addition to the commonly studied first fixation duration and the total fixation duration, this dataset also includes the second fixation duration, expressing fixation patterns that are more relevant to higher-level, structural processing. A basic comparison of the features and measurements in our dataset revealed variation between Mandarin and Cantonese on fixation patterns related to word class and word position. The test of feature usefulness suggested that traditional features are less powerful in predicting the second-pass fixation, to which the linear distance to root makes a leading contribution in Mandarin. In contrast, Cantonese eye-movement behavior relies more on word position and part of speech.

Are Language Models Sensitive to Semantic Attraction? A Study on Surprisal
Yan Cong | Emmanuele Chersoni | Yu-yin Hsu | Alessandro Lenci
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

In psycholinguistics, semantic attraction is a sentence processing phenomenon in which a given argument violates the selectional requirements of a verb, but this violation is not perceived by comprehenders due to its attraction to another noun in the same sentence, which is syntactically unrelated but semantically sound. In our study, we use autoregressive language models to compute the sentence-level and the target phrase-level Surprisal scores of a psycholinguistic dataset on semantic attraction. Our results show that the models are sensitive to semantic attraction, leading to reduced Surprisal scores, although none of them perfectly matches the human behavioral pattern.

Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation
Chu-Ren Huang | Yasunari Harada | Jong-Bok Kim | Si Chen | Yu-Yin Hsu | Emmanuele Chersoni | Pranav A | Winnie Huiheng Zeng | Bo Peng | Yuxi Li | Junlin Li
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

Collecting and Predicting Neurocognitive Norms for Mandarin Chinese
Le Qiu | Yu-Yin Hsu | Emmanuele Chersoni
Proceedings of the 15th International Conference on Computational Semantics

Language researchers have long assumed that concepts can be represented by sets of semantic features, and have traditionally encountered challenges in identifying a feature set that could be sufficiently general to describe the human conceptual experience in its entirety. In the dataset of English norms presented by Binder et al. (2016), also known as Binder norms, the authors introduced a new set of neurobiologically motivated semantic features in which conceptual primitives were defined in terms of modalities of neural information processing. However, no comparable norms are currently available for other languages. In our work, we built the Mandarin Chinese norm by translating the stimuli used in the original study and developed a comparable collection of human ratings for Mandarin Chinese. We also conducted some experiments on the automatic prediction of the Chinese Binder Norms based on the word embeddings of the corresponding words to assess the feasibility of modeling experiential semantic features via corpus-based representations.

Identifying ESG Impact with Key Information
Le Qiu | Bo Peng | Jinghang Gu | Yu-Yin Hsu | Emmanuele Chersoni
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing

The paper presents a concise summary of our work for the ML-ESG-2 shared task, exclusively on the Chinese and English datasets. ML-ESG-2 aims to ascertain the influence of news articles on corporations, specifically from an ESG perspective. To this end, we generally explored the capability of key information for impact identification and experimented with various techniques at different levels. For instance, we attempted to incorporate important information at the word level with TF-IDF, at the sentence level with TextRank, and at the document level with summarization. The final results reveal that the one with GPT-4 for summarisation yields the best predictions.

Investigating the Effect of Discourse Connectives on Transformer Surprisal: Language Models Understand Connectives, Even So They Are Surprised
Yan Cong | Emmanuele Chersoni | Yu-Yin Hsu | Philippe Blache
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

As neural language models (NLMs) based on Transformers are becoming increasingly dominant in natural language processing, several studies have proposed analyzing the semantic and pragmatic abilities of such models. In our study, we aimed at investigating the effect of discourse connectives on NLMs with regard to Transformer Surprisal scores by focusing on the English stimuli of an experimental dataset, in which the expectations about an event in a discourse fragment could be reversed by a concessive or a contrastive connective. By comparing the Surprisal scores of several NLMs, we found that bigger NLMs show patterns similar to humans’ behavioral data when a concessive connective is used, while connective-related effects tend to disappear with a contrastive one. We have additionally validated our findings with GPT-Neo using an extended dataset, and results mostly show a consistent pattern.

2022

PolyU-CBS at TSAR-2022 Shared Task: A Simple, Rank-Based Method for Complex Word Substitution in Two Steps
Emmanuele Chersoni | Yu-Yin Hsu
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

In this paper, we describe the system we presented at the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022) regarding the shared task on Lexical Simplification for English, Portuguese, and Spanish. We proposed an unsupervised approach in two steps: First, we used a masked language model with word masking for each language to extract possible candidates for the replacement of a difficult word; second, we ranked the candidates according to three different Transformer-based metrics. Finally, we determined our list of candidates based on the lowest average rank across different metrics.

Discovering Financial Hypernyms by Prompting Masked Language Models
Bo Peng | Emmanuele Chersoni | Yu-Yin Hsu | Chu-Ren Huang
Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022

With the rising popularity of Transformer-based language models, several studies have tried to exploit their masked language modeling capabilities to automatically extract relational linguistic knowledge, although this kind of research has rarely investigated semantic relations in specialized domains. The present study aims at testing a general-domain and a domain-adapted Transformer models on two datasets of financial term-hypernym pairs using the prompt methodology. Our results show that the differences of prompts impact critically on models’ performance, and that domain adaptation on financial text generally improves the capacity of the models to associate the target terms with the right hypernyms, although the more successful models are those retaining a general-domain vocabulary.

(In)Alienable Possession in Mandarin Relative Clauses
Deran Kong | Yu-Yin Hsu
Proceedings of the Workshop on Cognitive Aspects of the Lexicon

Inalienable possession differs from alienable possession in that, in the former – e.g., kinships and part-whole relations – there is an intrinsic semantic dependency between the possessor and possessum. This paper reports two studies that used acceptability-judgment tasks to investigate whether native Mandarin speakers experienced different levels of interpretational costs while resolving different types of possessive relations, i.e., inalienable possessions (kinship terms and body parts) and alienable ones, expressed within relative clauses. The results show that sentences received higher acceptability ratings when body parts were the possessum as compared to sentences with alienable possessum, indicating that the inherent semantic dependency facilitates the resolution. However, inalienable kinship terms received the lowest acceptability ratings. We argue that this was because the kinship terms, which had the [+human] feature and appeared at the beginning of the experimental sentences, tended to be interpreted as the subject in shallow processing; these features contradicted the semantic-syntactic requirements of the experimental sentences.

Proceedings of the Workshop on Cognitive Aspects of the Lexicon
Michael Zock | Emmanuele Chersoni | Yu-Yin Hsu | Enrico Santus
Proceedings of the Workshop on Cognitive Aspects of the Lexicon

HkAmsters at CMCL 2022 Shared Task: Predicting Eye-Tracking Data from a Gradient Boosting Framework with Linguistic Features
Lavinia Salicchi | Rong Xiang | Yu-Yin Hsu
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Eye movement data are used in psycholinguistic studies to infer information regarding cognitive processes during reading. In this paper, we describe our proposed method for the Shared Task of Cognitive Modeling and Computational Linguistics (CMCL) 2022 - Subtask 1, which involves data from multiple datasets on 6 languages. We compared different regression models using features of the target word and its previous word, and target word surprisal as regression features. Our final system, using a gradient boosting regressor, achieved the lowest mean absolute error (MAE), resulting in the best system of the competition.

2021

Modeling the Influence of Verb Aspect on the Activation of Typical Event Locations with BERT
Won Ik Cho | Emmanuele Chersoni | Yu-Yin Hsu | Chu-Ren Huang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Is Domain Adaptation Worth Your Investment? Comparing BERT and FinBERT on Financial Tasks
Bo Peng | Emmanuele Chersoni | Yu-Yin Hsu | Chu-Ren Huang
Proceedings of the Third Workshop on Economics and Natural Language Processing

With the recent rise in popularity of Transformer models in Natural Language Processing, research efforts have been dedicated to the development of domain-adapted versions of BERT-like architectures. In this study, we focus on FinBERT, a Transformer model trained on text from the financial domain. By comparing its performances with the original BERT on a wide variety of financial text processing tasks, we found continual pretraining from the original model to be the more beneficial option. Domain-specific pretraining from scratch, conversely, seems to be less effective.

2018

Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation
Stephen Politzer-Ahles | Yu-Yin Hsu | Chu-Ren Huang | Yao Yao
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation

Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 25th Joint Workshop on Linguistics and Language Processing
Stephen Politzer-Ahles | Yu-Yin Hsu | Chu-Ren Huang | Yao Yao
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 25th Joint Workshop on Linguistics and Language Processing

Whether and How Mandarin Sandhied Tone 3 and Underlying Tone 2 differ in Terms of Vowel Quality?
Yu-Jung Lin | Yu-Yin Hsu
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

Prosodic Organization and Focus Realization in Taiwan Mandarin
Yu-Yin Hsu | James German
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation
Stephen Politzer-Ahles | Yu-Yin Hsu | Chu-Ren Huang | Yao Yao
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

2012

UBIU for Multilingual Coreference Resolution in OntoNotes
Desislava Zhekova | Sandra Kübler | Joshua Bonner | Marwa Ragheb | Yu-Yin Hsu
Joint Conference on EMNLP and CoNLL - Shared Task

Co-authors

Stephen Politzer-Ahles 3

Lavinia Salicchi 3

Alessandro Lenci 2

Philippe Blache 1

Joshua Bonner 1

Simon De Deyne 1

Yasunari Harada 1

Bernard A. J. Jap 1

Sandra Kübler 1

Enrico Santus 1

Winnie Huiheng Zeng 1

Desislava Zhekova 1

Venues