Chung-Chi Chen - ACL Anthology

Chung-Chi Chen

Also published as: Chung-chi Chen

2026

Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference
Bo-Wei Chen | Chung-Chi Chen | An-Zi Yen
Findings of the Association for Computational Linguistics: EACL 2026

Large Language Models (LLMs) have revolutionized inference across diverse natural language tasks, with larger models performing better but at higher computational costs. We propose a confidence-driven strategy that dynamically selects the most suitable model based on confidence estimates. By assessing a model’s confidence in handling the task and response accuracy, tasks that are likely to be solved correctly are retained, while more uncertain or complex cases are delegated to a larger model, ensuring reliability while minimizing computation. Specifically, we evaluate a model’s likelihood of knowing the correct answer and the probability that its response is accurate. Experiments on the Massive Multitask Language Understanding (MMLU) benchmark show that our approach achieves accuracy comparable to the largest model while reducing computational costs by 20% to 40%. When applied to GPT-4o API calls, it reduces token usage by approximately 60%, further improving cost efficiency. These findings indicate the potential of confidence-based model selection to enhance real-world LLM deployment, particularly in resource-constrained settings such as edge devices and commercial API applications.
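
As a rough illustration of the confidence-driven selection described in this abstract, the sketch below tries models from cheapest to most expensive and stops at the first one whose two confidence signals both clear a threshold. The names `Tier` and `route`, the cost units, and the 0.8 threshold are hypothetical and are not taken from the paper; the two probabilities are assumed to be supplied by whatever wraps each model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface: a model returns (answer, p_know, p_correct), where
# p_know   = estimated likelihood that the model knows the answer, and
# p_correct = estimated probability that this particular response is accurate.
ModelFn = Callable[[str], Tuple[str, float, float]]


@dataclass
class Tier:
    name: str
    cost: float  # assumed relative cost of one call
    run: ModelFn


def route(question: str, tiers: List[Tier], threshold: float = 0.8) -> Tuple[str, str, float]:
    """Return (answer, name of the tier that answered, total cost spent)."""
    if not tiers:
        raise ValueError("at least one tier is required")
    spent = 0.0
    for i, tier in enumerate(tiers):
        answer, p_know, p_correct = tier.run(question)
        spent += tier.cost
        is_last = i == len(tiers) - 1
        # Keep the answer if both signals are high, or if no larger model remains.
        if is_last or min(p_know, p_correct) >= threshold:
            return answer, tier.name, spent
    raise AssertionError("unreachable: the last tier always returns")


# Toy stand-ins for real models (fixed outputs, for illustration only).
unsure_small = Tier("small", 1.0, lambda q: ("B", 0.9, 0.6))
sure_small = Tier("small", 1.0, lambda q: ("B", 0.9, 0.9))
large = Tier("large", 10.0, lambda q: ("C", 0.95, 0.95))

print(route("example question", [unsure_small, large]))
print(route("example question", [sure_small, large]))
```

In this sketch an escalated query pays for both calls, so any saving depends on how often the smaller model is confident enough to keep its own answer.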

2025

Proceedings of the 2nd Workshop on Agent AI for Scenario Planning
Chung-Chi Chen | Tatsuya Ishigaki | Sophia Ananiadou | Hiroya Takamura
Proceedings of the 2nd Workshop on Agent AI for Scenario Planning

Overview of PBIG Shared Task at AgentScen 2025: Product Business Idea Generation from Patents
Wataru Hirota | Chung-Chi Chen | Tomoko Ohkuma | Tomoki Taniguchi | Tatsuya Ishigaki
Proceedings of the 2nd Workshop on Agent AI for Scenario Planning

From Facts to Insights: A Study on the Generation and Evaluation of Analytical Reports for Deciphering Earnings Calls
Tomas Goldsack | Yang Wang | Chenghua Lin | Chung-Chi Chen
Proceedings of the 31st International Conference on Computational Linguistics

This paper explores the use of Large Language Models (LLMs) in the generation and evaluation of analytical reports derived from Earnings Calls (ECs). Addressing a current gap in research, we explore the generation of analytical reports with LLMs in a multi-agent framework, designing specialized agents that introduce diverse viewpoints and desirable topics of analysis into the report generation process. Through multiple analyses, we examine the alignment between generated and human-written reports and the impact of both individual and collective agents. Our findings suggest that the introduction of additional agents results in more insightful reports, although reports generated by human experts remain preferred in the majority of cases. Finally, we address the challenging issue of report evaluation, examining the limitations and strengths of LLMs in assessing the quality of generated reports in different settings and revealing a significant correlation with human experts across multiple dimensions.

GADFA: Generator-Assisted Decision-Focused Approach for Opinion Expressing Timing Identification
Chung-Chi Chen | Hiroya Takamura | Ichiro Kobayashi | Yusuke Miyao | Hsin-Hsi Chen
Proceedings of the 31st International Conference on Computational Linguistics

The advancement of text generation models has granted us the capability to produce coherent and convincing text on demand. Yet, in real-life circumstances, individuals do not continuously generate text or voice their opinions. For instance, consumers pen product reviews after weighing the merits and demerits of a product, and professional analysts issue reports following significant news releases. In essence, opinion expression is typically prompted by particular reasons or signals. Despite long-standing developments in opinion mining, the appropriate timing for expressing an opinion remains largely unexplored. To address this deficit, our study introduces an innovative task - the identification of news-triggered opinion expressing timing. We ground this task in the actions of professional stock analysts and develop a novel dataset for investigation. Our Generator-Assisted Decision-Focused Approach (GADFA) is decision-focused, leveraging text generation models to steer the classification model, thus enhancing overall performance. Our experimental findings demonstrate that the text generated by our model contributes fresh insights from various angles, effectively aiding in identifying the optimal timing for opinion expression.

ML-Promise: A Multilingual Dataset for Corporate Promise Verification
Yohei Seki | Hakusen Shu | Anaïs Lhuissier | Hanwool Lee | Juyeon Kang | Min-Yuh Day | Chung-Chi Chen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Promises made by politicians, corporate leaders, and public figures have a significant impact on public perception, trust, and institutional reputation. However, the complexity and volume of such commitments, coupled with difficulties in verifying their fulfillment, necessitate innovative methods for assessing their credibility. This paper introduces the concept of Promise Verification, a systematic approach involving steps such as promise identification, evidence assessment, and the evaluation of timing for verification. We propose the first multilingual dataset, ML-Promise, which includes English, French, Chinese, Japanese, and Korean, aimed at facilitating in-depth verification of promises, particularly in the context of Environmental, Social, and Governance (ESG) reports. Given the growing emphasis on corporate environmental contributions, this dataset addresses the challenge of evaluating corporate promises, especially in light of practices like greenwashing. Our findings also explore textual and image-based baselines, with promising results from retrieval-augmented generation (RAG) approaches. This work aims to foster further discourse on the accountability of public commitments across multiple languages and domains.

Can GPT-4 Sway Experts’ Investment Decisions?
Takehiro Takayanagi | Hiroya Takamura | Kiyoshi Izumi | Chung-Chi Chen
Findings of the Association for Computational Linguistics: NAACL 2025

In the post-Turing era, evaluating large language models (LLMs) involves assessing generated text based on readers’ decisions rather than merely its indistinguishability from human-produced content. This paper explores how LLM-generated text impacts readers’ decisions, focusing on both amateur and expert audiences. Our findings indicate that GPT-4 can generate persuasive analyses affecting the decisions of both amateurs and professionals. Furthermore, we evaluate the generated text from the aspects of grammar, convincingness, logical coherence, and usefulness. The results highlight a high correlation between real-world evaluation through audience decisions and the current multi-dimensional evaluators commonly used for generative models. Overall, this paper shows the potential and risk of using generated text to sway human decisions and also points out a new direction for evaluating generated text, i.e., leveraging the decisions of readers. We release our dataset to assist future research.

Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
Chung-Chi Chen | Antonio Moreno-Sandoval | Jimin Huang | Qianqian Xie | Sophia Ananiadou | Hsin-Hsi Chen
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

Proceedings of The 10th Workshop on Financial Technology and Natural Language Processing
Chung-Chi Chen | Genta Indra Winata | Stephen Rawls | Anirban Das | Hsin-Hsi Chen | Hiroya Takamura
Proceedings of The 10th Workshop on Financial Technology and Natural Language Processing

Earnings2Insights: Analyst Report Generation for Investment Guidance
Takehiro Takayanagi | Tomas Goldsack | Kiyoshi Izumi | Chenghua Lin | Hiroya Takamura | Chung-Chi Chen
Proceedings of The 10th Workshop on Financial Technology and Natural Language Processing

Observing Micromotives and Macrobehavior of Large Language Models
Yuyang Cheng | Xingwei Qu | Tomas Goldsack | Chenghua Lin | Chung-Chi Chen
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Thomas C. Schelling, awarded the 2005 Nobel Memorial Prize in Economic Sciences, pointed out that “individuals’ decisions (micromotives), while often personal and localized, can lead to societal outcomes (macrobehavior) that are far more complex and different from what the individuals intended.” The current research related to large language models’ (LLMs’) micromotives, such as preferences or biases, assumes that users will make more appropriate decisions once LLMs are devoid of preferences or biases. Consequently, a series of studies has focused on removing bias from LLMs. In the NLP community, while there are many discussions on LLMs’ micromotives, previous studies have seldom conducted a systematic examination of how LLMs may influence society’s macrobehavior. In this paper, we follow the design of Schelling’s model of segregation to observe the relationship between the micromotives and macrobehavior of LLMs. Our results indicate that, regardless of the level of bias in LLMs, a highly segregated society will emerge as more people follow LLMs’ suggestions. We hope our discussion will spark further consideration of the fundamental assumption regarding the mitigation of LLMs’ micromotives and encourage a reevaluation of how LLMs may influence users and society.
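
For readers unfamiliar with Schelling's model of segregation, which this abstract builds on, here is a minimal sketch of the classic grid version. It uses a fixed tolerance rule for every agent rather than LLM suggestions, so it is a stand-in for the general mechanism and not the paper's experimental setup. The grid size, empty-cell fraction, tolerance, and all function names are assumptions chosen for illustration.

```python
import random
from typing import List, Optional

Grid = List[List[Optional[int]]]  # each cell holds group 0, group 1, or None (empty)


def make_grid(size: int, empty_frac: float, rng: random.Random) -> Grid:
    """Fill a size x size grid at random with two groups and some empty cells."""
    cells: List[Optional[int]] = [
        None if rng.random() < empty_frac else rng.randrange(2)
        for _ in range(size * size)
    ]
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def similar_fraction(grid: Grid, r: int, c: int) -> float:
    """Share of occupied neighbours (8-neighbourhood) in the same group as (r, c)."""
    n, me = len(grid), grid[r][c]
    same = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < n and 0 <= cc < n and grid[rr][cc] is not None:
                total += 1
                same += grid[rr][cc] == me
    return same / total if total else 1.0  # an isolated agent counts as content


def step(grid: Grid, tolerance: float, rng: random.Random) -> int:
    """Move every unhappy agent to a random empty cell; return how many moved."""
    n = len(grid)
    unhappy = [(r, c) for r in range(n) for c in range(n)
               if grid[r][c] is not None and similar_fraction(grid, r, c) < tolerance]
    empties = [(r, c) for r in range(n) for c in range(n) if grid[r][c] is None]
    rng.shuffle(unhappy)
    moved = 0
    for r, c in unhappy:
        if not empties:
            break
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], None
        empties[i] = (r, c)  # the vacated cell becomes available
        moved += 1
    return moved


def segregation(grid: Grid) -> float:
    """Mean same-group neighbour share over all agents (higher = more segregated)."""
    n = len(grid)
    scores = [similar_fraction(grid, r, c)
              for r in range(n) for c in range(n) if grid[r][c] is not None]
    return sum(scores) / len(scores) if scores else 0.0


rng = random.Random(0)
grid = make_grid(20, 0.2, rng)
before = segregation(grid)
for _ in range(30):
    if step(grid, 0.5, rng) == 0:  # stop once nobody wants to move
        break
print(f"mean same-group share: {before:.2f} -> {segregation(grid):.2f}")
```

The `segregation` score here is one simple macro-level measure; other definitions are possible and the paper may use a different one.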

Enhancing Investment Opinion Ranking through Argument-Based Sentiment Analysis
Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen | Hiroya Takamura | Ichiro Kobayashi | Yusuke Miyao
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

In the era of rapid Internet and social media development, individuals readily share their investment opinions online. The overwhelming volume of such opinions makes comprehensive evaluation impractical, highlighting the need for an effective recommendation system that can identify valuable insights. To address this challenge, we propose an argument-based sentiment analysis framework that incorporates a new perspective on opinion strength. Our approach introduces the concept of a Fuzzy Strength Degree (FSD), derived from the difference between analysts’ target and closing prices, to quantify the intensity of opinions. By integrating argument mining techniques, we further decompose each opinion into claims and premises, examine their relationships, and use these structures to evaluate the persuasive strength of the arguments. This dual strategy allows us to rank both professional and amateur investor opinions without relying on user history or social signals. Experiments show that our method works best for analyst reports, while on social media, simpler approaches based on wording and professionalism features perform better. Moreover, our analysis of professional analysts’ and traders’ behaviors reveals that top-ranked opinions are more likely to influence subsequent market actions. These findings demonstrate that argument structure and quantified opinion strength provide a novel and reliable foundation for investment opinion recommendation.

Human–Agent Teaming for Higher-Order Thinking Augmentation
Chung-Chi Chen
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Tutorial Abstract

Human-agent teaming refers to humans and artificial agents working together toward shared goals, and recent advances in artificial intelligence, including large language models and autonomous robots, have intensified interest in using these agents not only for automation but also to augment higher-order cognition. Higher-order thinking involves complex mental processes such as critical thinking, creative problem solving, abstract reasoning, and metacognition, and intelligent agents hold the potential to act as genuine teammates that complement human strengths and address cognitive limitations. This tutorial synthesizes emerging research on human-agent teaming for cognitive augmentation by outlining the foundations of higher-order thinking and the psychological frameworks that describe it, reviewing key concepts and interaction paradigms in human–AI collaboration, and examining applications across education, healthcare, military decision-making, scientific discovery, and creative industries, where systems such as language models, decision-support tools, multi-agent architectures, explainable AI, and hybrid human–AI methods are used to support complex reasoning and expert judgment. It also discusses the major challenges involved in achieving meaningful augmentation, including the calibration of trust, the need for transparency, the development of shared mental models, the role of human adaptability and training, and broader ethical concerns. The tutorial further identifies gaps such as limited evidence of long-term improvement in human cognitive abilities and insufficient co-adaptation between humans and agents. Finally, it outlines future directions involving real-time cognitive alignment, long-term studies of cognitive development, co-adaptive learning systems, ethics-aware AI teammates, and new benchmarks for evaluating collaborative cognition, offering a comprehensive overview of current progress and a roadmap for advancing human-agent teaming as a means of enhancing higher-order human thinking.

Live Commentary Planning and Generation
Chung-Chi Chen | Huan-Wen Ho | Yu-Yu Chang | Ming-Hung Wang | Ramon Ruiz-Dolz | Chris Reed | Ichiro Kobayashi | Yusuke Miyao | Hiroya Takamura
Proceedings of the 18th International Natural Language Generation Conference: Generation Challenges

SemEval-2025 Task 6: Multinational, Multilingual, Multi-Industry Promise Verification
Chung-Chi Chen | Yohei Seki | Hakusen Shu | Anaïs Lhuissier | Juyeon Kang | Hanwool Lee | Min-Yuh Day | Hiroya Takamura
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

While extensive research exists on misinformation and disinformation, there is limited focus on future-oriented commitments, such as corporate ESG promises, which are often difficult to verify yet significantly impact public trust and market stability. To address this gap, we introduce the task of promise verification, leveraging natural language processing (NLP) techniques to automatically detect ESG commitments, identify supporting evidence, and evaluate the consistency between promises and evidence, while also inferring potential verification time points. This paper presents the dataset used in SemEval-2025 PromiseEval, outlines participant solutions, and discusses key findings. The goal is to enhance transparency in corporate discourse, strengthen investor trust, and support regulators in monitoring the fulfillment of corporate commitments.

2024

Enhancing Society-Undermining Disinformation Detection through Fine-Grained Sentiment Analysis Pre-Finetuning
Tsung-Hsuan Pan | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Findings of the Association for Computational Linguistics: EACL 2024

In the era of the digital world, while freedom of speech has been flourishing, it has also paved the way for disinformation, causing detrimental effects on society. Legal and ethical criteria are insufficient to address this concern, thus necessitating technological intervention. This paper presents a novel method leveraging the pre-finetuning concept for efficient detection and removal of disinformation that may undermine society, as deemed by judicial entities. We argue the importance of detecting this type of disinformation and validate our approach with real-world data derived from court orders. Following a study that highlighted four areas of interest for rumor analysis, our research proposes the integration of a fine-grained sentiment analysis task in the pre-finetuning phase of language models, using the GoEmotions dataset. Our experiments validate the effectiveness of our approach in enhancing performance significantly. Furthermore, we explore the application of our approach across different languages using multilingual language models, showing promising results. To our knowledge, this is the first study that investigates the role of sentiment analysis pre-finetuning in disinformation detection.

Argument-Based Sentiment Analysis on Forward-Looking Statements
Chin-Yi Lin | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Findings of the Association for Computational Linguistics: ACL 2024

This paper introduces a novel approach to analyzing the forward-looking statements in equity research reports by integrating argument mining with sentiment analysis. Recognizing the limitations of traditional models in capturing the nuances of future-oriented analysis, we propose a refined categorization of argument units into claims, premises, and scenarios, coupled with a unique sentiment analysis framework. Furthermore, we incorporate a temporal dimension to categorize the anticipated impact duration of market events. To facilitate this study, we present the Equity Argument Mining and Sentiment Analysis (Equity-AMSA) dataset. Our research investigates the extent to which detailed domain-specific annotations can be provided, the necessity of fine-grained human annotations in the era of large language models, and whether our proposed framework can improve performance in downstream tasks over traditional methods. Experimental results reveal the significance of manual annotations, especially for scenario identification and sentiment analysis. The study concludes that our annotation scheme and dataset contribute to a deeper understanding of forward-looking statements in equity research reports.

DBQR-QA: A Question Answering Dataset on a Hybrid of Database Querying and Reasoning
Rungsiman Nararatwong | Chung-Chi Chen | Natthawut Kertkeidkachorn | Hiroya Takamura | Ryutaro Ichise
Findings of the Association for Computational Linguistics: ACL 2024

This paper introduces the Database Querying and Reasoning Dataset for Question Answering (DBQR-QA), aimed at addressing the gap in current question-answering (QA) research by emphasizing the essential processes of database querying and reasoning to answer questions. Specifically designed to accommodate sequential questions and multi-hop queries, DBQR-QA more accurately mirrors the dynamics of real-world information retrieval and analysis, with a particular focus on the financial reports of US companies. The dataset’s construction, the challenges encountered during its development, the performance of large language models on this dataset, and a human evaluation are thoroughly discussed to illustrate the dataset’s complexity and highlight future research directions in querying and reasoning tasks.

Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing
Chung-Chi Chen | Xiaomo Liu | Udo Hahn | Armineh Nourbakhsh | Zhiqiang Ma | Charese Smiley | Veronique Hoste | Sanjiv Ranjan Das | Manling Li | Mohammad Ghassemi | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing

Multi-Lingual ESG Impact Duration Inference
Chung-Chi Chen | Yu-Min Tseng | Juyeon Kang | Anais Lhuissier | Yohei Seki | Hanwool Lee | Min-Yuh Day | Teng-Tsai Tu | Hsin-Hsi Chen
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing

To accurately assess the dynamic impact of a company’s activities on its Environmental, Social, and Governance (ESG) scores, we have initiated a series of shared tasks, named ML-ESG. These tasks adhere to the MSCI guidelines for annotating news articles across various languages. This paper details the third iteration of our series, ML-ESG-3, with a focus on impact duration inference—a task that poses significant challenges in estimating the enduring influence of events, even for human analysts. In ML-ESG-3, we provide datasets in five languages (Chinese, English, French, Korean, and Japanese) and share insights from our experience in compiling such subjective datasets. Additionally, this paper reviews the methodologies proposed by ML-ESG-3 participants and offers a comparative analysis of the models’ performances. Concluding the paper, we introduce the concept for the forthcoming series of shared tasks, namely multi-lingual ESG promise verification, and discuss its potential contributions to the field.

Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning
Chung-Chi Chen | Tatsuya Ishigaki | Hiroya Takamura | Akihiko Murai | Suzuko Nishino | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning

Learning Strategies for Robust Argument Mining: An Analysis of Variations in Language and Domain
Ramon Ruiz-Dolz | Chr-Jr Chiu | Chung-Chi Chen | Noriko Kando | Hsin-Hsi Chen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Argument mining has typically been researched for specific corpora belonging to concrete languages and domains independently in each research work. Human argumentation, however, has domain- and language-dependent linguistic features that determine the content and structure of arguments. Also, when deploying argument mining systems in the wild, we might not be able to control some of these features. Therefore, an important aspect that has not been thoroughly investigated in the argument mining literature is the robustness of such systems to variations in language and domain. In this paper, we present a complete analysis across three different languages and three different domains that allows us to better understand how to leverage the scarce available corpora to design argument mining systems that are more robust to natural language variations.

NumHG: A Dataset for Number-Focused Headline Generation
Jian-Tao Huang | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Headline generation, a key task in abstractive summarization, strives to condense a full-length article into a succinct, single line of text. Notably, while contemporary encoder-decoder models excel based on the ROUGE metric, they often falter when it comes to the precise generation of numerals in headlines. We identify the lack of datasets providing fine-grained annotations for accurate numeral generation as a major roadblock. To address this, we introduce a new dataset, NumHG, and provide over 27,000 annotated numeral-rich news articles for detailed investigation. Further, we evaluate five well-performing models from previous headline-generation tasks using human evaluation in terms of numerical accuracy, reasonableness, and readability. Our study reveals a need for improvement in numerical accuracy, demonstrating the potential of the NumHG dataset to drive progress in number-focused headline generation and stimulate further discussions in numeral-focused text generation.

Term-Driven Forward-Looking Claim Synthesis in Earnings Calls
Chung-Chi Chen | Hiroya Takamura
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Argument synthesis aims to generate rational claims, representing a fundamental objective in this field. While existing models excel in summarizing arguments and engaging in debates, we observe a critical gap in their ability to generate accurate arguments that incorporate forward-looking perspectives. In light of this observation, this paper introduces a novel task called “forward-looking claim planning.” We delve into this task by exploring the efficacy of well-performing classification and generation models. Furthermore, we propose several customized preprocessing methods that yield substantial performance improvements. Through comprehensive discussion and analysis, we also outline a future research agenda for the forward-looking claim planning task.

The Impact of Language on Arithmetic Proficiency: A Multilingual Investigation with Cross-Agent Checking Computation
Chung-Chi Chen | Hiroya Takamura | Ichiro Kobayashi | Yusuke Miyao
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

This paper critically examines the arithmetic capabilities of Large Language Models (LLMs), uncovering significant limitations in their performance. Our research reveals a notable decline in accuracy for complex calculations involving large numbers, with addition and subtraction tasks showing varying degrees of proficiency. Additionally, we challenge the notion that arithmetic is language-independent, finding up to a 10% difference in performance across twenty languages. The study also compares self-verification methods with cross-agent collaborations, showing that a single model often outperforms collaborative approaches in basic arithmetic tasks. These findings suggest a need to reassess the effectiveness of LLMs in tasks requiring numerical accuracy and precision.

SemEval-2024 Task 7: Numeral-Aware Language Understanding and Generation
Chung-chi Chen | Jian-tao Huang | Hen-hsen Huang | Hiroya Takamura | Hsin-hsi Chen
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Numbers are frequently utilized in both our daily narratives and professional documents, such as clinical notes, scientific papers, financial documents, and legal court orders. The ability to understand and generate numbers is thus one of the essential aspects of evaluating large language models. In this vein, we propose a collection of datasets in SemEval-2024 Task 7 - NumEval. This collection encompasses several tasks focused on numeral-aware instances, including number prediction, natural language inference, question answering, reading comprehension, reasoning, and headline generation. This paper offers an overview of the dataset and presents the results of all subtasks in NumEval. Additionally, we contribute by summarizing participants’ methods and conducting an error analysis. To the best of our knowledge, NumEval represents one of the early tasks that perform peer evaluation in SemEval’s history. We will further share observations from this aspect and provide suggestions for future SemEval tasks.

2023

Proceedings of the 10th Workshop on Argument Mining
Milad Alshomary | Chung-Chi Chen | Smaranda Muresan | Joonsuk Park | Julia Romberg
Proceedings of the 10th Workshop on Argument Mining

Fidelity-Enriched Contrastive Search: Reconciling the Faithfulness-Diversity Trade-Off in Text Generation
Wei-Lin Chen | Cheng-Kuang Wu | Hsin-Hsi Chen | Chung-Chi Chen
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

In this paper, we address the hallucination problem commonly found in natural language generation tasks. Language models often generate fluent and convincing content but can lack consistency with the provided source, resulting in potential inaccuracies. We propose a new decoding method called Fidelity-Enriched Contrastive Search (FECS), which augments the contrastive search framework with context-aware regularization terms. FECS promotes tokens that are semantically similar to the provided source while penalizing repetitiveness in the generated text. We demonstrate its effectiveness across two tasks prone to hallucination: abstractive summarization and dialogue generation. Results show that FECS consistently enhances faithfulness across various language model sizes while maintaining output diversity comparable to well-performing decoding algorithms.

Improving Numeracy by Input Reframing and Quantitative Pre-Finetuning Task
Chung-Chi Chen | Hiroya Takamura | Ichiro Kobayashi | Yusuke Miyao
Findings of the Association for Computational Linguistics: EACL 2023

Numbers have characteristics that set them apart from words. Teaching models to understand numbers in text is an open research question. Instead of discussing the required calculation skills, this paper focuses on a more fundamental topic: understanding numerals. We point out that innumeracy (the inability to handle basic numeral concepts) exists in most pretrained language models (LMs), and we propose a method to address this issue by exploring the notation of numbers. Further, we discuss whether changing notation and pre-finetuning with the comparing-number task can improve performance on three benchmark datasets containing quantitative-related tasks. The results of this study indicate that input reframing and the proposed pre-finetuning task are useful for RoBERTa.

Entity-Aware Dual Co-Attention Network for Fake News Detection
Sin-han Yang | Chung-chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Findings of the Association for Computational Linguistics: EACL 2023

Fake news and misinformation spread rapidly on the Internet. Identifying them and interpreting the identification results have become important issues. In this paper, we propose a Dual Co-Attention Network (Dual-CAN) for fake news detection, which takes news content, social media replies, and external knowledge into consideration. Our experimental results show that the proposed Dual-CAN outperforms current representative models on two benchmark datasets. We further provide an in-depth discussion comparing how models work on both datasets, with an empirical analysis of attention weights.

Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing and the Second Multimodal AI For Financial Forecasting
Chung-Chi Chen | Hiroya Takamura | Puneet Mathur | Remit Sawhney | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing and the Second Multimodal AI For Financial Forecasting

Multi-Lingual ESG Issue Identification
Chung-Chi Chen | Yu-Min Tseng | Juyeon Kang | Anaïs Lhuissier | Min-Yuh Day | Teng-Tsai Tu | Hsin-Hsi Chen
Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing and the Second Multimodal AI For Financial Forecasting

Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen | Hiroki Sakaji | Kiyoshi Izumi
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing

Multi-Lingual ESG Impact Type Identification
Chung-Chi Chen | Yu-Min Tseng | Juyeon Kang | Anaïs Lhuissier | Yohei Seki | Min-Yuh Day | Teng-Tsai Tu | Hsin-Hsi Chen
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing

Assessing a company’s sustainable development goes beyond just financial metrics; the inclusion of environmental, social, and governance (ESG) factors is becoming increasingly vital. The ML-ESG shared task series seeks to pioneer discussions on news-driven ESG ratings, drawing inspiration from the MSCI ESG rating guidelines. In its second edition, ML-ESG-2 emphasizes impact type identification, offering datasets in four languages: Chinese, English, French, and Japanese. Of the 28 teams registered, 8 participated in the official evaluation. This paper presents a comprehensive overview of ML-ESG-2, detailing the dataset specifics and summarizing the performance outcomes of the participating teams.

Enhancing Volatility Forecasting in Financial Markets: A General Numeral Attachment Dataset for Understanding Earnings Calls
Ming-Xuan Shi | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

CustodiAI: A System for Predicting Child Custody Outcomes
Yining Juan | Chung-Chi Chen | Hsin-Hsi Chen | Daw-Wei Wang
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations

Generating Multiple Questions from Presentation Transcripts: A Pilot Study on Earnings Conference Calls
Yining Juan | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 16th International Natural Language Generation Conference

In various scenarios, such as conference oral presentations, company managers’ talks, and politicians’ speeches, individuals often contemplate the potential questions that may arise from their presentations. This common practice prompts the research question addressed in this study: to what extent can models generate multiple questions based on a given presentation transcript? To investigate this, we conduct pilot explorations using earnings conference call transcripts, which serve as regular meetings between professional investors and company managers. We experiment with different task settings and methods and evaluate the results from various perspectives. Our findings highlight that incorporating key points retrieval techniques enhances the accuracy and diversity of the generated questions.

2022

Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

Overview of the FinNLP-2022 ERAI Task: Evaluating the Rationales of Amateur Investors
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

This paper provides an overview of the shared task, Evaluating the Rationales of Amateur Investors (ERAI), in FinNLP-2022 at EMNLP-2022. This shared task aims to sort out investment opinions that would lead to higher profit from social platforms. We obtained 19 registered teams; 9 teams submitted their results for final evaluation, and 8 teams submitted papers to share their methods. The approaches discussed are varied: prompting, fine-tuning, translation system comparison, and tailor-made neural network architectures. We provide details of the task settings, data statistics, participants’ results, and fine-grained analysis.

2021

Dynamic Graph Transformer for Implicit Tag Recognition
Yi-Ting Liou | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Textual information extraction is a typical research topic in the NLP community. Several NLP tasks, such as named entity recognition and relation extraction between entities, have been well studied in previous work. However, few works pay attention to implicit information. For example, a financial news article that mentions “Apple Inc.” may also be related to Samsung, even though Samsung is not explicitly mentioned in the article. This work presents a novel dynamic graph transformer that distills the textual information and the entity relations on the fly. Experimental results confirm the effectiveness of our approach to implicit tag recognition.

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis
Ting-Wei Hsu | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Both data deficiency and semantic consistency are important issues for data augmentation. Most previous methods address the first issue but ignore the second. In aspect-based sentiment analysis, ignoring semantic consistency may change the aspect and sentiment polarity. In this paper, we propose a semantics-preserving data augmentation approach that considers the importance of each word in a textual sequence according to the related aspects and sentiments. We then substitute the unimportant tokens using two replacement strategies without altering the aspect-level polarity. Our approach is evaluated on several publicly available sentiment analysis datasets and on real-world stock price/risk movement prediction scenarios. Experimental results show that our methodology achieves better performance on all datasets.

Financial Opinion Mining
Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

In this tutorial, we will show researchers interested in this topic where we are and where we will be. We divide this tutorial into three parts: coarse-grained financial opinion mining, fine-grained financial opinion mining, and possible research directions. The tutorial starts by introducing the components of a financial opinion proposed in our research agenda and summarizes the related studies. We also highlight the task of mining customers’ opinions toward financial services in the FinTech industry, and compare them with usual opinions. Several potential research questions will be addressed. We hope the audience of this tutorial will gain an overview of financial opinion mining and figure out their research directions.

Proceedings of the Third Workshop on Financial Technology and Natural Language Processing
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the Third Workshop on Financial Technology and Natural Language Processing

2020

Proceedings of the Second Workshop on Financial Technology and Natural Language Processing
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the Second Workshop on Financial Technology and Natural Language Processing

NTUNLPL at FinCausal 2020, Task 2: Improving Causality Detection Using Viterbi Decoder
Pei-Wei Kao | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

Causality detection attracts considerable attention in the artificial intelligence research community as a means of explaining machine learning models. In this paper, we explore cause-effect detection in financial news and propose an approach that combines the BIO scheme with the Viterbi decoder to address this challenge. Our approach ranked first in the official run of cause-effect detection (Task 2) of the FinCausal-2020 shared task. We not only report the implementation details and ablation analysis in this paper, but also publish our code for academic usage.
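
As a rough illustration of how a Viterbi decoder can be combined with a BIO tagging scheme, the sketch below decodes per-token label scores while forbidding invalid transitions. It is not the authors' implementation: the three-label set, the rule that "I" may not follow "O" or start a sequence, and the function name `viterbi_bio` are all assumptions made for this example, and the per-token scores are taken to come from some upstream tagger.

```python
import math

LABELS = ["B", "I", "O"]  # assumed simplified label set


def allowed(prev, curr):
    # In the BIO scheme, "I" continues a span, so "O" -> "I" is invalid.
    return not (prev == "O" and curr == "I")


def viterbi_bio(emissions):
    """Return the best valid BIO path.

    emissions: list of dicts, one per token, mapping label -> log-score.
    """
    if not emissions:
        return []
    # A sequence may not start with "I".
    score = [{lab: (-math.inf if lab == "I" else emissions[0][lab])
              for lab in LABELS}]
    back = []
    for t in range(1, len(emissions)):
        score_t, back_t = {}, {}
        for curr in LABELS:
            # Pick the best previous label among the allowed transitions.
            best_prev, best = None, -math.inf
            for prev in LABELS:
                if allowed(prev, curr) and score[t - 1][prev] > best:
                    best, best_prev = score[t - 1][prev], prev
            score_t[curr] = best + emissions[t][curr]
            back_t[curr] = best_prev
        score.append(score_t)
        back.append(back_t)
    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda lab: score[-1][lab])]
    for back_t in reversed(back):
        path.append(back_t[path[-1]])
    path.reverse()
    return path


# Token-wise argmax here would give O, I, O, which is not a valid BIO sequence.
example = [
    {"B": math.log(0.1), "I": math.log(0.1), "O": math.log(0.8)},
    {"B": math.log(0.3), "I": math.log(0.6), "O": math.log(0.1)},
    {"B": math.log(0.1), "I": math.log(0.2), "O": math.log(0.7)},
]
print(viterbi_bio(example))
```

The point of the constraint is that decoding picks the highest-scoring sequence among valid ones, rather than the highest-scoring label at each token independently.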

Issues and Perspectives from 10,000 Annotated Financial Social Media Data
Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we investigate the annotation of financial social media data from several angles. We present Fin-SoMe, a dataset with 10,000 labeled financial tweets annotated by experts from both the front desk and the middle desk in a bank’s treasury. These annotated results reveal that (1) writer-labeled market sentiment may be a misleading label; (2) writer’s sentiment and market sentiment of an investor may be different; (3) most financial tweets provide unfounded analysis results; and (4) almost no investors write down the gain/loss results for their positions, which would otherwise greatly facilitate detailed evaluation of their performance. Based on these results, we address various open problems and suggest possible directions for future work on financial social media data. We also provide an experiment on the key snippet extraction task to compare the performance of using a general sentiment dictionary and using the domain-specific dictionary. The results echo our findings from the experts’ annotations.

2019

Numeracy-600K: Learning Numeracy for Detecting Exaggerated Information in Market Comments
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we attempt to answer the question of whether neural network models can learn numeracy, which is the ability to predict the magnitude of a numeral at some specific position in a text description. A large benchmark dataset, called Numeracy-600K, is provided for the novel task. We explore several neural network models including CNN, GRU, BiGRU, CRNN, CNN-capsule, GRU-capsule, and BiGRU-capsule in the experiments. The results show that the BiGRU model gets the best micro-averaged F1 score of 80.16%, and the GRU-capsule model gets the best macro-averaged F1 score of 64.71%. Besides discussing the challenges through comprehensive experiments, we also present an important application scenario, i.e., detecting exaggerated information, for the task.
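
The gap between the micro- and macro-averaged F1 scores reported above is typical when classes are imbalanced. The sketch below shows how the two averages can be computed for a single-label classification task; the function name `f1_scores` and the toy labels are hypothetical, and the paper's own evaluation code may differ.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1) for single-label classification."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0, 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[g] += 1  # gold label gains a false negative

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Toy imbalanced example: the rare class "b" is never predicted.
gold = ["a", "a", "a", "a", "b"]
pred = ["a", "a", "a", "a", "a"]
micro, macro = f1_scores(gold, pred)
print(micro, macro)
```

Because macro averaging weights every class equally, a model that neglects rare classes can score well on micro-F1 and poorly on macro-F1.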

Proceedings of the First Workshop on Financial Technology and Natural Language Processing
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the First Workshop on Financial Technology and Natural Language Processing

2017

NLG301 at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Short length, multiple targets, target relationships, monetary expressions, and outside references are characteristics of financial tweets. This paper proposes methods to extract target spans from a tweet and its referencing web page. In total, 15 publicly available sentiment dictionaries and one sentiment dictionary constructed from the training set, containing sentiment scores in binary or real numbers, are used to compute the sentiment scores of text spans. Moreover, the correlation coefficients of the price return between any two stocks are learned with the price data from Bloomberg. They are used to capture the relationships between the target of interest and other stocks mentioned in a tweet. The best results of our method in the two subtasks are 56.68% and 55.43%, evaluated by evaluation method 2.

Co-authors

Ichiro Kobayashi 5

Anaïs Lhuissier 5

Tomas Goldsack 3

Tatsuya Ishigaki 3

Kiyoshi Izumi 3

Sophia Ananiadou 2

Jian-Tao Huang 2

Ramon Ruiz-Dolz 2

Takehiro Takayanagi 2

Milad Alshomary 1

Sanjiv Ranjan Das 1

Mohammad Ghassemi 1

Wataru Hirota 1

Veronique Hoste 1

Ryutaro Ichise 1

Natthawut Kertkeidkachorn 1

Puneet Mathur 1

Antonio Moreno-Sandoval 1

Akihiko Murai 1

Smaranda Muresan 1

Rungsiman Nararatwong 1

Suzuko Nishino 1

Armineh Nourbakhsh 1

Tomoko Ohkuma 1

Tsung-Hsuan Pan 1

Stephen Rawls 1

Julia Romberg 1

Hiroki Sakaji 1

Remit Sawhney 1

Ming-Xuan Shi 1

Charese Smiley 1

Tomoki Taniguchi 1

Ming-Hung Wang 1

Genta Indra Winata 1

Cheng-Kuang Wu 1

Venues