Hsin-Hsi Chen

Also published as: Hsin-hsi Chen


2024

Induct-Learn: Short Phrase Prompting with Instruction Induction
Po-Chun Chen | Sheng-Lun Wei | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) have demonstrated capability in “instruction induction,” generating instructions from demonstrations (input-output pairs). However, existing methods often rely on large datasets or numerous examples, which is impractical and costly in real-world scenarios. In this work, we propose a low-cost, task-level framework called Induct-Learn. It induces pseudo instructions from a few demonstrations and a short phrase, adding a CoT process into existing demonstrations. When encountering new problems, the learned pseudo instructions and demonstrations with the pseudo CoT process can be combined into a prompt to guide the LLM’s problem-solving process. We validate our approach on the BBH-Induct and Evals-Induct datasets, and the results show that the Induct-Learn framework outperforms state-of-the-art methods. We also exhibit cross-model adaptability and achieve superior performance at a lower cost compared to existing methods.

Enhancing Society-Undermining Disinformation Detection through Fine-Grained Sentiment Analysis Pre-Finetuning
Tsung-Hsuan Pan | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Findings of the Association for Computational Linguistics: EACL 2024

In the era of the digital world, while freedom of speech has been flourishing, it has also paved the way for disinformation, causing detrimental effects on society. Legal and ethical criteria are insufficient to address this concern, thus necessitating technological intervention. This paper presents a novel method leveraging the pre-finetuning concept for efficient detection and removal of disinformation that may undermine society, as deemed by judicial entities. We argue the importance of detecting this type of disinformation and validate our approach with real-world data derived from court orders. Following a study that highlighted four areas of interest for rumor analysis, our research proposes the integration of a fine-grained sentiment analysis task in the pre-finetuning phase of language models, using the GoEmotions dataset. Our experiments validate the effectiveness of our approach in enhancing performance significantly. Furthermore, we explore the application of our approach across different languages using multilingual language models, showing promising results. To our knowledge, this is the first study that investigates the role of sentiment analysis pre-finetuning in disinformation detection.

Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models
Sheng-Lun Wei | Cheng-Kuang Wu | Hen-Hsen Huang | Hsin-Hsi Chen
Findings of the Association for Computational Linguistics: ACL 2024

In this paper, we investigate the phenomena of “selection biases” in Large Language Models (LLMs), focusing on problems where models are tasked with choosing the optimal option from an ordered sequence. We delve into biases related to option order and token usage, which significantly impact LLMs’ decision-making processes. We also quantify the impact of these biases through an extensive empirical analysis across multiple models and tasks. Furthermore, we propose mitigation strategies to enhance model performance. Our key contributions are threefold: 1) Precisely quantifying the influence of option order and token on LLMs, 2) Developing strategies to mitigate the impact of token and order sensitivity to enhance robustness, and 3) Offering a detailed analysis of sensitivity across models and tasks, which informs the creation of more stable and reliable LLM applications for selection problems.
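
One simple way to probe the option-order sensitivity described in this abstract is to permute the options and count how often the selected content changes. The sketch below is illustrative only and is not the paper's metric: `order_sensitivity`, `first_option_model`, and `content_model` are hypothetical names, and the two toy choosers stand in for a real LLM call.

```python
from itertools import permutations
from typing import Callable, Sequence


def order_sensitivity(choose: Callable[[str, Sequence[str]], str],
                      question: str,
                      options: Sequence[str]) -> float:
    """Fraction of option orderings whose selected content differs from the
    choice made under the original ordering (0.0 means order-invariant)."""
    baseline = choose(question, list(options))
    orderings = list(permutations(options))
    flips = sum(choose(question, list(p)) != baseline for p in orderings)
    return flips / len(orderings)


def first_option_model(question: str, options: Sequence[str]) -> str:
    """Toy chooser with a pure position bias: always picks the first option."""
    return options[0]


def content_model(question: str, options: Sequence[str]) -> str:
    """Toy chooser that ignores position: always picks the longest option."""
    return max(options, key=len)


if __name__ == "__main__":
    opts = ["red", "green", "blue"]
    print(order_sensitivity(first_option_model, "Pick a colour.", opts))
    print(order_sensitivity(content_model, "Pick a colour.", opts))
```

A position-biased chooser scores well above zero here, while a chooser that depends only on option content scores zero, which is the contrast such a probe is meant to expose.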

Argument-Based Sentiment Analysis on Forward-Looking Statements
Chin-Yi Lin | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Findings of the Association for Computational Linguistics: ACL 2024

This paper introduces a novel approach to analyzing the forward-looking statements in equity research reports by integrating argument mining with sentiment analysis. Recognizing the limitations of traditional models in capturing the nuances of future-oriented analysis, we propose a refined categorization of argument units into claims, premises, and scenarios, coupled with a unique sentiment analysis framework. Furthermore, we incorporate a temporal dimension to categorize the anticipated impact duration of market events. To facilitate this study, we present the Equity Argument Mining and Sentiment Analysis (Equity-AMSA) dataset. Our research investigates the extent to which detailed domain-specific annotations can be provided, the necessity of fine-grained human annotations in the era of large language models, and whether our proposed framework can improve performance in downstream tasks over traditional methods. Experimental results reveal the significance of manual annotations, especially for scenario identification and sentiment analysis. The study concludes that our annotation scheme and dataset contribute to a deeper understanding of forward-looking statements in equity research reports.

Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing
Chung-Chi Chen | Xiaomo Liu | Udo Hahn | Armineh Nourbakhsh | Zhiqiang Ma | Charese Smiley | Veronique Hoste | Sanjiv Ranjan Das | Manling Li | Mohammad Ghassemi | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing

Multi-Lingual ESG Impact Duration Inference
Chung-Chi Chen | Yu-Min Tseng | Juyeon Kang | Anaïs Lhuissier | Yohei Seki | Hanwool Lee | Min-Yuh Day | Teng-Tsai Tu | Hsin-Hsi Chen
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing

To accurately assess the dynamic impact of a company’s activities on its Environmental, Social, and Governance (ESG) scores, we have initiated a series of shared tasks, named ML-ESG. These tasks adhere to the MSCI guidelines for annotating news articles across various languages. This paper details the third iteration of our series, ML-ESG-3, with a focus on impact duration inference—a task that poses significant challenges in estimating the enduring influence of events, even for human analysts. In ML-ESG-3, we provide datasets in five languages (Chinese, English, French, Korean, and Japanese) and share insights from our experience in compiling such subjective datasets. Additionally, this paper reviews the methodologies proposed by ML-ESG-3 participants and offers a comparative analysis of the models’ performances. Concluding the paper, we introduce the concept for the forthcoming series of shared tasks, namely multi-lingual ESG promise verification, and discuss its potential contributions to the field.

Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning
Chung-Chi Chen | Tatsuya Ishigaki | Hiroya Takamura | Akihiko Murai | Suzuko Nishino | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning

Learning Strategies for Robust Argument Mining: An Analysis of Variations in Language and Domain
Ramon Ruiz-Dolz | Chr-Jr Chiu | Chung-Chi Chen | Noriko Kando | Hsin-Hsi Chen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Argument mining has typically been researched for specific corpora belonging to concrete languages and domains independently in each research work. Human argumentation, however, has domain- and language-dependent linguistic features that determine the content and structure of arguments. Also, when deploying argument mining systems in the wild, we might not be able to control some of these features. Therefore, an important aspect that has not been thoroughly investigated in the argument mining literature is the robustness of such systems to variations in language and domain. In this paper, we present a complete analysis across three different languages and three different domains that allows us to better understand how to leverage the scarce available corpora to design argument mining systems that are more robust to natural language variations.

NumHG: A Dataset for Number-Focused Headline Generation
Jian-Tao Huang | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Headline generation, a key task in abstractive summarization, strives to condense a full-length article into a succinct, single line of text. Notably, while contemporary encoder-decoder models excel based on the ROUGE metric, they often falter when it comes to the precise generation of numerals in headlines. We identify the lack of datasets providing fine-grained annotations for accurate numeral generation as a major roadblock. To address this, we introduce a new dataset, the NumHG, and provide over 27,000 annotated numeral-rich news articles for detailed investigation. Further, we evaluate five well-performing models from previous headline-generation tasks using human evaluation in terms of numerical accuracy, reasonableness, and readability. Our study reveals a need for improvement in numerical accuracy, demonstrating the potential of the NumHG dataset to drive progress in number-focused headline generation and stimulate further discussions in numeral-focused text generation.

SemEval-2024 Task 7: Numeral-Aware Language Understanding and Generation
Chung-chi Chen | Jian-tao Huang | Hen-hsen Huang | Hiroya Takamura | Hsin-hsi Chen
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

Numbers are frequently utilized in both our daily narratives and professional documents, such as clinical notes, scientific papers, financial documents, and legal court orders. The ability to understand and generate numbers is thus one of the essential aspects of evaluating large language models. In this vein, we propose a collection of datasets in SemEval-2024 Task 7 - NumEval. This collection encompasses several tasks focused on numeral-aware instances, including number prediction, natural language inference, question answering, reading comprehension, reasoning, and headline generation. This paper offers an overview of the dataset and presents the results of all subtasks in NumEval. Additionally, we contribute by summarizing participants’ methods and conducting an error analysis. To the best of our knowledge, NumEval represents one of the early tasks that perform peer evaluation in SemEval’s history. We will further share observations from this aspect and provide suggestions for future SemEval tasks.

2023

Entity-Aware Dual Co-Attention Network for Fake News Detection
Sin-han Yang | Chung-chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Findings of the Association for Computational Linguistics: EACL 2023

Fake news and misinformation spread rapidly on the Internet. How to identify it and how to interpret the identification results have become important issues. In this paper, we propose a Dual Co-Attention Network (Dual-CAN) for fake news detection, which takes news content, social media replies, and external knowledge into consideration. Our experimental results support that the proposed Dual-CAN outperforms current representative models in two benchmark datasets. We further make in-depth discussions by comparing how models work in both datasets with empirical analysis of attention weights.

LED: A Dataset for Life Event Extraction from Dialogs
Yi-Pei Chen | An-Zi Yen | Hen-Hsen Huang | Hideki Nakayama | Hsin-Hsi Chen
Findings of the Association for Computational Linguistics: EACL 2023

Lifelogging has gained more attention due to its wide applications, such as personalized recommendations or memory assistance. The issues of collecting and extracting personal life events have emerged. People often share their life experiences with others through conversations. However, extracting life events from conversations is rarely explored. In this paper, we present Life Event Dialog, a dataset containing fine-grained life event annotations on conversational data. In addition, we initiate a novel Conversational Life Event Extraction task and differentiate the task from the public event extraction or the life event extraction from other sources like microblogs. We explore three information extraction (IE) frameworks to address the Conversational Life Event Extraction task: OpenIE, relation extraction, and event extraction. A comprehensive empirical analysis of the three baselines is established. The results suggest that the current event extraction model still struggles with extracting life events from human daily conversations. Our proposed Life Event Dialog dataset and in-depth analysis of IE frameworks will facilitate future research on life event extraction from conversations.

ZARA: Improving Few-Shot Self-Rationalization for Small Language Models
Wei-Lin Chen | An-Zi Yen | Cheng-Kuang Wu | Hen-Hsen Huang | Hsin-Hsi Chen
Findings of the Association for Computational Linguistics: EMNLP 2023

Language models (LMs) that jointly generate end-task answers as well as free-text rationales are known as self-rationalization models. Recent works demonstrate great performance gains for self-rationalization by few-shot prompting LMs with rationale-augmented exemplars. However, the ability to benefit from explanations only emerges with large-scale LMs, which have poor accessibility. In this work, we explore the less-studied setting of leveraging explanations for small LMs to improve few-shot self-rationalization. We first revisit the relationship between rationales and answers. Inspired by the implicit mental process of how human beings assess explanations, we present a novel approach, Zero-shot Augmentation of Rationale-Answer pairs (ZARA), to automatically construct pseudo-parallel data for self-training by reducing the problem of plausibility judgement to natural language inference. Experimental results show ZARA achieves SOTA performance on the FEB benchmark, for both the task accuracy and the explanation metric. In addition, we conduct human and quantitative evaluation validating ZARA’s ability to automatically identify plausible and accurate rationale-answer pairs.

Fidelity-Enriched Contrastive Search: Reconciling the Faithfulness-Diversity Trade-Off in Text Generation
Wei-Lin Chen | Cheng-Kuang Wu | Hsin-Hsi Chen | Chung-Chi Chen
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

In this paper, we address the hallucination problem commonly found in natural language generation tasks. Language models often generate fluent and convincing content but can lack consistency with the provided source, resulting in potential inaccuracies. We propose a new decoding method called Fidelity-Enriched Contrastive Search (FECS), which augments the contrastive search framework with context-aware regularization terms. FECS promotes tokens that are semantically similar to the provided source while penalizing repetitiveness in the generated text. We demonstrate its effectiveness across two tasks prone to hallucination: abstractive summarization and dialogue generation. Results show that FECS consistently enhances faithfulness across various language model sizes while maintaining output diversity comparable to well-performing decoding algorithms.

Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations
Wei-Lin Chen | Cheng-Kuang Wu | Yun-Nung Chen | Hsin-Hsi Chen
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) have exhibited striking in-context learning (ICL) ability to adapt to target tasks with a few input-output demonstrations. For better ICL, different methods are proposed to select representative demonstrations from existing training corpora. However, such settings are not aligned with real-world practices, as end-users usually query LMs without access to demonstration pools. In this work, we introduce Self-ICL—a simple framework which bootstraps LMs’ intrinsic capabilities to perform zero-shot ICL. Given a test input, Self-ICL first prompts the model to generate pseudo-inputs. Next, the model predicts pseudo-labels for the pseudo-inputs via zero-shot prompting. Finally, we perform ICL for the test input with the pseudo-input-label pairs as demonstrations. Evaluation on 23 BIG-Bench Hard tasks shows Self-ICL outperforms zero-shot baselines on both average accuracy and head-to-head comparison. Moreover, with zero-shot chain-of-thought, Self-ICL achieves results comparable to using real demonstrations. Additionally, we conduct a range of analyses to validate Self-ICL’s effectiveness and provide insights for its behaviors under different settings.
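
The three steps named in this abstract (generate pseudo-inputs, predict pseudo-labels zero-shot, then run ICL with the resulting pairs) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `self_icl`, the prompt wording, and the `llm` callable (any function from a prompt string to a completion string) are hypothetical, and `toy_llm` is a stub so that the example runs without a model.

```python
from typing import Callable, List, Tuple


def self_icl(llm: Callable[[str], str], task: str, test_input: str, k: int = 3) -> str:
    """Zero-shot ICL with self-generated demonstrations (hypothetical sketch)."""
    # Step 1: prompt the model to generate up to k pseudo-inputs for the task.
    gen_prompt = (
        f"Task: {task}\nExample input: {test_input}\n"
        f"Write {k} new inputs for this task, one per line."
    )
    lines = [ln.strip() for ln in llm(gen_prompt).splitlines()]
    pseudo_inputs = [ln for ln in lines if ln][:k]

    # Step 2: predict a pseudo-label for each pseudo-input via zero-shot prompting.
    demos: List[Tuple[str, str]] = []
    for x in pseudo_inputs:
        y = llm(f"Task: {task}\nInput: {x}\nOutput:").strip()
        demos.append((x, y))

    # Step 3: answer the test input with the pseudo pairs as demonstrations.
    demo_text = "".join(f"Input: {x}\nOutput: {y}\n\n" for x, y in demos)
    return llm(f"Task: {task}\n\n{demo_text}Input: {test_input}\nOutput:").strip()


def toy_llm(prompt: str) -> str:
    """Stub model: proposes fixed inputs, otherwise labels the last input by parity."""
    if "Write" in prompt:
        return "3\n8\n11"
    last_input = prompt.rsplit("Input:", 1)[1].split("\n")[0].strip()
    return "even" if int(last_input) % 2 == 0 else "odd"


if __name__ == "__main__":
    print(self_icl(toy_llm, "Say whether the number is odd or even.", "4"))
```

Passing the model as a plain callable keeps the sketch independent of any particular API; a real client would be wrapped in a function with the same signature.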

Enhancing Volatility Forecasting in Financial Markets: A General Numeral Attachment Dataset for Understanding Earnings Calls
Ming-Xuan Shi | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

CustodiAI: A System for Predicting Child Custody Outcomes
Yining Juan | Chung-Chi Chen | Hsin-Hsi Chen | Daw-Wei Wang
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations

Generating Multiple Questions from Presentation Transcripts: A Pilot Study on Earnings Conference Calls
Yining Juan | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 16th International Natural Language Generation Conference

In various scenarios, such as conference oral presentations, company managers’ talks, and politicians’ speeches, individuals often contemplate the potential questions that may arise from their presentations. This common practice prompts the research question addressed in this study: to what extent can models generate multiple questions based on a given presentation transcript? To investigate this, we conduct pilot explorations using earnings conference call transcripts, which serve as regular meetings between professional investors and company managers. We experiment with different task settings and methods and evaluate the results from various perspectives. Our findings highlight that incorporating key points retrieval techniques enhances the accuracy and diversity of the generated questions.

Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing and the Second Multimodal AI For Financial Forecasting
Chung-Chi Chen | Hiroya Takamura | Puneet Mathur | Ramit Sawhney | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing and the Second Multimodal AI For Financial Forecasting

Multi-Lingual ESG Issue Identification
Chung-Chi Chen | Yu-Min Tseng | Juyeon Kang | Anaïs Lhuissier | Min-Yuh Day | Teng-Tsai Tu | Hsin-Hsi Chen
Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing and the Second Multimodal AI For Financial Forecasting

Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen | Hiroki Sakaji | Kiyoshi Izumi
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing

Multi-Lingual ESG Impact Type Identification
Chung-Chi Chen | Yu-Min Tseng | Juyeon Kang | Anaïs Lhuissier | Yohei Seki | Min-Yuh Day | Teng-Tsai Tu | Hsin-Hsi Chen
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing

Assessing a company’s sustainable development goes beyond just financial metrics; the inclusion of environmental, social, and governance (ESG) factors is becoming increasingly vital. The ML-ESG shared task series seeks to pioneer discussions on news-driven ESG ratings, drawing inspiration from the MSCI ESG rating guidelines. In its second edition, ML-ESG-2 emphasizes impact type identification, offering datasets in four languages: Chinese, English, French, and Japanese. Of the 28 teams registered, 8 participated in the official evaluation. This paper presents a comprehensive overview of ML-ESG-2, detailing the dataset specifics and summarizing the performance outcomes of the participating teams.

2022

SEEN: Structured Event Enhancement Network for Explainable Need Detection of Information Recall Assistance
You-En Lin | An-Zi Yen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

When recalling life experiences, people often forget or confuse life events, which necessitates information recall services. Previous work on information recall focuses on providing such assistance reactively, i.e., by retrieving the life event of a given query. Proactively detecting the need for information recall services is rarely discussed. In this paper, we use a human-annotated life experience retelling dataset to detect the right time to trigger the information recall service. We propose a pilot model, the structured event enhancement network (SEEN), that detects life event inconsistency, additional information in life events, and forgotten events. A fusing mechanism is also proposed to incorporate event graphs of stories and enhance the textual representations. To explain the need detection results, SEEN simultaneously provides supporting evidence by selecting the related nodes from the event graph. Experimental results show that SEEN achieves promising performance in detecting information needs. In addition, the extracted evidence can serve as complementary information to remind users what events they may want to recall.

Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

Overview of the FinNLP-2022 ERAI Task: Evaluating the Rationales of Amateur Investors
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

This paper provides an overview of the shared task, Evaluating the Rationales of Amateur Investors (ERAI), in FinNLP-2022 at EMNLP-2022. This shared task aims to sort out investment opinions that would lead to higher profit from social platforms. We obtained 19 registered teams; 9 teams submitted their results for final evaluation, and 8 teams submitted papers to share their methods. The discussed directions are various: prompting, fine-tuning, translation system comparison, and tailor-made neural network architectures. We provide details of the task settings, data statistics, participants’ results, and fine-grained analysis.

Proceedings of the 29th International Conference on Computational Linguistics
Nicoletta Calzolari | Chu-Ren Huang | Hansaem Kim | James Pustejovsky | Leo Wanner | Key-Sun Choi | Pum-Mo Ryu | Hsin-Hsi Chen | Lucia Donatelli | Heng Ji | Sadao Kurohashi | Patrizia Paggio | Nianwen Xue | Seokhwan Kim | Younggyun Hahm | Zhong He | Tony Kyungil Lee | Enrico Santus | Francis Bond | Seung-Hoon Na
Proceedings of the 29th International Conference on Computational Linguistics

Learning to Generate Explanation from e-Hospital Services for Medical Suggestion
Wei-Lin Chen | An-Zi Yen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 29th International Conference on Computational Linguistics

Explaining the reasoning of neural models has attracted attention in recent years. Providing highly accessible and comprehensible explanations in natural language helps humans understand a model’s prediction results. In this work, we present a pilot study to investigate explanation generation with a narrative and causal structure for the scenario of health consulting. Our model generates a medical suggestion regarding the patient’s concern and provides an explanation as the outline of the reasoning. To align the generated explanation with the suggestion, we propose a novel discourse-aware mechanism with multi-task learning. Experimental results show that our model achieves promising performance in both quantitative and human evaluation.

2021

Dynamic Graph Transformer for Implicit Tag Recognition
Yi-Ting Liou | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Textual information extraction is a typical research topic in the NLP community. Several NLP tasks such as named entity recognition and relation extraction between entities have been well studied in previous work. However, few works pay attention to implicit information. For example, a financial news article mentioning “Apple Inc.” may also be related to Samsung, even though Samsung is not explicitly mentioned in the article. This work presents a novel dynamic graph transformer that distills the textual information and the entity relations on the fly. Experimental results confirm the effectiveness of our approach to implicit tag recognition.

Proceedings of the Third Workshop on Financial Technology and Natural Language Processing
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the Third Workshop on Financial Technology and Natural Language Processing

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis
Ting-Wei Hsu | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Both data deficiency and semantic consistency are important issues for data augmentation. Most previous methods address the first issue but ignore the second. In aspect-based sentiment analysis, violating semantic consistency may change the aspect and sentiment polarity. In this paper, we propose a semantics-preserving data augmentation approach that considers the importance of each word in a textual sequence according to the related aspects and sentiments. We then substitute the unimportant tokens using two replacement strategies without altering the aspect-level polarity. Our approach is evaluated on several publicly available sentiment analysis datasets and on real-world stock price/risk movement prediction scenarios. Experimental results show that our methodology achieves better performance on all datasets.

Financial Opinion Mining
Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

In this tutorial, we show researchers interested in financial opinion mining where the field stands and where it is heading. We divide this tutorial into three parts: coarse-grained financial opinion mining, fine-grained financial opinion mining, and possible research directions. The tutorial starts by introducing the components of a financial opinion proposed in our research agenda and summarizes the related studies. We also highlight the task of mining customers’ opinions toward financial services in the FinTech industry, and compare them with usual opinions. Several potential research questions will be addressed. We hope the audience of this tutorial will gain an overview of financial opinion mining and identify their own research directions.

2020

Proceedings of the Second Workshop on Financial Technology and Natural Language Processing
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the Second Workshop on Financial Technology and Natural Language Processing

Heterogeneous Recycle Generation for Chinese Grammatical Error Correction
Charles Hinson | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 28th International Conference on Computational Linguistics

Most recent works in the field of grammatical error correction (GEC) rely on neural machine translation-based models. Although these models boast impressive performance, they require a massive amount of data to train properly. Furthermore, NMT-based systems treat GEC purely as a translation task and overlook the editing aspect of it. In this work, we propose a heterogeneous approach to Chinese GEC, composed of an NMT-based model, a sequence editing model, and a spell checker. Our methodology not only achieves a new state-of-the-art performance for Chinese GEC, but also does so without relying on data augmentation or GEC-specific architecture changes. We further experiment with all possible configurations of our system with respect to model composition order and number of rounds of correction. A detailed analysis of each model and its contribution to the correction process is performed by adapting the ERRANT scorer to score Chinese sentences.

NTU_NLP at SemEval-2020 Task 12: Identifying Offensive Tweets Using Hierarchical Multi-Task Learning Approach
Po-Chun Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents our hierarchical multi-task learning (HMTL) and multi-task learning (MTL) approaches for improving the text encoder in Sub-tasks A, B, and C of Multilingual Offensive Language Identification in Social Media (SemEval-2020 Task 12). We show that using the MTL approach can greatly improve performance on the more complex problems, i.e., Sub-tasks B and C. Coupled with a hierarchical approach, performance is further improved. Overall, our best model, HMTL, outperforms the baseline model by 3% and 2% of Macro F-score in Sub-tasks B and C of OffensEval 2020, respectively.

MPDD: A Multi-Party Dialogue Dataset for Analysis of Emotions and Interpersonal Relationships
Yi-Ting Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Twelfth Language Resources and Evaluation Conference

A dialogue dataset is an indispensable resource for building a dialogue system. Additional information like emotions and interpersonal relationships labeled on conversations enables the system to capture the emotion flow of the participants in the dialogue. However, there is no publicly available Chinese dialogue dataset with emotion and relation labels. In this paper, we collect conversations from TV series scripts and annotate emotion and interpersonal relationship labels on each utterance. The dataset contains 25,548 utterances from 4,142 dialogues. We also set up experiments to observe the effects of the responded utterance on the current utterance, and the correlation between emotion and relation types in emotion and relation classification tasks.

Chinese Discourse Parsing: Model and Evaluation
Lin Chuan-An | Shyh-Shiun Hung | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Twelfth Language Resources and Evaluation Conference

Chinese discourse parsing, which aims to identify the hierarchical relationships of Chinese elementary discourse units, does not yet have a consistent evaluation metric. Although Parseval is commonly used, evaluation settings differ in three aspects: micro vs. macro F1 scores, binary vs. multiway ground truth, and left-heavy vs. right-heavy binarization. In this paper, we first propose a neural network model that unifies a pre-trained transformer and a CKY-like algorithm, and then compare it with the previous models under different evaluation scenarios. The experimental results show that our model outperforms the previous systems. We conclude that (1) the pre-trained context embedding provides effective solutions to deal with implicit semantics in Chinese texts, and (2) using multiway ground truth is helpful since different binarization approaches lead to significant differences in performance.

pdf bib
MSD-1030: A Well-built Multi-Sense Evaluation Dataset for Sense Representation Models
Ting-Yu Yen | Yang-Yin Lee | Yow-Ting Shiue | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Twelfth Language Resources and Evaluation Conference

Sense embedding models handle polysemy by giving each distinct meaning of a word form a separate representation. They are considered improvements over word models, and their effectiveness is usually judged with benchmarks such as semantic similarity datasets. However, most of these datasets are not designed for evaluating sense embeddings. In this research, we show that there are at least six concerns about evaluating sense embeddings with existing benchmark datasets, including the large proportions of single-sense words and the unexpected inferior performance of several multi-sense models to their single-sense counterparts. These observations call into serious question whether evaluations based on these datasets can reflect the sense model’s ability to capture different meanings. To address the issues, we propose the Multi-Sense Dataset (MSD-1030), which contains a high ratio of multi-sense word pairs. A series of analyses and experiments show that MSD-1030 serves as a more reliable benchmark for sense embeddings. The dataset is available at http://nlg.csie.ntu.edu.tw/nlpresource/MSD-1030/.

pdf bib
Issues and Perspectives from 10,000 Annotated Financial Social Media Data
Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we investigate the annotation of financial social media data from several angles. We present Fin-SoMe, a dataset with 10,000 labeled financial tweets annotated by experts from both the front desk and the middle desk in a bank’s treasury. These annotated results reveal that (1) writer-labeled market sentiment may be a misleading label; (2) writer’s sentiment and market sentiment of an investor may be different; (3) most financial tweets provide unfounded analysis results; and (4) almost no investors write down the gain/loss results for their positions, which would otherwise greatly facilitate detailed evaluation of their performance. Based on these results, we address various open problems and suggest possible directions for future work on financial social media data. We also provide an experiment on the key snippet extraction task to compare the performance of using a general sentiment dictionary and using the domain-specific dictionary. The results echo our findings from the experts’ annotations.

pdf bib
A Complete Shift-Reduce Chinese Discourse Parser with Robust Dynamic Oracle
Shyh-Shiun Hung | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This work proposes a standalone, complete Chinese discourse parser for practical applications. We approach Chinese discourse parsing from a variety of aspects and improve the shift-reduce parser not only by integrating the pre-trained text encoder, but also by employing novel training strategies. We revise the dynamic-oracle procedure for training the shift-reduce parser, and apply unsupervised data augmentation to enhance rhetorical relation recognition. Experimental results show that our Chinese discourse parser achieves the state-of-the-art performance.

pdf bib
NTUNLPL at FinCausal 2020, Task 2: Improving Causality Detection Using Viterbi Decoder
Pei-Wei Kao | Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

In order to provide an explanation of machine learning models, causality detection attracts lots of attention in the artificial intelligence research community. In this paper, we explore the cause-effect detection in financial news and propose an approach, which combines the BIO scheme with the Viterbi decoder for addressing this challenge. Our approach is ranked the first in the official run of cause-effect detection (Task 2) of the FinCausal-2020 shared task. We not only report the implementation details and ablation analysis in this paper, but also publish our code for academic usage.

2019

pdf bib
Numeracy-600K: Learning Numeracy for Detecting Exaggerated Information in Market Comments
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we attempt to answer the question of whether neural network models can learn numeracy, which is the ability to predict the magnitude of a numeral at some specific position in a text description. A large benchmark dataset, called Numeracy-600K, is provided for the novel task. We explore several neural network models including CNN, GRU, BiGRU, CRNN, CNN-capsule, GRU-capsule, and BiGRU-capsule in the experiments. The results show that the BiGRU model gets the best micro-averaged F1 score of 80.16%, and the GRU-capsule model gets the best macro-averaged F1 score of 64.71%. Besides discussing the challenges through comprehensive experiments, we also present an important application scenario, i.e., detecting exaggerated information, for the task.

pdf bib
Lexicon Guided Attentive Neural Network Model for Argument Mining
Jian-Fu Lin | Kuo Yu Huang | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 6th Workshop on Argument Mining

Identification of argumentative components is an important stage of argument mining. Lexicon information is reported as one of the most frequently used features in the argument mining research. In this paper, we propose a methodology to integrate lexicon information into a neural network model by attention mechanism. We conduct experiments on the UKP dataset, which is collected from heterogeneous sources and contains several text types, e.g., microblog, Wikipedia, and news. We explore lexicons from various application scenarios such as sentiment analysis and emotion detection. We also compare the experimental results of leveraging different lexicons.

pdf bib
Proceedings of the First Workshop on Financial Technology and Natural Language Processing
Chung-Chi Chen | Hen-Hsen Huang | Hiroya Takamura | Hsin-Hsi Chen
Proceedings of the First Workshop on Financial Technology and Natural Language Processing

2018

pdf bib
Disambiguating False-Alarm Hashtag Usages in Tweets for Irony Detection
Hen-Hsen Huang | Chiao-Chen Chen | Hsin-Hsi Chen
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

The reliability of self-labeled data is an important issue when the data are regarded as ground-truth for training and testing learning-based models. This paper addresses the issue of false-alarm hashtags in the self-labeled data for irony detection. We analyze the ambiguity of hashtag usages and propose a novel neural network-based model, which incorporates linguistic information from different aspects, to disambiguate the usage of three hashtags that are widely used to collect the training data for irony detection. Furthermore, we apply our model to prune the self-labeled training data. Experimental results show that the irony detection model trained on the less but cleaner training instances outperforms the models trained on all data.

pdf bib
GenSense: A Generalized Sense Retrofitting Model
Yang-Yin Lee | Ting-Yu Yen | Hen-Hsen Huang | Yow-Ting Shiue | Hsin-Hsi Chen
Proceedings of the 27th International Conference on Computational Linguistics

With the aid of recently proposed word embedding algorithms, the study of semantic similarity has advanced rapidly. However, many natural language processing tasks need sense-level representations. To address this issue, some studies propose sense embedding learning algorithms. In this paper, we present a generalization of the existing sense retrofitting model. The generalization takes three major components: semantic relations between the senses, the relation strength, and the semantic strength. In the experiments, we show that the generalized model can outperform previous approaches in three types of experiment: semantic relatedness, contextual word similarity, and semantic difference.

pdf bib
Correcting Chinese Word Usage Errors for Learning Chinese as a Second Language
Yow-Ting Shiue | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 27th International Conference on Computational Linguistics

With more and more people around the world learning Chinese as a second language, the need for Chinese error correction tools is increasing. In the HSK dynamic composition corpus, word usage error (WUE) is the most common error type. In this paper, we build a neural network model that considers both the target erroneous token and its context to generate a correction vector, and compares it against a candidate vocabulary to propose suitable corrections. To deal with potential alternative corrections, the top five proposed candidates are judged by native Chinese speakers. For more than 91% of the cases, our system can propose at least one acceptable correction within a list of five candidates. To the best of our knowledge, this is the first research addressing general-type Chinese WUE correction. Our system can help non-native Chinese learners revise their sentences by themselves.

pdf bib
A Unified RvNN Framework for End-to-End Chinese Discourse Parsing
Lin Chuan-An | Hen-Hsen Huang | Zi-Yuan Chen | Hsin-Hsi Chen
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

This paper demonstrates an end-to-end Chinese discourse parser. We propose a unified framework based on recursive neural network (RvNN) to jointly model the subtasks including elementary discourse unit (EDU) segmentation, tree structure construction, center labeling, and sense labeling. Experimental results show our parser achieves the state-of-the-art performance in the Chinese Discourse Treebank (CDTB) dataset. We release the source code with a pre-trained model for the NLP community. To the best of our knowledge, this is the first open source toolkit for Chinese discourse parsing. The standalone toolkit can be integrated into subsequent applications without the need of external resources such as syntactic parser.

pdf bib
A Chinese Writing Correction System for Learning Chinese as a Foreign Language
Yow-Ting Shiue | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We present a Chinese writing correction system for learning Chinese as a foreign language. The system takes a wrong input sentence and generates several correction suggestions. It also retrieves example Chinese sentences with English translations, helping users understand the correct usages of certain grammar patterns. This is the first available Chinese writing error correction system based on the neural machine translation framework. We discuss several design choices and show empirical results to support our decisions.

pdf bib
Transfer of Frames from English FrameNet to Construct Chinese FrameNet: A Bilingual Corpus-Based Approach
Tsung-Han Yang | Hen-Hsen Huang | An-Zi Yen | Hsin-Hsi Chen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Learning to Map Natural Language Statements into Knowledge Base Representations for Knowledge Base Construction
Chin-Ho Lin | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications
Yuen-Hsien Tseng | Hsin-Hsi Chen | Vincent Ng | Mamoru Komachi
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

pdf bib
NTU NLP Lab System at SemEval-2018 Task 10: Verifying Semantic Differences by Integrating Distributional Information and Expert Knowledge
Yow-Ting Shiue | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper presents the NTU NLP Lab system for the SemEval-2018 Capturing Discriminative Attributes task. Word embeddings, pointwise mutual information (PMI), ConceptNet edges and shortest path lengths are utilized as input features to build binary classifiers to tell whether an attribute is discriminative for a pair of concepts. Our neural network model reaches about 73% F1 score on the test set and ranks the 3rd in the task. Though the attributes to deal with in this task are all visual, our models are not provided with any image data. The results indicate that visual information can be derived from textual data.

2017

pdf bib
NLG301 at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
Chung-Chi Chen | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Short length, multiple targets, target relationships, monetary expressions, and outside references are characteristics of financial tweets. This paper proposes methods to extract target spans from a tweet and its referencing web page. A total of 15 publicly available sentiment dictionaries and one sentiment dictionary constructed from the training set, containing sentiment scores in binary or real numbers, are used to compute the sentiment scores of text spans. Moreover, the correlation coefficients of the price return between any two stocks are learned with the price data from Bloomberg. They are used to capture the relationships between the target of interest and other stocks mentioned in a tweet. The best results of our method in the two sub-tasks are 56.68% and 55.43%, as measured by evaluation method 2.

pdf bib
NTU-1 at SemEval-2017 Task 12: Detection and classification of temporal events in clinical data with domain adaptation
Po-Yu Huang | Hen-Hsen Huang | Yu-Wun Wang | Ching Huang | Hsin-Hsi Chen
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This study proposes a system to participate in the Clinical TempEval 2017 shared task, a part of the SemEval 2017 Tasks. Domain adaptation was the main challenge this year. We took part in the supervised domain adaptation track, where 591 records of colon cancer patients and 30 records of brain cancer patients from Mayo Clinic were given and we were asked to analyze the records of brain cancer patients. Based on the THYME corpus released by the organizer of Clinical TempEval, we propose a framework that automatically analyzes clinical temporal events at a fine-grained level. Support vector machine (SVM) and conditional random field (CRF) were implemented in our system for different subtasks, including detecting clinically relevant events and time expressions, determining their attributes, and identifying their relations with each other within the document. The results showed the capability of domain adaptation of our system.

pdf bib
Integrating Subject, Type, and Property Identification for Simple Question Answering over Knowledge Base
Wei-Chuan Hsiao | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper presents an approach to identify subject, type and property from knowledge base (KB) for answering simple questions. We propose new features to rank entity candidates in KB. Besides, we split a relation in KB into type and property. Each of them is modeled by a bi-directional LSTM. Experimental results show that our model achieves the state-of-the-art performance on the SimpleQuestions dataset. The hard questions in the experiments are also analyzed in detail.

pdf bib
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)
Yuen-Hsien Tseng | Hsin-Hsi Chen | Lung-Hao Lee | Liang-Chih Yu
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)

pdf bib
Detection of Chinese Word Usage Errors for Non-Native Chinese Learners with Bidirectional LSTM
Yow-Ting Shiue | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Selecting appropriate words to compose a sentence is one common problem faced by non-native Chinese learners. In this paper, we propose (bidirectional) LSTM sequence labeling models and explore various features to detect word usage errors in Chinese sentences. By combining CWINDOW word embedding features and POS information, the best bidirectional LSTM model achieves accuracy 0.5138 and MRR 0.6789 on the HSK dataset. For 80.79% of the test data, the model ranks the ground-truth within the top two at position level.

2016

pdf bib
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)
Hsin-Hsi Chen | Yuen-Hsien Tseng | Vincent Ng | Xiaofei Lu
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

pdf bib
Detecting Word Usage Errors in Chinese Sentences for Learning Chinese as a Foreign Language
Yow-Ting Shiue | Hsin-Hsi Chen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Automated grammatical error detection, which helps users improve their writing, is an important application in NLP. Recently more and more people are learning Chinese, and an automated error detection system can be helpful for the learners. This paper proposes n-gram features, dependency count features, dependency bigram features, and single-character features to determine if a Chinese sentence contains word usage errors, in which a word is written in a wrong form or the word selection is inappropriate. By marking potential errors on the level of sentence segments, typically delimited by punctuation marks, the system lets the learner try to correct the problems without the assistance of a language teacher. Experiments on the HSK corpus show that the classifier combining all sets of features achieves an accuracy of 0.8423. By utilizing certain combinations of the feature sets, we can construct a system that favors precision or recall. The best precision we achieve is 0.9536, indicating that our system is reliable and seldom produces misleading results.

pdf bib
Fine-Grained Chinese Discourse Relation Labelling
Huan-Yuan Chen | Wan-Shan Liao | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper explores several aspects together for a fine-grained Chinese discourse analysis. We deal with the issues of ambiguous discourse markers, ambiguous marker linkings, and more than one discourse marker. A universal feature representation is proposed. The pair-once postulation, cross-discourse-unit-first rule and word-pair-marker-first rule select a set of discourse markers from ambiguous linkings. Marker-Sum feature considers total contribution of markers and Marker-Preference feature captures the probability distribution of discourse functions of a representative marker by using preference rule. The HIT Chinese discourse relation treebank (HIT-CDTB) is used to evaluate the proposed models. The 25-way classifier achieves 0.57 micro-averaged F-score.

pdf bib
Subtask Mining from Search Query Logs for How-Knowledge Acceleration
Chung-Lun Kuo | Hsin-Hsi Chen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

How-knowledge is indispensable in daily life, but is less plentiful and of poorer quality than what-knowledge in publicly available knowledge bases. This paper first extracts task-subtask pairs from wikiHow, then mines linguistic patterns from search query logs, and finally applies the mined patterns to extract subtasks to complete given how-to tasks. To evaluate the proposed methodology, we group tasks and the corresponding recommended subtasks into pairs, and evaluate the results automatically and manually. The automatic evaluation shows an accuracy of 0.4494. We also classify the mined patterns based on prepositions and find that prepositions like “on”, “to”, and “with” perform better. The results can be used to accelerate how-knowledge base construction.

pdf bib
Chinese Preposition Selection for Grammatical Error Diagnosis
Hen-Hsen Huang | Yen-Chi Shao | Hsin-Hsi Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Misuse of Chinese prepositions is one of the common word usage errors in grammatical error diagnosis. In this paper, we adopt the Chinese Gigaword corpus and HSK corpus as L1 and L2 corpora, respectively. We explore a gated recurrent neural network model (GRU), and an ensemble of the GRU model and a maximum entropy language model (GRU-ME), to select the best preposition from 43 candidates for each test sentence. The experimental results show the advantage of the GRU models over simple RNN and n-gram models. We further analyze the effectiveness of linguistic information such as word boundary and part-of-speech tag in this task.

pdf bib
Detection, Disambiguation and Argument Identification of Discourse Connectives in Chinese Discourse Parsing
Yong-Siang Shih | Hsin-Hsi Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper, we investigate four important issues together for explicit discourse relation labelling in Chinese texts: (1) discourse connective extraction, (2) linking ambiguity resolution, (3) relation type disambiguation, and (4) argument boundary identification. In a pipelined Chinese discourse parser, we identify potential connective candidates by string matching, eliminate non-discourse usages from them with a binary classifier, resolve linking ambiguities among connective components by ranking, disambiguate relation types by a multiway classifier, and determine the argument boundaries by conditional random fields. The experiments on Chinese Discourse Treebank show that the F1 scores of 0.7506, 0.7693, 0.7458, and 0.3134 are achieved for discourse usage disambiguation, linking disambiguation, relation type disambiguation, and argument boundary identification, respectively, in a pipelined Chinese discourse parser.

pdf bib
Chinese Tense Labelling and Causal Analysis
Hen-Hsen Huang | Chang-Rui Yang | Hsin-Hsi Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper explores the role of tense information in Chinese causal analysis. Experiments on both causal type classification and causal directionality identification show the significant improvement gained from tense features. To automatically extract the tense features, a Chinese tense predictor is proposed. Based on a large amount of parallel data, our semi-supervised approach improves the dependency-based convolutional neural network (DCNN) models for Chinese tense labelling and thus the causal analysis.

pdf bib
NL2KB: Resolving Vocabulary Gap between Natural Language and Knowledge Base in Knowledge Base Construction and Retrieval
Sheng-Lun Wei | Yen-Pin Chiu | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

The words that express relations in natural language (NL) statements may differ from those that represent properties in knowledge bases (KB). This vocabulary gap is a barrier to knowledge base construction and retrieval. With the demo system presented in this paper, called NL2KB, users can browse the KB properties to which a given NL relational pattern may be mapped. Conversely, they can retrieve the sets of NL relational patterns for a given KB property. We describe in detail how the mapping is established. Although the mined patterns are used for Chinese knowledge base applications, the methodology can be extended to other languages.

pdf bib
Implicit Polarity and Implicit Aspect Recognition in Opinion Mining
Huan-Yuan Chen | Hsin-Hsi Chen
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

pdf bib
Introduction to SIGHAN 2015 Bake-off for Chinese Spelling Check
Yuen-Hsien Tseng | Lung-Hao Lee | Li-Ping Chang | Hsin-Hsi Chen
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

pdf bib
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications
Hsin-Hsi Chen | Yuen-Hsien Tseng | Yuji Matsumoto | Lung Hsiang Wong
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

pdf bib
Proceedings of ACL-IJCNLP 2015 System Demonstrations
Hsin-Hsi Chen | Katja Markert
Proceedings of ACL-IJCNLP 2015 System Demonstrations

2014

pdf bib
Sentence Rephrasing for Parsing Sentences with OOV Words
Hen-Hsen Huang | Huan-Yuan Chen | Chang-Sheng Yu | Hsin-Hsi Chen | Po-Ching Lee | Chun-Hsun Chen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper addresses the problems of out-of-vocabulary (OOV) words, named entities in particular, in dependency parsing. The OOV words, whose word forms are unknown to the learning-based parser, in a sentence may decrease the parsing performance. To deal with this problem, we propose a sentence rephrasing approach to replace each OOV word in a sentence with a popular word of the same named entity type in the training set, so that the knowledge of the word forms can be used for parsing. The highest-frequency-based rephrasing strategy and the information-retrieval-based rephrasing strategy are explored to select the word to replace, and the Chinese Treebank 6.0 (CTB6) corpus is adopted to evaluate the feasibility of the proposed sentence rephrasing strategies. Experimental results show that rephrasing some specific types of OOV words such as Corporation, Organization, and Competition increases the parsing performances. This methodology can be applied to domain adaptation to deal with OOV problems.

pdf bib
Overview of SIGHAN 2014 Bake-off for Chinese Spelling Check
Liang-Chih Yu | Lung-Hao Lee | Yuen-Hsien Tseng | Hsin-Hsi Chen
Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Leveraging Effective Query Modeling Techniques for Speech Recognition and Summarization
Kuan-Yu Chen | Shih-Hung Liu | Berlin Chen | Ea-Ee Jan | Hsin-Min Wang | Wen-Lian Hsu | Hsin-Hsi Chen
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
FAdR: A System for Recognizing False Online Advertisements
Yi-jie Tang | Hsin-Hsi Chen
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
Chinese Word Ordering Errors Detection and Correction for Non-Native Chinese Language Learners
Shuk-Man Cheng | Chi-Hsin Yu | Hsin-Hsi Chen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Interpretation of Chinese Discourse Connectives for Explicit Discourse Relation Recognition
Hen-Hsen Huang | Tai-Wei Chang | Huan-Yuan Chen | Hsin-Hsi Chen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Chinese Irony Corpus Construction and Ironic Structure Analysis
Yi-jie Tang | Hsin-Hsi Chen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
A Sentence Judgment System for Grammatical Error Detection
Lung-Hao Lee | Liang-Chih Yu | Kuei-Ching Lee | Yuen-Hsien Tseng | Li-Ping Chang | Hsin-Hsi Chen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

pdf bib
Chinese Open Relation Extraction for Knowledge Acquisition
Yuen-Hsien Tseng | Lung-Hao Lee | Shu-Yen Lin | Bo-Shun Liao | Mei-Jun Liu | Hsin-Hsi Chen | Oren Etzioni | Anthony Fader
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

pdf bib
Modeling Human Inference Process for Textual Entailment Recognition
Hen-Hsen Huang | Kai-Chun Chang | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 19, Number 3, September 2014

2013

pdf bib
Modeling Human Inference Process for Textual Entailment Recognition
Hen-Hsen Huang | Kai-Chun Chang | Hsin-Hsi Chen
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Analyses of the Association between Discourse Relation and Sentiment Polarity with a Chinese Human-Annotated Corpus
Hen-Hsen Huang | Chi-Hsin Yu | Tai-Wei Chang | Cong-Kai Lin | Hsin-Hsi Chen
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Uses of Monolingual In-Domain Corpora for Cross-Domain Adaptation with Hybrid MT Approaches
An-Chang Hsieh | Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the Second Workshop on Hybrid Approaches to Translation

pdf bib
A Study of Language Modeling for Chinese Spelling Check
Kuan-Yu Chen | Hung-Shin Lee | Chung-Han Lee | Hsin-Min Wang | Hsin-Hsi Chen
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing

2012

pdf bib
領域相關詞彙極性分析及文件情緒分類之研究 (Domain Dependent Word Polarity Analysis for Sentiment Classification) [In Chinese]
Ho-Cheng Yu | Ting-Hao Huang | Hsin-Hsi Chen
Proceedings of the 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012)

pdf bib
廣義知網詞彙意見極性的預測 (Predicting the Semantic Orientation of Terms in E-HowNet) [In Chinese]
Cheng-Ru Li | Chi-Hsin Yu | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 2, June 2012—Special Issue on Selected Papers from ROCLING XXIII

pdf bib
領域相關詞彙極性分析及文件情緒分類之研究 (Domain Dependent Word Polarity Analysis for Sentiment Classification) [In Chinese]
Ho-Cheng Yu | Ting-Hao Kenneth Huang | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 4, December 2012-Special Issue on Selected Papers from ROCLING XXIV

pdf bib
Contingency and Comparison Relation Labeling and Structure Prediction in Chinese Sentences
Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
A Simplification-Translation-Restoration Framework for Cross-Domain SMT Applications
Han-Bin Chen | Hen-Hsen Huang | Hsin-Hsi Chen | Ching-Ting Tan
Proceedings of COLING 2012

pdf bib
Detecting Word Ordering Errors in Chinese Sentences for Learning Chinese as a Foreign Language
Chi-Hsin Yu | Hsin-Hsi Chen
Proceedings of COLING 2012

pdf bib
Advertising Legality Recognition
Yi-jie Tang | Cong-kai Lin | Hsin-Hsi Chen
Proceedings of COLING 2012: Posters

pdf bib
An Annotation System for Development of Chinese Discourse Corpus
Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of COLING 2012: Demonstration Papers

pdf bib
Modeling Pollyanna Phenomena in Chinese Sentiment Analysis
Ting-Hao Huang | Ho-Cheng Yu | Hsin-Hsi Chen
Proceedings of COLING 2012: Demonstration Papers

pdf bib
Chinese Web Scale Linguistic Datasets and Toolkit
Chi-Hsin Yu | Hsin-Hsi Chen
Proceedings of COLING 2012: Demonstration Papers

pdf bib
Mining Sentiment Words from Microblogs for Predicting Writer-Reader Emotion Transition
Yi-jie Tang | Hsin-Hsi Chen
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The conversations between posters and repliers in microblogs form a valuable writer-reader emotion corpus. This paper adopts a log relative frequency ratio to investigate the linguistic features which affect emotion transitions, and applies the results to predict writers' and readers' emotions. A 4-class emotion transition predictor, a 2-class writer emotion predictor, and a 2-class reader emotion predictor are proposed and compared.

pdf bib
NTUSocialRec: An Evaluation Dataset Constructed from Microblogs for Recommendation Applications in Social Networks
Chieh-Jen Wang | Shuk-Man Cheng | Lung-Hao Lee | Hsin-Hsi Chen | Wen-shen Liu | Pei-Wen Huang | Shih-Peng Lin
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper proposes a method to construct an evaluation dataset from microblogs for the development of recommendation systems. We extract the relationships among the three main entities in a recommendation event, i.e., who recommends what to whom. User-to-user friend relationships and user-to-resource interest relationships in social media, together with resource-to-metadata descriptions in an external ontology, are employed. In the experiments, the resources are restricted to visual entertainment media, movies in particular. A sequence of ground truths varying with time is generated, which reflects the dynamics of the real world.

pdf bib
Development of a Web-Scale Chinese Word N-gram Corpus with Parts of Speech Information
Chi-Hsin Yu | Yi-jie Tang | Hsin-Hsi Chen
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The Web provides a large-scale corpus for researchers to study language usage in the real world. Developing a web-scale corpus requires not only substantial computational resources but also great effort to handle the large variations in web texts, such as character encoding in Chinese web texts. In this paper, we aim to develop a web-scale Chinese word N-gram corpus with parts of speech information, called the NTU PN-Gram corpus, using the ClueWeb09 dataset. We focus on character encoding and some Chinese-specific issues. Statistics about the dataset are reported. We will make the resulting corpus a publicly available resource to boost Chinese language processing.

2011

pdf bib
Emotion Modeling from Writer/Reader Perspectives Using a Microblog Dataset
Yi-jie Tang | Hsin-Hsi Chen
Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011)

pdf bib
Predicting Opinion Dependency Relations for Opinion Analysis
Lun-Wei Ku | Ting-Hao Huang | Hsin-Hsi Chen
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Chinese Discourse Relation Recognition
Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Identification and Translation of Significant Patterns for Cross-Domain SMT Applications
Han-Bin Chen | Hen-Hsen Huang | Jengwei Tjiu | Ching-Ting Tan | Hsin-Hsi Chen
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
Pause and Stop Labeling for Chinese Sentence Boundary Detection
Hen-Hsen Huang | Hsin-Hsi Chen
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
廣義知網詞彙意見極性的預測 (Predicting the Semantic Orientation of Terms in E-HowNet) [In Chinese]
Cheng-Ru Li | Chi-Hsin Yu | Hsin-Hsi Chen
Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing (ROCLING 2011)

pdf bib
Intent Shift Detection Using Search Query Logs
Chieh-Jen Wang | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 16, Number 3-4, September/December 2011

2010

pdf bib
Classical Chinese Sentence Segmentation
Hen-Hsen Huang | Chuen-Tsai Sun | Hsin-Hsi Chen
CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
Comment Extraction from Blog Posts and Its Applications to Opinion Mining
Huan-An Kao | Hsin-Hsi Chen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Blog posts containing many personal experiences or perspectives toward specific subjects are useful. Blogs allow readers to interact with bloggers by placing comments on specific blog posts. The comments carry the viewpoints of readers toward the targets described in the post, or a supportive/non-supportive attitude toward the post. Comment extraction is challenging because no single template is shared among all blog service providers. This paper proposes methods to deal with this problem. First, the repetitive patterns and their corresponding blocks are extracted from input posts by a pattern identification algorithm. Second, three filtering strategies, i.e., tag pattern loop filtering, rule overlap filtering, and longest rule first, are used to remove non-comment blocks. Finally, a comment/non-comment classifier is learned to distinguish comment blocks from non-comment blocks with 14 block-level features and 5 rule-level features. In the experiments, we randomly select 600 blog posts from 12 blog service providers. F-measure, recall, and precision are 0.801, 0.855, and 0.780, respectively, when using all three filtering strategies together with some selected features. The application of comment extraction to blog mining is also illustrated. We show how to identify the relevant opinionated objects (say, opinion holders, opinions, and targets) from posts.

pdf bib
Construction of a Chinese Opinion Treebank
Lun-Wei Ku | Ting-Hao Huang | Hsin-Hsi Chen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we construct the Chinese Opinion Treebank, based on the syntactically structured Chinese Treebank corpus, for research on opinion analysis. We introduce the tagging scheme and develop a tagging tool for constructing this corpus. Annotated samples are described. Information including opinions (yes or no), their polarities (positive, neutral or negative), and types (expression, status, or action) is defined and annotated. In addition, five structure trios are introduced according to the linguistic relations between two Chinese words. Four of them that are possibly related to opinions are also annotated in the constructed corpus to provide linguistic cues. The number of opinion sentences, together with the numbers of their polarities, opinion types, and trio types, is calculated. These statistics are compared and discussed. To assess the quality of the annotations in this corpus, the kappa values of the annotations are calculated. The substantial agreement between annotations ensures the applicability and reliability of the constructed corpus.

pdf bib
Predicting Morphological Types of Chinese Bi-Character Words by Machine Learning Approaches
Ting-Hao Huang | Lun-Wei Ku | Hsin-Hsi Chen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presented an overview of Chinese bi-character words’ morphological types, and proposed a set of features for machine learning approaches to predict these types based on composite characters’ information. First, eight morphological types were defined, and 6,500 Chinese bi-character words were annotated with these types. After pre-processing, 6,178 words were selected to construct a corpus named Reduced Set. We analyzed Reduced Set and conducted an inter-annotator agreement test. The average kappa value of 0.67 indicates substantial agreement. Second, because bi-character words’ morphological types are considered strongly related to the composite characters’ parts of speech in this paper, we proposed a set of features that can simply be extracted from dictionaries to indicate the characters’ “tendency” of parts of speech. Finally, we used these features and adopted three machine learning algorithms, SVM, CRF, and Naïve Bayes, to predict the morphological types. On average, the best algorithm, CRF, achieved 75% of the annotators’ performance.

2009

pdf bib
Using Morphological and Syntactic Structures for Chinese Opinion Analysis
Lun-Wei Ku | Ting-Hao Huang | Hsin-Hsi Chen
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
意見持有者辨識之研究 (A Study on Identification of Opinion Holders) [In Chinese]
Chia-Ying Lee | Lun-Wei Ku | Hsin-Hsi Chen
Proceedings of the 21st Conference on Computational Linguistics and Speech Processing

pdf bib
Identification of Opinion Holders
Lun-Wei Ku | Chia-Ying Lee | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 14, Number 4, December 2009

2008

pdf bib
Question Analysis and Answer Passage Retrieval for Opinion Question Answering Systems
Lun-Wei Ku | Yu-Ting Liang | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 3, September 2008: Special Issue on Selected Papers from ROCLING XIX

pdf bib
Analysis of Intention in Dialogues Using Category Trees and Its Application to Advertisement Recommendation
Hung-Chi Huang | Hsin-Hsi Chen | Ming-Shun Lin
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Event Detection and Summarization in Weblogs with Temporal Collocations
Chun-Yuan Teng | Hsin-Hsi Chen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper deals with the relationship between weblog content and time. With the proposed temporal mutual information, we analyze collocations in the time dimension, as well as the interesting collocations related to special events. The temporal mutual information is employed to observe the strength of term-to-term associations over time. An event detection algorithm identifies the collocations that may cause an event at a specific timestamp. An event summarization algorithm retrieves a set of collocations which describe an event. We compare our approach with an approach that does not consider the time interval. The experimental results demonstrate that the temporal collocations capture real-world semantics and real-world events over time.

pdf bib
Ranking Reader Emotions Using Pairwise Loss Minimization and Emotional Distribution Regression
Kevin Hsin-Yih Lin | Hsin-Hsi Chen
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf bib
Test Collection Selection and Gold Standard Generation for a Multiply-Annotated Opinion Corpus
Lun-Wei Ku | Yong-Sheng Lo | Hsin-Hsi Chen
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf bib
Building Emotion Lexicon from Weblog Corpora
Changhua Yang | Kevin Hsin-Yih Lin | Hsin-Hsi Chen
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf bib
Question Analysis and Answer Passage Retrieval for Opinion Question Answering Systems
Lun-Wei Ku | Yu-Ting Liang | Hsin-Hsi Chen
Proceedings of the 19th Conference on Computational Linguistics and Speech Processing

pdf bib
以部落格語料進行情緒趨勢分析 (Emotion Trend Analysis Using Blog Corpora) [In Chinese]
Chang-Hua Yang | Hung-An Kuo | Hsin-Hsi Chen
Proceedings of the 19th Conference on Computational Linguistics and Speech Processing

2006

pdf bib
Constructing a Named Entity Ontology from Web Corpora
Ming-Shun Lin | Hsin-Hsi Chen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper proposes a named entity (NE) ontology generation engine, called the XNE-Tree engine, which produces relational named entities given a seed. The engine incrementally extracts named entities that co-occur highly with the seed by using a common search engine. In each iterative step, the seed will be replaced by its siblings or descendants, which form new seeds. In this way, the XNE-Tree engine incrementally builds a tree structure with the original seed as the root. Two seeds, the Chinese transliteration names of Nicole Kidman (a famous actress) and Ernest Hemingway (a famous writer), are used to evaluate the performance of the XNE-Tree. To test the applicability of the ontology, we apply it to a phoneme-character conversion system, which converts input phoneme syllable sequences to text strings. A total of 100 Chinese transliteration names, including 50 person names and 50 location names, are used as test data. We derive an ontology composed of 7,642 named entities. The results of phoneme-character conversion show that the recall rate and the MRR are improved from 0.79 and 0.50 to 0.84 and 0.55, respectively.

pdf bib
Tagging Heterogeneous Evaluation Corpora for Opinionated Tasks
Lun-Wei Ku | Yu-Ting Liang | Hsin-Hsi Chen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Opinion retrieval aims to tell if a document is positive, neutral or negative on a given topic. Opinion extraction further identifies the supportive and the non-supportive evidence of a document. To evaluate the performance of technologies for opinionated tasks, a suitable corpus is necessary. This paper defines the annotations for opinionated materials. Heterogeneous experimental materials are annotated, and the agreements among annotators are analyzed. How humans can monitor opinions as a whole is also examined. The corpus can be employed for opinion extraction, opinion summarization, opinion tracking and opinionated question answering.

pdf bib
Novel Association Measures Using Web Search with Double Checking
Hsin-Hsi Chen | Ming-Shun Lin | Yu-Chuan Wei
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
A High-Accurate Chinese-English NE Backward Translation System Combining Both Lexical Information and Web Statistics
Conrad Chen | Hsin-Hsi Chen
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
以部落格文本進行情緒分類之研究 (A Study of Emotion Classification Using Blog Articles) [In Chinese]
Chang-Hua Yang | Hsin-Hsi Chen
Proceedings of the 18th Conference on Computational Linguistics and Speech Processing

pdf bib
An Approach to Using the Web as a Live Corpus for Spoken Transliteration Name Access
Ming-Shun Lin | Chia-Ping Chen | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 11, Number 3, September 2006: Special Issue on Selected Papers from ROCLING XVII

pdf bib
Classifying Biological Full-Text Articles for Multi-Database Curation
Wen-Juan Hou | Chih Lee | Hsin-Hsi Chen
Demonstrations

2005

pdf bib
Integrating Punctuation Rules and Naïve Bayesian Model for Chinese Creation Title Recognition
Conrad Chen | Hsin-Hsi Chen
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
An Approach of Using the Web as a Live Corpus for Spoken Transliteration Name Access
Ming-Shun Lin | Chia-Ping Chen | Hsin-Hsi Chen
Proceedings of the 17th Conference on Computational Linguistics and Speech Processing

2004

pdf bib
語料庫統計值與全球資訊網統計值之比較:以中文斷詞應用為例 (Comparison of Corpus Statistics and Web Statistics: An Application to Chinese Word Segmentation) [In Chinese]
Hsiao-Ching Lin | Hsin-Hsi Chen
Proceedings of the 16th Conference on Computational Linguistics and Speech Processing

pdf bib
以語法分析為輔建立新聞名詞知識庫 (Construction of Knowledge Base for News Names by Applying Syntactic Rules) [In Chinese]
Chang-Hua Yang | Hsin-Hsi Chen
Proceedings of the 16th Conference on Computational Linguistics and Speech Processing

pdf bib
Event Clustering on Streaming News Using Co-Reference Chains and Event Words
June-Jei Kuo | Hsin-Hsi Chen
Proceedings of the Conference on Reference Resolution and Its Applications

pdf bib
Support Vector Machine Approach to Extracting Gene References into Function from Biological Documents
Chih Lee | Wen-Juan Hou | Hsin-Hsi Chen
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)

pdf bib
Annotating Multiple Types of Biomedical Entities: A Single Word Classification Approach
Chih Lee | Wen-Juan Hou | Hsin-Hsi Chen
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)

pdf bib
Collocation Extraction Using Web Statistics
Hsin-Hsi Chen | Yi-Cheng Yu | Chih-Long Lin
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Pattern Discovery in Named Organization Corpus
Hsin-Hsi Chen | Yi-Lin Chu
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib
以網際網路內容為基礎之問答系統 “Why” 問句研究 (The Study of Why Questions in Web-based Question-Answering Systems) [In Chinese]
Tean-Zuo Shen | Chuan-Jie Lin | Hsin-Hsi Chen
Proceedings of Research on Computational Linguistics Conference XV

pdf bib
Enhancing Performance of Protein Name Recognizers Using Collocation
Wen-Juan Hou | Hsin-Hsi Chen
Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine

pdf bib
Learning Formulation and Transformation Rules for Multilingual Named Entities
Hsin-Hsi Chen | Changhua Yang | Ying Lin
Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition

2002

pdf bib
Backward Machine Transliteration by Learning Phonetic Similarity
Wei-Hao Lin | Hsin-Hsi Chen
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)

pdf bib
NLP and IR Approaches to Monolingual and Multilingual Link Detection
Ying-Ju Chen | Hsin-Hsi Chen
COLING 2002: The 19th International Conference on Computational Linguistics

2001

pdf bib
簡易影片字幕文字辨識法及其詢答應用 (A Simple Method for Video OCR and Its Application on Question Answering) [In Chinese]
Chuan-Jie Lin | Che-Chia Liu | Hsin-Hsi Chen
Proceedings of Research on Computational Linguistics Conference XIV

pdf bib
A Simple Method for Chinese Video OCR and Its Application to Question Answering
Chuan-Jie Lin | Che-Chia Liu | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 6, Number 2, August 2001

2000

pdf bib
Sense-Tagging Chinese Corpus
Hsin-Hsi Chen | Chi-Ching Lin
Second Chinese Language Processing Workshop

pdf bib
A Multilingual News Summarizer
Hsin-Hsi Chen | Chuan-Jie Lin
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

pdf bib
Mining Tables from Large Scale HTML Texts
Hsin-Hsi Chen | Shih-Chung Tsai | Jin-He Tsai
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

pdf bib
反向異文字音譯相似度評量方法與跨語言資訊檢索 (Similarity Measure in Backward Transliteration between Different Character Sets and Its Application to CLIR) [In Chinese]
Wei-Hao Lin | Hsin-Hsi Chen
Proceedings of Research on Computational Linguistics Conference XIII

1999

pdf bib
Resolving Translation Ambiguity and Target Polysemy in Cross-Language Information Retrieval
Hsin-Hsi Chen | Guo-Wei Bian | Wen-Cheng Lin
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

bib
Proceedings of Research on Computational Linguistics Conference XII
Sin-Horng Chen | Hsin-Hsi Chen
Proceedings of Research on Computational Linguistics Conference XII

pdf bib
A Mandarin to Taiwanese Min Nan Machine Translation System with Speech Synthesis of Taiwanese Min Nan
Chuan-Jie Lin | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 4, Number 1, February 1999

pdf bib
Resolving Translation Ambiguity and Target Polysemy in Cross-Language Information Retrieval
Hsin-Hsi Chen | Guo-Wei Bian | Wen-Cheng Lin
International Journal of Computational Linguistics & Chinese Language Processing, Volume 4, Number 2, August 1999

1998

pdf bib
Description of the NTU System used for MET-2
Hsin-Hsi Chen | Yung-Wei Ding | Shih-Chung Tsai | Guo-Wei Bian
Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998

pdf bib
An NTU-Approach to Automatic Sentence Extraction for Summary Generation
Kuang-hua Chen | Sheng-Jie Huang | Wen-Cheng Lin | Hsin-Hsi Chen
TIPSTER TEXT PROGRAM PHASE III: Proceedings of a Workshop held at Baltimore, Maryland, October 13-15, 1998

pdf bib
White Page Construction from Web Pages for Finding People on the Internet
Hsin-Hsi Chen | Guo-Wei Bian
International Journal of Computational Linguistics & Chinese Language Processing, Volume 3, Number 1, February 1998: Special Issue on the 10th Research on Computational Linguistics International Conference

pdf bib
Proper Name Translation in Cross-Language Information Retrieval
Hsin-Hsi Chen | Sheng-Jie Huang | Yung-Wei Ding | Shih-Chung Tsai
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

pdf bib
Integrating query translation and document translation in a cross-language information retrieval system
Guo-Wei Bian | Hsin-Hsi Chen
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

Due to the explosive growth of the WWW, very large multilingual textual resources have motivated research in Cross-Language Information Retrieval and online Web Machine Translation. In this paper, the integration of language translation and text processing systems is proposed to build a multilingual information system. A distributed English-Chinese system on the WWW is introduced to illustrate how to integrate query translation, search engines, and a web translation system. Since July 1997, more than 46,000 users have accessed our system and about 250,000 English web pages have been translated to pages in Chinese or bilingual English-Chinese versions. The average satisfaction degree of users at the document level is 67.47%.

pdf bib
Proper Name Translation in Cross-Language Information Retrieval
Hsin-Hsi Chen | Sheng-Jie Huang | Yung-Wei Ding | Shih-Chung Tsai
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

1997

pdf bib
Proper Name Extraction from Web Pages for Finding People in Internet
Hsin-Hsi Chen | Guo-Wei Bian
Proceedings of the 10th Research on Computational Linguistics International Conference

bib
Building a Bracketed Corpus Using Φ² Statistics
Yue-Shi Lee | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 2, Number 2, August 1997

pdf bib
Applying Repair Processing in Chinese Homophone Disambiguation
Yue-Shi Lee | Hsin-Hsi Chen
Fifth Conference on Applied Natural Language Processing

1996

pdf bib
Correcting Chinese Repetition Repairs In Spontaneous Speech
Yue-Shi Lee | Hsin-Hsi Chen
Proceedings of Rocling IX Computational Linguistics Conference IX

pdf bib
A Hybrid Approach to Machine Translation System Design
Kuang-Hua Chen | Hsin-Hsi Chen
International Journal of Computational Linguistics & Chinese Language Processing, Volume 1, Number 1, August 1996

pdf bib
A Rule-Based and MT-Oriented Approach to Prepositional Phrase Attachment
Kuang-hua Chen | Hsin-Hsi Chen
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

pdf bib
Identification and Classification of Proper Nouns in Chinese Texts
Hsin-Hsi Chen | Jen-Chang Lee
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

1995

pdf bib
A Chunking-and-Raising Partial Parser
Hsin-Hsi Chen | Yue-Shi Lee
Proceedings of the Fourth International Workshop on Parsing Technologies

Parsing is often seen as a combinatorial problem. This is due not to the properties of natural languages but to the parsing strategies. This paper investigates a Constrained Grammar extracted from a Treebank and applies it in a non-combinatorial partial parser. This parser is a simpler version of a chunking-and-raising parser. The chunking and raising actions can be done in linear time. The short-term goal of this research is to help the development of a partially bracketed corpus, i.e., a simpler version of a treebank. The long-term goal is to provide high-level linguistic constraints for many natural language applications.

pdf bib
Machine Translation: an Integration Approach
Kuang-hua Chen | Hsin-Hsi Chen
Proceedings of the Sixth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

pdf bib
Development of a Partially Bracketed Corpus with Part-of-Speech Information Only
Hsin-Hsi Chen | Yue-Shi Lee
Third Workshop on Very Large Corpora

bib
Proceedings of Rocling VIII Computational Linguistics Conference VIII
Hsin-Hsi Chen
Proceedings of Rocling VIII Computational Linguistics Conference VIII

1994

bib
中文文本人名辨識問題之研究 (Identification of Personal Names in Chinese Texts) [In Chinese]
Jen-Chang Lee | Yue-Shi Lee | Hsin-Hsi Chen
Proceedings of Rocling VII Computational Linguistics Conference VII

pdf bib
A Part-of-Speech-Based Alignment Algorithm
Kuang-hua Chen | Hsin-Hsi Chen
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

pdf bib
Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation
Kuang-hua Chen | Hsin-Hsi Chen
32nd Annual Meeting of the Association for Computational Linguistics

1993

pdf bib
A Storage Reduction Method For Corpus-Based Language Models
Hsin-Hsi Chen | Yue-Shi Lee
Proceedings of Rocling VI Computational Linguistics Conference VI

pdf bib
A Probabilistic Chunker
Kuang-hua Chen | Hsin-Hsi Chen
Proceedings of Rocling VI Computational Linguistics Conference VI

1992

bib
Proceedings of Rocling V Computational Linguistics Conference V
Hsin-Hsi Chen
Proceedings of Rocling V Computational Linguistics Conference V

pdf bib
A Parallel Augmented Context-Free Parsing System For Natural Language Analysis
Hsin-Hsi Chen | Jiunn-Liang Leu | Yue-Shi Lee
Proceedings of Rocling V Computational Linguistics Conference V

1990

pdf bib
A Logic-Based Government-Binding Parser for Mandarin Chinese
Hsin-Hsi Chen
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

1988

pdf bib
The Parsing Environment for Mandarin Syntax
I-Peng Lin | Shuan-fan Huang | Hsin-Hsi Chen | Ka-Wai Chui
Proceedings of Rocling I Computational Linguistics Conference I

pdf bib
A New Design of Prolog-Based Bottom-Up Parsing System With Government-Binding Theory
Hsin-Hsi Chen | I-Peng Lin | Chien-Ping Wu
Coling Budapest 1988 Volume 1: International Conference on Computational Linguistics
