Jinghang Gu


2024

pdf bib
Employing Glyphic Information for Chinese Event Extraction with Vision-Language Model
Xiaoyi Bao | Jinghang Gu | Zhongqing Wang | Minjie Qiang | Chu-Ren Huang
Findings of the Association for Computational Linguistics: EMNLP 2024

As a complex task that requires rich information input, features from various aspects have been utilized in event extraction. However, most of the previous works ignored the value of glyph, which could contain enriched semantic information and can not be fully expressed by the pre-trained embedding in hieroglyphic languages like Chinese. We argue that, compared with combining the sophisticated textual features, glyphic information from visual modality could provide us with extra and straight semantic information in extracting events. Motivated by this, we propose a glyphic multi-modal Chinese event extraction model with hieroglyphic images to capture the intra- and inter-character morphological structure from the sequence. Extensive experiments build a new state-of-the-art performance in the ACE2005 Chinese and KBP Eval 2017 dataset, which underscores the effectiveness of our proposed glyphic event extraction model, and more importantly, the glyphic feature can be obtained at nearly zero cost.

pdf bib
PolyuCBS at SMM4H 2024: LLM-based Medical Disorder and Adverse Drug Event Detection with Low-rank Adaptation
Zhai Yu | Xiaoyi Bao | Emmanuele Chersoni | Beatrice Portelli | Sophia Lee | Jinghang Gu | Chu-Ren Huang
Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks

This is the demonstration of systems and results of our team’s participation in the Social Medical Mining for Health (SMM4H) 2024 Shared Task. Our team participated in two tasks: Task 1 and Task 5. Task 5 requires the detection of tweet sentences that claim children’s medical disorders from certain users. Task 1 needs teams to extract and normalize Adverse Drug Event terms in the tweet sentence. The team selected several Pre-trained Language Models and generative Large Language Models to meet the requirements. Strategies to improve the performance include cloze test, prompt engineering, Low Rank Adaptation etc. The test result of our system has an F1 score of 0.935, Precision of 0.954 and Recall of 0.917 in Task 5 and an overall F1 score of 0.08 in Task 1.

2023

pdf bib
Effectiveness of ChatGPT in Korean Grammatical Error Correction
Junghwan Maeng | Jinghang Gu | Sun-A Kim
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

pdf bib
Identifying ESG Impact with Key Information
Le Qiu | Bo Peng | Jinghang Gu | Yu-Yin Hsu | Emmanuele Chersoni
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing

The paper presents a concise summary of our work for the ML-ESG-2 shared task, exclusively on the Chinese and English datasets. ML-ESG-2 aims to ascertain the influence of news articles on corporations, specifically from an ESG perspective. To this end, we generally explored the capability of key information for impact identification and experimented with various techniques at different levels. For instance, we attempted to incorporate important information at the word level with TF-IDF, at the sentence level with TextRank, and at the document level with summarization. The final results reveal that the one with GPT-4 for summarisation yields the best predictions.

2022

pdf bib
Inclusion in CSR Reports: The Lens from a Data-Driven Machine Learning Model
Lu Lu | Jinghang Gu | Chu-Ren Huang
Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference

Inclusion, as one of the foundations in the diversity, equity, and inclusion initiative, concerns the degree of being treated as an ingroup member in a workplace. Despite of its importance in a corporate’s ecosystem, the inclusion strategies and its performance are not adequately addressed in corporate social responsibility (CSR) and CSR reporting. This study proposes a machine learning and big data-based model to examine inclusion through the use of stereotype content in actual language use. The distribution of the stereotype content in general corpora of a given society is utilized as a baseline, with which texts about corporate texts are compared. This study not only propose a model to identify and classify inclusion in language use, but also provides insights to measure and track progress by including inclusion in CSR reports as a strategy to build an inclusive corporate team.

2021

pdf bib
PolyU CBS-Comp at SemEval-2021 Task 1: Lexical Complexity Prediction (LCP)
Rong Xiang | Jinghang Gu | Emmanuele Chersoni | Wenjie Li | Qin Lu | Chu-Ren Huang
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

In this contribution, we describe the system presented by the PolyU CBS-Comp Team at the Task 1 of SemEval 2021, where the goal was the estimation of the complexity of words in a given sentence context. Our top system, based on a combination of lexical, syntactic, word embeddings and Transformers-derived features and on a Gradient Boosting Regressor, achieves a top correlation score of 0.754 on the subtask 1 for single words and 0.659 on the subtask 2 for multiword expressions.

2020

pdf bib
Affection Driven Neural Networks for Sentiment Analysis
Rong Xiang | Yunfei Long | Mingyu Wan | Jinghang Gu | Qin Lu | Chu-Ren Huang
Proceedings of the Twelfth Language Resources and Evaluation Conference

Deep neural network models have played a critical role in sentiment analysis with promising results in the recent decade. One of the essential challenges, however, is how external sentiment knowledge can be effectively utilized. In this work, we propose a novel affection-driven approach to incorporating affective knowledge into neural network models. The affective knowledge is obtained in the form of a lexicon under the Affect Control Theory (ACT), which is represented by vectors of three-dimensional attributes in Evaluation, Potency, and Activity (EPA). The EPA vectors are mapped to an affective influence value and then integrated into Long Short-term Memory (LSTM) models to highlight affective terms. Experimental results show a consistent improvement of our approach over conventional LSTM models by 1.0% to 1.5% in accuracy on three large benchmark datasets. Evaluations across a variety of algorithms have also proven the effectiveness of leveraging affective terms for deep model enhancement.