Hoang Ngo


2023

pdf bib
A Self-enhancement Multitask Framework for Unsupervised Aspect Category Detection
Thi-Nhung Nguyen | Hoang Ngo | Kiem-Hieu Nguyen | Tuan-Dung Cao
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Our work addresses the problem of unsupervised Aspect Category Detection using a small set of seed words. Recent works have focused on learning embedding spaces for seed words and sentences to establish similarities between sentences and aspects. However, aspect representations are limited by the quality of initial seed words, and model performances are compromised by noise. To mitigate this limitation, we propose a simple framework that automatically enhances the quality of initial seed words and selects high-quality sentences for training instead of using the entire dataset. Our main concepts are to add a number of seed words to the initial set and to treat the task of noise resolution as a task of augmenting data for a low-resource task. In addition, we jointly train Aspect Category Detection with Aspect Term Extraction and Aspect Term Polarity to further enhance performance. This approach facilitates shared representation learning, allowing Aspect Category Detection to benefit from the additional guidance offered by other tasks. Extensive experiments demonstrate that our framework surpasses strong baselines on standard datasets.

pdf bib
VTCC-NLP at SemEval-2023 Task 6:Long-Text Representation Based on Graph Neural Network for Rhetorical Roles Prediction
Hiep Nguyen | Hoang Ngo | Nam Bui
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Rhetorical Roles (RR) prediction is to predict the label of each sentence in legal documents, which is regarded as an emergent task for legal document understanding. In this study, we present a novel method for the RR task by exploiting the long context representation. Specifically, legal documents are known as long texts, in which previous works have no ability to consider the inherent dependencies among sentences. In this paper, we propose GNNRR (Graph Neural Network for Rhetorical Roles Prediction), which is able to model the cross-information for long texts. Furthermore, we develop multitask learning by incorporating label shift prediction (LSP) for segmenting a legal document. The proposed model is evaluated on the SemEval 2023 Task 6 - Legal Eval Understanding Legal Texts for RR sub-task. Accordingly, our method achieves the top 4 in the public leaderboard of the sub-task. Our source code is available for further investigation\footnote{https://github.com/hiepnh137/SemEval2023-Task6-Rhetorical-Roles}.