Ha-Thanh Nguyen

Also published as: Ha Thanh Nguyen

2025

DRILL Shared Task 2025: The Challenge of Deep Retrieval in the Expansive Legal Landscape
Thi-Hai-Yen Vuong | Tan-Minh Nguyen | Hoang-Trung Nguyen | Trong-Khoi Dao | Ha-Thanh Nguyen | Hoang-Quynh Le
Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing

2024

Enhancing Legal Violation Identification with LLMs and Deep Learning Techniques: Achievements in the LegalLens 2024 Competition
Nguyen Tan Minh | Duy Ngoc Mai | Le Xuan Bach | Nguyen Huu Dung | Pham Cong Minh | Ha Thanh Nguyen | Thi Hai Yen Vuong
Proceedings of the Natural Legal Language Processing Workshop 2024

LegalLens is a competition organized to encourage advancements in automatically detecting legal violations. This paper presents our solutions for two tasks: Legal Named Entity Recognition (L-NER) and Legal Natural Language Inference (L-NLI). Our approach involves fine-tuning BERT-based models, designing methods based on data characteristics, and using a novel prompting template for data augmentation with LLMs. As a result, we secured first place in L-NER and third place in L-NLI among thirty-six participants. We also perform error analysis to provide valuable insights and pave the way for future enhancements in legal NLP. Our implementation is available at https://github.com/lxbach10012004/legal-lens/tree/main

2023

Joint Learning for Legal Text Retrieval and Textual Entailment: Leveraging the Relationship between Relevancy and Affirmation
Nguyen Hai Long | Thi Hai Yen Vuong | Ha Thanh Nguyen | Xuan-Hieu Phan
Proceedings of the Natural Legal Language Processing Workshop 2023

In legal text processing and reasoning, one normally performs information retrieval to find documents relevant to an input question, and then performs textual entailment to answer the question. The former is about relevancy whereas the latter is about affirmation (or conclusion). While relevancy and affirmation are two different concepts, there is obviously a connection between them. That is why performing retrieval and textual entailment sequentially and independently may not make the most of this mutually supportive relationship. This paper, therefore, proposes a multi-task learning model for these two tasks to improve their performance. Technically, in the COLIEE dataset, we use the information of Task 4 (conclusions) to improve the performance of Task 3 (searching for legal provisions related to the question). Our empirical findings indicate that this supportive relationship truly exists. This important insight sheds light on how leveraging the relationship between tasks can significantly enhance the effectiveness of our multi-task learning approach for legal text processing.

2020

Latent Topic Refinement based on Distance Metric Learning and Semantics-assisted Non-negative Matrix Factorization
Tran-Binh Dang | Ha-Thanh Nguyen | Le-Minh Nguyen
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

pdf bib abs

Text representation plays a vital role in retrieval-based question answering, especially in the legal domain where documents are usually long and complicated. The better the question and the legal documents are represented, the more accurately they are matched. In this paper, we focus on the task of answering legal questions at the article level. Given a legal question, the goal is to retrieve all the correct and valid legal articles that can be used as the basis for answering the question. We present a retrieval-based model for the task by learning neural attentive text representation. Our text representation method first leverages convolutional neural networks to extract important information in a question and legal articles. Attention mechanisms are then used to represent the question and articles and select appropriate information to align them in a matching process. Experimental results on an annotated corpus consisting of 5,922 Vietnamese legal questions show that our model outperforms state-of-the-art retrieval-based methods for question answering by large margins in terms of both recall and NDCG.

How State-Of-The-Art Models Can Deal With Long-Form Question Answering
Minh-Quan Bui | Vu Tran | Ha-Thanh Nguyen | Le-Minh Nguyen
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation