Thi Hai Yen Vuong

Also published as: Thi-Hai-Yen Vuong


2024

Enhancing Legal Violation Identification with LLMs and Deep Learning Techniques: Achievements in the LegalLens 2024 Competition
Nguyen Tan Minh | Duy Ngoc Mai | Le Xuan Bach | Nguyen Huu Dung | Pham Cong Minh | Ha Thanh Nguyen | Thi Hai Yen Vuong
Proceedings of the Natural Legal Language Processing Workshop 2024

LegalLens is a competition organized to encourage advancements in automatically detecting legal violations. This paper presents our solutions for two tasks: Legal Named Entity Recognition (L-NER) and Legal Natural Language Inference (L-NLI). Our approach involves fine-tuning BERT-based models, designing methods based on data characteristics, and introducing a novel prompting template for data augmentation using LLMs. As a result, we secured first place in L-NER and third place in L-NLI among thirty-six participants. We also perform error analysis to provide valuable insights and pave the way for future enhancements in legal NLP. Our implementation is available at https://github.com/lxbach10012004/legal-lens/tree/main

2023

Joint Learning for Legal Text Retrieval and Textual Entailment: Leveraging the Relationship between Relevancy and Affirmation
Nguyen Hai Long | Thi Hai Yen Vuong | Ha Thanh Nguyen | Xuan-Hieu Phan
Proceedings of the Natural Legal Language Processing Workshop 2023

In legal text processing and reasoning, one normally performs information retrieval to find documents relevant to an input question, and then performs textual entailment to answer the question. The former is about relevancy whereas the latter is about affirmation (or conclusion). While relevancy and affirmation are two different concepts, there is a clear connection between them. For this reason, performing retrieval and textual entailment sequentially and independently may not make the most of this mutually supportive relationship. This paper, therefore, proposes a multi-task learning model for these two tasks to improve their performance. Technically, in the COLIEE dataset, we use the information of Task 4 (conclusions) to improve the performance of Task 3 (searching for legal provisions related to the question). Our empirical findings indicate that this supportive relationship truly exists. This important insight sheds light on how leveraging the relationship between tasks can significantly enhance the effectiveness of our multi-task learning approach for legal text processing.

Passage-based BM25 Hard Negatives: A Simple and Effective Negative Sampling Strategy For Dense Retrieval
Thanh-Do Nguyen | Chi Minh Bui | Thi-Hai-Yen Vuong | Xuan-Hieu Phan
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation