Thi-Hai-Yen Vuong

Also published as: Thi Hai Yen Vuong

2026

CareerPathKG: Knowledge Graph Integrated Framework for Career Intelligence
Ngoc-Quang Le | Duc Duong Hoang | Mai Vu Tran | Thi-Hai-Yen Vuong
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)

The labor market is experiencing rapid and continual shifts in required skills and competencies, driven by technological advancement and evolving industry structures. Within this dynamic environment, candidates increasingly face challenges in orienting their career development and must continuously update their knowledge and capabilities to meet contemporary job requirements; this need is particularly acute for new entrants to the labor market, who must cultivate a comprehensive understanding of current labor-market conditions. To address these issues, this study proposes an enterprise recruitment framework grounded in a career path knowledge graph, capturing occupations, skill requirements, and career transitions using standardized taxonomies enriched with job-posting data. The framework integrates transformer-based embeddings, large language models, and knowledge-graph reasoning to support efficient and reliable CV assessment, CV-JD matching, and career guidance.

2025

DRILL Shared Task 2025: The Challenge of Deep Retrieval in the Expansive Legal Landscape
Thi-Hai-Yen Vuong | Tan-Minh Nguyen | Hoang-Trung Nguyen | Trong-Khoi Dao | Ha-Thanh Nguyen | Hoang-Quynh Le
Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing

2024

Enhancing Legal Violation Identification with LLMs and Deep Learning Techniques: Achievements in the LegalLens 2024 Competition
Nguyen Tan Minh | Duy Ngoc Mai | Le Xuan Bach | Nguyen Huu Dung | Pham Cong Minh | Ha Thanh Nguyen | Thi Hai Yen Vuong
Proceedings of the Natural Legal Language Processing Workshop 2024

LegalLens is a competition organized to encourage advancements in automatically detecting legal violations. This paper presents our solutions for two tasks: Legal Named Entity Recognition (L-NER) and Legal Natural Language Inference (L-NLI). Our approach involves fine-tuning BERT-based models, designing methods based on data characteristics, and introducing a novel prompting template for data augmentation using LLMs. As a result, we secured first place in L-NER and third place in L-NLI among thirty-six participants. We also perform error analysis to provide valuable insights and pave the way for future enhancements in legal NLP. Our implementation is available at https://github.com/lxbach10012004/legal-lens/tree/main

2023

Passage-based BM25 Hard Negatives: A Simple and Effective Negative Sampling Strategy For Dense Retrieval
Thanh-Do Nguyen | Chi Minh Bui | Thi-Hai-Yen Vuong | Xuan-Hieu Phan
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

Joint Learning for Legal Text Retrieval and Textual Entailment: Leveraging the Relationship between Relevancy and Affirmation
Nguyen Hai Long | Thi Hai Yen Vuong | Ha Thanh Nguyen | Xuan-Hieu Phan
Proceedings of the Natural Legal Language Processing Workshop 2023

In legal text processing and reasoning, one normally performs information retrieval to find documents relevant to an input question, and then performs textual entailment to answer the question. The former is about relevancy whereas the latter is about affirmation (or conclusion). While relevancy and affirmation are two different concepts, there is obviously a connection between them. That is why performing retrieval and textual entailment sequentially and independently may not make the most of this mutually supportive relationship. This paper, therefore, proposes a multi-task learning model for these two tasks to improve their performance. Technically, in the COLIEE dataset, we use the information of Task 4 (conclusions) to improve the performance of Task 3 (searching for legal provisions related to the question). Our empirical findings indicate that this supportive relationship truly exists. This important insight sheds light on how leveraging the relationship between tasks can significantly enhance the effectiveness of our multi-task learning approach for legal text processing.