Phu-Vinh Nguyen

2026

An Enhanced Training-Free Pipeline for Entity Recognition and Linking: A Low-Resource Case Study – 20-th Century Historical Medical Texts
Phu-Vinh Nguyen | Vera Danilova
Proceedings of the 1st Workshop on Linguistic Analysis for Health (HeaLing 2026)

Entity linking in biomedicine typically relies on large annotated corpora and supervised methods, which often fail in out-of-distribution settings. Historical medical texts are rich in biomedical terms but pose unique challenges: terminology has changed, some concepts are obsolete, and stylistic differences from modern journals prevent off-the-shelf models fine-tuned on contemporary datasets from aligning historical terms with current ontologies. Training-free methods based on LLMs offer a solution by linking historical terms to modern concepts and inferring their meaning from context. In this paper, we evaluate a state-of-the-art training-free entity linking method on historical medical texts and propose an improved pipeline—end-to-end entity extraction and linking with confidence estimation. We also assess performance on modern benchmarks to check whether the gains generalize to other domains and show their superior performance in most cases. We report an analysis of the findings. The code and curated dataset for historical medical entity linking are available on GitHub.

2025

pdf bib abs

SilVar: Speech-Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
Tan-Hanh Pham | Le Hoang Nam | Phu-Vinh Nguyen | Chris Ngo | Truong-Son Hy
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Visual Language Models have demonstrated remarkable capabilities across various tasks, including visual question answering and image captioning. However, most models rely on text-based instructions, limiting their effectiveness in natural human-machine interactions. Moreover, the quality of language models primarily depends on reasoning and prompting techniques, such as chain-of-thought, which remain underexplored when using speech instructions. To address these challenges, we propose SilVar, an end-to-end multimodal model that leverages speech instructions for reasoning-based visual question answering. Additionally, we investigate reasoning techniques at different levels, including conversational, simple, and complex speech instructions. SilVar is built upon CLIP, Whisper, and LLaMA 3.1-8B, enabling more intuitive interactions by allowing users to provide verbal or text-based instructions. To this end, we introduce a new dataset designed to challenge models with speech-based reasoning tasks for object localization. This dataset enhances the model’s ability to process and explain visual scenes from spoken input, moving beyond simple object recognition to reasoning-based interactions. To our knowledge, SilVar is the first open-source, speech-driven VLM. We believe SilVar will inspire the next generation of multimodal reasoning models, advancing toward expert artificial general intelligence.

2024

pdf bib abs

ViGLUE: A Vietnamese General Language Understanding Benchmark and Analysis of Vietnamese Language Models
Minh-Nam Tran | Phu-Vinh Nguyen | Long Nguyen | Dien Dinh
Findings of the Association for Computational Linguistics: NAACL 2024

As the number of language models has increased, various benchmarks have been suggested to assess the proficiency of the models in natural language understanding. However, there is a lack of such a benchmark in Vietnamese due to the difficulty in accessing natural language processing datasets or the scarcity of task-specific datasets. **ViGLUE**, the proposed dataset collection, is a **Vi**etnamese **G**eneral **L**anguage **U**nderstanding **E**valuation benchmark developed using three methods: translating an existing benchmark, generating new corpora, and collecting available datasets. ViGLUE contains twelve tasks and encompasses over ten areas and subjects, enabling it to evaluate models comprehensively over a broad spectrum of aspects. Baseline models utilizing multilingual language models are also provided for all tasks in the proposed benchmarks. In addition, the study of the available Vietnamese large language models is conducted to explore the language models’ ability in the few-shot learning framework, leading to the exploration of the relationship between specific tasks and the number of shots.

pdf bib abs

ViMedAQA: A Vietnamese Medical Abstractive Question-Answering Dataset and Findings of Large Language Model
Minh-Nam Tran | Phu-Vinh Nguyen | Long Nguyen | Dien Dinh
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Question answering involves creating answers to questions. With the growth of large language models, the ability of question-answering systems has dramatically improved. However, there is a lack of Vietnamese abstractive question-answering datasets, especially in the medical domain. Therefore, this research aims to mitigate this gap by introducing ViMedAQA. This **Vi**etnamese **Med**ical **A**bstractive **Q**uestion-**A**nswering dataset covers four topics in the Vietnamese medical domain, including body parts, disease, drugs and medicine. Additionally, the empirical results on the proposed dataset examine the capability of the large language models in the Vietnamese medical domain, including reasoning, memorizing and awareness of essential information.

Co-authors

Le Hoang Nam 1

Chris Ngo 1

Tan-Hanh Pham 1

Venues

Fix author