Long Nguyen

Also published as: L. Nguyen


2024

pdf bib
ViGLUE: A Vietnamese General Language Understanding Benchmark and Analysis of Vietnamese Language Models
Minh-Nam Tran | Phu-Vinh Nguyen | Long Nguyen | Dien Dinh
Findings of the Association for Computational Linguistics: NAACL 2024

As the number of language models has increased, various benchmarks have been suggested to assess the proficiency of the models in natural language understanding. However, there is a lack of such a benchmark in Vietnamese due to the difficulty in accessing natural language processing datasets or the scarcity of task-specific datasets. **ViGLUE**, the proposed dataset collection, is a **Vi**etnamese **G**eneral **L**anguage **U**nderstanding **E**valuation benchmark developed using three methods: translating an existing benchmark, generating new corpora, and collecting available datasets. ViGLUE contains twelve tasks and encompasses over ten areas and subjects, enabling it to evaluate models comprehensively over a broad spectrum of aspects. Baseline models utilizing multilingual language models are also provided for all tasks in the proposed benchmarks. In addition, the study of the available Vietnamese large language models is conducted to explore the language models’ ability in the few-shot learning framework, leading to the exploration of the relationship between specific tasks and the number of shots.

pdf bib
ViMedAQA: A Vietnamese Medical Abstractive Question-Answering Dataset and Findings of Large Language Model
Minh-Nam Tran | Phu-Vinh Nguyen | Long Nguyen | Dien Dinh
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Question answering involves creating answers to questions. With the growth of large language models, the ability of question-answering systems has dramatically improved. However, there is a lack of Vietnamese abstractive question-answering datasets, especially in the medical domain. Therefore, this research aims to mitigate this gap by introducing ViMedAQA. This **Vi**etnamese **Med**ical **A**bstractive **Q**uestion-**A**nswering dataset covers four topics in the Vietnamese medical domain, including body parts, disease, drugs and medicine. Additionally, the empirical results on the proposed dataset examine the capability of the large language models in the Vietnamese medical domain, including reasoning, memorizing and awareness of essential information.

2022

pdf bib
Multi-level Community-awareness Graph Neural Networks for Neural Machine Translation
Binh Nguyen | Long Nguyen | Dien Dinh
Proceedings of the 29th International Conference on Computational Linguistics

Neural Machine Translation (NMT) aims to translate the source- to the target-language while preserving the original meaning. Linguistic information such as morphology, syntactic, and semantics shall be grasped in token embeddings to produce a high-quality translation. Recent works have leveraged the powerful Graph Neural Networks (GNNs) to encode such language knowledge into token embeddings. Specifically, they use a trained parser to construct semantic graphs given sentences and then apply GNNs. However, most semantic graphs are tree-shaped and too sparse for GNNs which cause the over-smoothing problem. To alleviate this problem, we propose a novel Multi-level Community-awareness Graph Neural Network (MC-GNN) layer to jointly model local and global relationships between words and their linguistic roles in multiple communities. Intuitively, the MC-GNN layer substitutes a self-attention layer at the encoder side of a transformer-based machine translation model. Extensive experiments on four language-pair datasets with common evaluation metrics show the remarkable improvements of our method while reducing the time complexity in very long sentences.

2021

pdf bib
Matching The Statements: A Simple and Accurate Model for Key Point Analysis
Hoang Phan | Long Nguyen | Long Nguyen | Khanh Doan
Proceedings of the 8th Workshop on Argument Mining

Key Point Analysis (KPA) is one of the most essential tasks in building an Opinion Summarization system, which is capable of generating key points for a collection of arguments toward a particular topic. Furthermore, KPA allows quantifying the coverage of each summary by counting its matched arguments. With the aim of creating high-quality summaries, it is necessary to have an in-depth understanding of each individual argument as well as its universal semantic in a specified context. In this paper, we introduce a promising model, named Matching the Statements (MTS) that incorporates the discussed topic information into arguments/key points comprehension to fully understand their meanings, thus accurately performing ranking and retrieving best-match key points for an input argument. Our approach has achieved the 4th place in Track 1 of the Quantitative Summarization – Key Point Analysis Shared Task by IBM, yielding a competitive performance of 0.8956 (3rd) and 0.9632 (7th) strict and relaxed mean Average Precision, respectively.

pdf bib
Matching The Statements: A Simple and Accurate Model for Key Point Analysis
Hoang Phan | Long Nguyen | Long Nguyen | Khanh Doan
Proceedings of the 8th Workshop on Argument Mining

Key Point Analysis (KPA) is one of the most essential tasks in building an Opinion Summarization system, which is capable of generating key points for a collection of arguments toward a particular topic. Furthermore, KPA allows quantifying the coverage of each summary by counting its matched arguments. With the aim of creating high-quality summaries, it is necessary to have an in-depth understanding of each individual argument as well as its universal semantic in a specified context. In this paper, we introduce a promising model, named Matching the Statements (MTS) that incorporates the discussed topic information into arguments/key points comprehension to fully understand their meanings, thus accurately performing ranking and retrieving best-match key points for an input argument. Our approach has achieved the 4th place in Track 1 of the Quantitative Summarization – Key Point Analysis Shared Task by IBM, yielding a competitive performance of 0.8956 (3rd) and 0.9632 (7th) strict and relaxed mean Average Precision, respectively.

1994

pdf bib
On Using Written Language Training Data for Spoken Language Modeling
R. Schwartz | L. Nguyen | F. Kubala | G. Chou | G. Zavaliagkos | J. Makhoul
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

pdf bib
Is N-Best Dead?
Long Nguyen | Richard Schwartz | Ying Zhao | George Zavaliagkos
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

1993

pdf bib
Comparative Experiments on Large Vocabulary Speech Recognition
Richard Schwartz | Tasos Anastasakos | Francis Kubala | John Makhoul | Long Nguyen | George Zavaliagkos
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993

pdf bib
Search Algorithms for Software-Only Real-Time Recognition with Very Large Vocabularies
Long Nguyen | Richard Schwartz | Francis Kubala | Paul Placeway
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993

1992

pdf bib
BBN BYBLOS and HARC February 1992 ATIS Benchmark Results
Francis Kubala | Chris Barry | Madeleine Bates | Robert Bobrow | Pascale Fung | Robert Ingria | John Makhoul | Long Nguyen | Richard Schwartz | David Stallard
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

pdf bib
BBN Real-Time Speech Recognition Demonstrations
Steve Austin | Rusty Bobrow | Dan Ellard | Robert Ingria | John Makhoul | Long Nguyen | Pat Peterson | Paul Placeway | Richard Schwartz
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992