Van-Khanh Tran


2020

pdf bib
Goals, Challenges and Findings of the VLSP 2020 English-Vietnamese News Translation Shared Task
Thanh-Le Ha | Van-Khanh Tran | Kim-Anh Nguyen
Proceedings of the 7th International Workshop on Vietnamese Language and Speech Processing

2018

pdf bib
Dual Latent Variable Model for Low-Resource Natural Language Generation in Dialogue Systems
Van-Khanh Tran | Le-Minh Nguyen
Proceedings of the 22nd Conference on Computational Natural Language Learning

Recent deep learning models have shown improving results to natural language generation (NLG) irrespective of providing sufficient annotated data. However, a modest training data may harm such models’ performance. Thus, how to build a generator that can utilize as much of knowledge from a low-resource setting data is a crucial issue in NLG. This paper presents a variational neural-based generation model to tackle the NLG problem of having limited labeled dataset, in which we integrate a variational inference into an encoder-decoder generator and introduce a novel auxiliary auto-encoding with an effective training procedure. Experiments showed that the proposed methods not only outperform the previous models when having sufficient training dataset but also demonstrate strong ability to work acceptably well when the training data is scarce.

pdf bib
Adversarial Domain Adaptation for Variational Neural Language Generation in Dialogue Systems
Van-Khanh Tran | Le-Minh Nguyen
Proceedings of the 27th International Conference on Computational Linguistics

Domain Adaptation arises when we aim at learning from source domain a model that can perform acceptably well on a different target domain. It is especially crucial for Natural Language Generation (NLG) in Spoken Dialogue Systems when there are sufficient annotated data in the source domain, but there is a limited labeled data in the target domain. How to effectively utilize as much of existing abilities from source domains is a crucial issue in domain adaptation. In this paper, we propose an adversarial training procedure to train a Variational encoder-decoder based language generator via multiple adaptation steps. In this procedure, a model is first trained on a source domain data and then fine-tuned on a small set of target domain utterances under the guidance of two proposed critics. Experimental results show that the proposed method can effectively leverage the existing knowledge in the source domain to adapt to another related domain by using only a small amount of in-domain data.

2017

pdf bib
Natural Language Generation for Spoken Dialogue System using RNN Encoder-Decoder Networks
Van-Khanh Tran | Le-Minh Nguyen
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Natural language generation (NLG) is a critical component in a spoken dialogue system. This paper presents a Recurrent Neural Network based Encoder-Decoder architecture, in which an LSTM-based decoder is introduced to select, aggregate semantic elements produced by an attention mechanism over the input elements, and to produce the required utterances. The proposed generator can be jointly trained both sentence planning and surface realization to produce natural language sentences. The proposed model was extensively evaluated on four different NLG datasets. The experimental results showed that the proposed generators not only consistently outperform the previous methods across all the NLG domains but also show an ability to generalize from a new, unseen domain and learn from multi-domain datasets.

pdf bib
Neural-based Natural Language Generation in Dialogue using RNN Encoder-Decoder with Semantic Aggregation
Van-Khanh Tran | Le-Minh Nguyen | Satoshi Tojo
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Natural language generation (NLG) is an important component in spoken dialogue systems. This paper presents a model called Encoder-Aggregator-Decoder which is an extension of an Recurrent Neural Network based Encoder-Decoder architecture. The proposed Semantic Aggregator consists of two components: an Aligner and a Refiner. The Aligner is a conventional attention calculated over the encoded input information, while the Refiner is another attention or gating mechanism stacked over the attentive Aligner in order to further select and aggregate the semantic elements. The proposed model can be jointly trained both sentence planning and surface realization to produce natural language utterances. The model was extensively assessed on four different NLG domains, in which the experimental results showed that the proposed generator consistently outperforms the previous methods on all the NLG domains.