Dieu-Thu Le

Also published as: Dieu Thu Le


Combining semantic search and twin product classification for recognition of purchasable items in voice shopping
Dieu-Thu Le | Verena Weber | Melanie Bradford
Proceedings of The 4th Workshop on e-Commerce and NLP

The accuracy of an online shopping system via voice commands is particularly important and may have a great impact on customer trust. This paper focuses on the problem of detecting if an utterance contains actual and purchasable products, thus referring to a shopping-related intent in a typical Spoken Language Understanding architecture consisting of an intent classifier and a slot detector. Searching through billions of products to check if a detected slot is a purchasable item is prohibitively expensive. To overcome this problem, we present a framework that (1) uses a retrieval module that returns the most relevant products with respect to the detected slot, and (2) combines it with a twin network that decides if the detected slot is indeed a purchasable item or not. Through various experiments, we show that this architecture outperforms a typical slot detector approach, with a gain of +81% in accuracy and +41% in F1 score.


Joint learning of frequency and word embeddings for multilingual readability assessment
Dieu-Thu Le | Cam-Tu Nguyen | Xiaoliang Wang
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

This paper describes two models that employ word frequency embeddings to deal with the problem of readability assessment in multiple languages. The task is to determine the difficulty level of a given document, i.e., how hard it is for a reader to fully comprehend the text. The proposed models show how frequency information can be integrated to improve readability assessment. Experimental results on both English and Chinese datasets show that the proposed models notably improve on models that use only traditional word embeddings.

Dave the debater: a retrieval-based and generative argumentative dialogue agent
Dieu Thu Le | Cam-Tu Nguyen | Kim Anh Nguyen
Proceedings of the 5th Workshop on Argument Mining

In this paper, we explore the problem of developing an argumentative dialogue agent that can discuss controversial topics with human users. We describe two systems that use retrieval-based and generative models to make argumentative responses to the users. The experiments show promising results even though the systems were trained on a small dataset.


Construction and Analysis of a Large Vietnamese Text Corpus
Dieu-Thu Le | Uwe Quasthoff
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a new Vietnamese text corpus which contains around 4.05 billion words. It is a collection of Wikipedia texts, newspaper articles and random web texts. The paper describes the process of collecting, cleaning and creating the corpus. Processing Vietnamese texts poses several challenges. For example, unlike many languages written in Latin script, Vietnamese does not use blanks to separate words, so common tokenizers that treat blanks as word boundaries do not work. A short review of different approaches to Vietnamese tokenization is presented, together with how the corpus has been processed and created. After that, some statistical analysis of this data is reported, including the number of syllables, average word length, sentence length and topic analysis. The corpus is integrated into a framework which allows searching and browsing. Using this web interface, users can find out how many times a particular word appears in the corpus, sample sentences in which the word occurs, and its left and right neighbors.

Towards a text analysis system for political debates
Dieu-Thu Le | Ngoc Thang Vu | Andre Blessing
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities


TUHOI: Trento Universal Human Object Interaction Dataset
Dieu-Thu Le | Jasper Uijlings | Raffaella Bernardi
Proceedings of the Third Workshop on Vision and Language


Exploiting Language Models for Visual Recognition
Dieu-Thu Le | Jasper Uijlings | Raffaella Bernardi
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing


Query classification using topic models and support vector machine
Dieu-Thu Le | Raffaella Bernardi
Proceedings of ACL 2012 Student Research Workshop


Query classification via Topic Models for an art image archive
Dieu-Thu Le | Raffaella Bernardi | Ed Vald
Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage