Liliana Mamani Sanchez

Also published as: Liliana Mamani Sanchez, Liliana Mamani Sánchez

2020

We propose ThaiLMCut, a semi-supervised approach for Thai word segmentation which utilizes a bi-directional character language model (LM) as a way to leverage useful linguistic knowledge from unlabeled data. After the language model is trained on substantial unlabeled corpora, the weights of its embedding and recurrent layers are transferred to a supervised word segmentation model which continues fine-tuning them on a word segmentation task. Our experimental results demonstrate that applying the LM always leads to a performance gain, especially when the amount of labeled data is small. In such cases, the F1 Score increased by up to 2.02%. Even on a big labeled dataset, a small improvement can still be obtained. The approach has also been shown to be very beneficial for out-of-domain settings, with a gain in F1 Score of up to 3.13%. Finally, we show that ThaiLMCut can outperform other open source state-of-the-art models, achieving an F1 Score of 98.78% on the standard benchmark, InterBEST2009.

2016

Tuning Bayes Baseline for Dialect Detection
Hector-Hugo Franco-Penya | Liliana Mamani Sanchez
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

This paper describes an analysis of our submissions to the Dialect Detection Shared Task 2016. We proposed three different systems that involved simplistic features, namely a Naive Bayes system, a Support Vector Machines-based system and a Tree Kernel-based system. These systems underperform when compared to other submissions in this shared task, since the best one achieved an accuracy of ~0.834.

Text-based experiments for Predicting mental health emergencies in online web forum posts
Hector-Hugo Franco-Penya | Liliana Mamani Sanchez
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

Combined Tree Kernel-based classifiers for Assessing Quality of Scientific Text
Liliana Mamani Sanchez | Hector-Hugo Franco-Penya
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications