Liliana Mamani Sanchez

Also published as: Liliana Mamani Sanchez, Liliana Mamani Sánchez


ThaiLMCut: Unsupervised Pretraining for Thai Word Segmentation
Suteera Seeha | Ivan Bilan | Liliana Mamani Sanchez | Johannes Huber | Michael Matuschek | Hinrich Schütze
Proceedings of the Twelfth Language Resources and Evaluation Conference

We propose ThaiLMCut, a semi-supervised approach for Thai word segmentation which utilizes a bi-directional character language model (LM) as a way to leverage useful linguistic knowledge from unlabeled data. After the language model is trained on substantial unlabeled corpora, the weights of its embedding and recurrent layers are transferred to a supervised word segmentation model which continues fine-tuning them on a word segmentation task. Our experimental results demonstrate that applying the LM always leads to a performance gain, especially when the amount of labeled data is small. In such cases, the F1 Score increased by up to 2.02%. Even on a big labeled dataset, a small gain can still be obtained. The approach has also been shown to be very beneficial for out-of-domain settings, with a gain in F1 Score of up to 3.13%. Finally, we show that ThaiLMCut can outperform other open source state-of-the-art models, achieving an F1 Score of 98.78% on the standard benchmark, InterBEST2009.


Text-based experiments for Predicting mental health emergencies in online web forum posts
Hector-Hugo Franco-Penya | Liliana Mamani Sanchez
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

Combined Tree Kernel-based classifiers for Assessing Quality of Scientific Text
Liliana Mamani Sanchez | Hector-Hugo Franco-Penya
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Tuning Bayes Baseline for Dialect Detection
Hector-Hugo Franco-Penya | Liliana Mamani Sanchez
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

This paper describes an analysis of our submissions to the Dialect Detection Shared Task 2016. We proposed three different systems that involved simplistic features, namely a Naive Bayes system, a Support Vector Machine-based system and a Tree Kernel-based system. These systems underperform when compared to other submissions in this shared task, since the best one achieved an accuracy of ~0.834.


A hedging annotation scheme focused on epistemic phrases for informal language
Liliana Mamani Sanchez | Carl Vogel
Proceedings of the Workshop on Models for Modality Annotation


IMHO: An Exploratory Study of Hedging in Web Forums
Liliana Mamani Sanchez | Carl Vogel
Proceedings of the SIGDIAL 2013 Conference


Exploiting CCG Structures with Tree Kernels for Speculation Detection
Liliana Mamani Sánchez | Baoli Li | Carl Vogel
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task