Cristian Popa
2021
BART-TL: Weakly-Supervised Topic Label Generation
Cristian Popa | Traian Rebedea
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
We propose a novel solution for assigning labels to topic models by using multiple weak labelers. The method leverages generative transformers to learn accurate representations of the most important topic terms and candidate labels. This is achieved by fine-tuning pre-trained BART models on a large number of potential labels generated by state-of-the-art non-neural models for topic labeling, enriched with different techniques. The proposed BART-TL model is able to generate valuable and novel labels in a weakly-supervised manner and can be improved by adding other weak labelers or distant supervision on similar tasks.
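The weak-supervision setup described in this abstract can be illustrated with a minimal Python sketch. It assumes each topic is given as a ranked list of terms and each weak labeler returns a list of candidate label strings. The function `build_weak_pairs`, the labeler names, and the `top_n` cutoff are hypothetical. The sketch only shows how outputs from several labelers might be merged into (source, target) text pairs for fine-tuning a sequence-to-sequence model such as BART; it is not the paper's actual preprocessing.

```python
from typing import Dict, List, Tuple


def build_weak_pairs(
    topic_terms: List[str],
    labeler_outputs: Dict[str, List[str]],
    top_n: int = 10,
) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair one topic's top terms with every distinct
    candidate label proposed by any weak labeler."""
    # Source side: the topic's most important terms, space-separated.
    source = " ".join(topic_terms[:top_n])
    seen = set()
    pairs: List[Tuple[str, str]] = []
    for labeler in sorted(labeler_outputs):  # deterministic order across labelers
        for label in labeler_outputs[labeler]:
            key = label.strip().lower()
            # Skip empty labels and labels already proposed by another labeler.
            if key and key not in seen:
                seen.add(key)
                pairs.append((source, label.strip()))
    return pairs


# Toy example with made-up terms and labeler names.
terms = ["game", "team", "season", "player", "league"]
candidates = {
    "labeler_a": ["sports", "football season"],
    "labeler_b": ["Sports", "basketball"],
}
pairs = build_weak_pairs(terms, candidates, top_n=3)
# pairs == [("game team season", "sports"),
#           ("game team season", "football season"),
#           ("game team season", "basketball")]
```

Deduplicating case-insensitively keeps one training target per distinct label even when several labelers agree; whether agreement should instead up-weight a label is left open by this sketch.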
2020
Applying Multilingual and Monolingual Transformer-Based Models for Dialect Identification
Cristian Popa | Vlad Ștefănescu
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects
We study the ability of large fine-tuned transformer models to solve a binary dialect identification task, with a special interest in comparing the performance of multilingual models to monolingual ones. The corpus analyzed contains Romanian and Moldavian samples from the news domain, as well as tweets used to assess performance. We find that the monolingual models outperform the multilingual ones, and the best results are obtained using an SVM ensemble of 5 different transformer-based models. We provide our experimental results and an analysis of the attention mechanisms of the best-performing individual classifiers to explain their decisions. The code we used was released under an open-source license.
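The ensembling step in this abstract can be sketched as stacking: each of the 5 fine-tuned classifiers produces a score per sample, and a linear SVM is trained on those scores. This is a minimal pure-Python sketch under assumptions the abstract does not state: scores are mapped to [-1, 1], labels are +1 for Moldavian and -1 for Romanian, and the SVM is a linear one trained with the Pegasos sub-gradient algorithm as a stand-in for a library solver. The function names and toy scores are hypothetical.

```python
import random
from typing import List, Sequence


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 200,
    seed: int = 0,
) -> List[float]:
    """Linear SVM trained with Pegasos (stochastic sub-gradient descent on the
    L2-regularised hinge loss). Labels in y must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # last weight acts as a bias (constant feature 1.0)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss active: step toward the correct side
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: Sequence[float], scores: Sequence[float]) -> int:
    """Return +1 or -1 for one row of per-model scores."""
    s = sum(wj * xj for wj, xj in zip(w, list(scores) + [1.0]))
    return 1 if s >= 0.0 else -1


# Hypothetical stacked features: one score in [-1, 1] from each of 5 models
# (+1 = confident "Moldavian", -1 = confident "Romanian").
X = [
    [0.9, 0.8, 0.7, 0.95, 0.6],
    [0.7, 0.9, 0.8, 0.6, 0.85],
    [-0.8, -0.7, -0.9, -0.6, -0.75],
    [-0.6, -0.95, -0.7, -0.8, -0.9],
]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [0.7, 0.6, 0.9, 0.8, 0.75]))  # prints 1
```

A library SVM (for example one with a non-linear kernel) could replace `train_linear_svm` without changing the stacking structure, since the meta-classifier only ever sees the 5 per-model scores.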