Toqeer Ehsan
2023
AlphaBrains@DravidianLangTech: Sentiment Analysis of Code-Mixed Tamil and Tulu by Training Contextualized ELMo Word Representations
Toqeer Ehsan | Amina Tehseen | Kengatharaiyer Sarveswaran | Amjad Ali
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Sentiment analysis in natural language processing (NLP) endeavors to computationally identify and extract subjective information from textual data. In code-mixed text, sentiment analysis presents a unique challenge due to the mixing of languages within a single textual context. For low-resourced languages such as Tamil and Tulu, predicting sentiment is particularly challenging because the text comprises multiple scripts. In this research, we present sentiment analysis of code-mixed Tamil and Tulu YouTube comments. We have developed Bidirectional Long Short-Term Memory (BiLSTM) network-based models for both languages, which use contextualized word embeddings at their input layers. For that purpose, ELMo embeddings have been trained on larger unannotated code-mixed corpora. Our models achieved macro-average F1-scores of 0.2877 and 0.5133 on the Tamil and Tulu code-mixed datasets, respectively.
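A minimal sketch of the architecture the abstract describes, assuming PyTorch and precomputed 1024-dimensional ELMo vectors for each token; the hidden size, class count, and pooling choice are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class BiLSTMSentimentClassifier(nn.Module):
    """BiLSTM sentiment classifier over contextualized (e.g. ELMo) word vectors."""

    def __init__(self, elmo_dim: int = 1024, hidden_dim: int = 256, num_classes: int = 5):
        super().__init__()
        # Bidirectional LSTM over the sequence of contextualized word embeddings.
        self.bilstm = nn.LSTM(
            input_size=elmo_dim,
            hidden_size=hidden_dim,
            bidirectional=True,
            batch_first=True,
        )
        # Linear layer mapping the pooled BiLSTM states to sentiment classes.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, elmo_embeddings: torch.Tensor) -> torch.Tensor:
        # elmo_embeddings: (batch, seq_len, elmo_dim), assumed precomputed
        # by an ELMo model trained on the code-mixed corpus.
        outputs, _ = self.bilstm(elmo_embeddings)
        # Mean-pool the per-token states into a single comment representation.
        pooled = outputs.mean(dim=1)
        return self.classifier(pooled)

# Hypothetical usage: one batch of 8 comments, 40 tokens each.
model = BiLSTMSentimentClassifier()
logits = model(torch.randn(8, 40, 1024))
print(logits.shape)  # torch.Size([8, 5])
```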
AlphaBrains at WojoodNER shared task: Arabic Named Entity Recognition by Using Character-based Context-Sensitive Word Representations
Toqeer Ehsan | Amjad Ali | Ala Al-Fuqaha
Proceedings of ArabicNLP 2023
This paper presents Arabic named entity recognition models that employ the single-task and multi-task learning paradigms. The models have been developed using character-based contextualized Embeddings from Language Model (ELMo) in the input layers of bidirectional long short-term memory (BiLSTM) networks. The ELMo embeddings are well suited to learning the morphology and contextual information of tokens in word sequences. The single-task learning models outperformed the multi-task learning models, achieving micro F1-scores of 0.8751 and 0.8884 for the flat and nested annotations, respectively.
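To make the single-task versus multi-task distinction concrete, here is a sketch (again assuming PyTorch and precomputed ELMo inputs) of a shared BiLSTM encoder with one tagging head per annotation layer; tag-set sizes and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class MultiTaskNERTagger(nn.Module):
    """Shared BiLSTM encoder with separate heads for flat and nested NER tags."""

    def __init__(self, elmo_dim: int = 1024, hidden_dim: int = 256,
                 num_flat_tags: int = 9, num_nested_tags: int = 9):
        super().__init__()
        self.encoder = nn.LSTM(elmo_dim, hidden_dim,
                               bidirectional=True, batch_first=True)
        # One tagging head per annotation layer; dropping one head and its
        # loss term recovers the single-task variant.
        self.flat_head = nn.Linear(2 * hidden_dim, num_flat_tags)
        self.nested_head = nn.Linear(2 * hidden_dim, num_nested_tags)

    def forward(self, elmo_embeddings: torch.Tensor):
        # elmo_embeddings: (batch, seq_len, elmo_dim), character-based
        # contextualized ELMo vectors for each token.
        states, _ = self.encoder(elmo_embeddings)
        # Per-token logits for each annotation layer.
        return self.flat_head(states), self.nested_head(states)

model = MultiTaskNERTagger()
flat_logits, nested_logits = model(torch.randn(4, 30, 1024))
print(flat_logits.shape, nested_logits.shape)  # (4, 30, 9) for each head
```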
2020
Dependency Parsing for Urdu: Resources, Conversions and Learning
Toqeer Ehsan | Miriam Butt
Proceedings of the Twelfth Language Resources and Evaluation Conference
This paper adds to the available resources for the under-resourced language Urdu by converting different types of existing treebanks for Urdu into a common format based on Universal Dependencies. We present comparative results for training two dependency parsers, the MaltParser and a transition-based BiLSTM parser, on this new resource. The BiLSTM parser incorporates word embeddings, which improve the parsing results significantly. The BiLSTM parser outperforms the MaltParser with a UAS of 89.6 and an LAS of 84.2 on our standardized treebank resource.
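For reference, the UAS and LAS figures quoted above are attachment scores; a minimal sketch of how they are computed from gold and predicted (head, label) pairs, with a hypothetical four-token example:

```python
def attachment_scores(gold, predicted):
    """Compute UAS/LAS over parallel lists of (head_index, dep_label) pairs."""
    assert len(gold) == len(predicted)
    uas_hits = las_hits = 0
    for (g_head, g_label), (p_head, p_label) in zip(gold, predicted):
        if g_head == p_head:
            uas_hits += 1          # correct head attachment (UAS)
            if g_label == p_label:
                las_hits += 1      # correct head and relation label (LAS)
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n

# Hypothetical 4-token sentence: heads are 1-based token indices, 0 = root.
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "obj")]
print(attachment_scores(gold, pred))  # (75.0, 75.0)
```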