Solomon Atnafu
2020
Corpus based Amharic sentiment lexicon generation
Girma Neshir Alemneh | Andreas Rauber | Solomon Atnafu
Proceedings of the Fourth Widening Natural Language Processing Workshop
Sentiment classification is an active research area with several applications, including the analysis of political opinions and the classification of comments, movie reviews, news reviews, and product reviews. Rule-based sentiment classification requires sentiment lexicons. However, manual construction of a sentiment lexicon is time-consuming and costly for resource-limited languages. To avoid the time and cost of manual development, we build Amharic sentiment lexicons using a corpus-based approach. The aim of this approach is to capture sentiment terms specific to the Amharic language from an Amharic corpus. A small set of seed terms is manually prepared from three parts of speech: noun, adjective, and verb. We developed algorithms for constructing Amharic sentiment lexicons automatically from an Amharic news corpus. The corpus-based approach relies on word co-occurrence distributional embeddings, including frequency-based embeddings (i.e., Positive Point-wise Mutual Information, PPMI). Using PPMI with threshold values of 100 and 200, we obtained corpus-based Amharic sentiment lexicons of size 1811 and 3794, respectively, by expanding 519 seeds. Finally, the lexicon generated by the corpus-based approach is evaluated.
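As a rough illustration of the PPMI-based seed expansion described in this abstract, the following minimal sketch computes PPMI from windowed co-occurrence counts and assigns a polarity to non-seed words based on their association with seed terms. The function names, the window size, the `min_score` cutoff, and the English placeholder tokens are all assumptions made for illustration; the abstract does not say how its thresholds of 100 and 200 are applied, so this should not be read as the authors' algorithm.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count (word, context) pairs within a symmetric window (assumed size)."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    return pair_counts


def ppmi(pair_counts):
    """PPMI(w, c) = max(0, log2(P(w, c) / (P(w) * P(c))))."""
    total = sum(pair_counts.values())
    row, col = Counter(), Counter()
    for (w, c), n in pair_counts.items():
        row[w] += n
        col[c] += n
    return {
        (w, c): max(0.0, math.log2(n * total / (row[w] * col[c])))
        for (w, c), n in pair_counts.items()
    }


def expand_seeds(scores, seeds, min_score=0.5):
    """Label non-seed words by their summed PPMI with signed seed terms.

    `seeds` maps a seed word to +1 or -1; `min_score` is a hypothetical cutoff.
    """
    polarity = defaultdict(float)
    for (w, c), s in scores.items():
        if c in seeds and w not in seeds:
            polarity[w] += seeds[c] * s
    return {w: (1 if p > 0 else -1)
            for w, p in polarity.items() if abs(p) >= min_score}


# Toy corpus with English placeholders standing in for Amharic tokens.
sentences = [
    ["good", "nice", "day"],
    ["good", "nice", "food"],
    ["bad", "awful", "day"],
    ["bad", "awful", "food"],
]
seeds = {"good": 1, "bad": -1}
lexicon = expand_seeds(ppmi(cooccurrence_counts(sentences)), seeds)
```

In this toy corpus, words that co-occur with only one kind of seed receive that seed's polarity, while words that co-occur equally with both kinds score zero and are left out.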
Negation handling for Amharic sentiment classification
Girma Neshir Alemneh | Andreas Rauber | Solomon Atnafu
Proceedings of the Fourth Widening Natural Language Processing Workshop
User-generated content is bringing new aspects to the processing of data on the web. Owing to the advancement of World Wide Web technology, users are not only consumers of web content but also producers of content in the form of text, audio, video, and pictures. This study focuses on the analysis of textual content with subjective information (i.e., sentiment analysis). Most conventional approaches to sentiment analysis do not effectively capture negation in languages with limited computational linguistic resources (e.g., Amharic). In this research, we propose an Amharic negation handling framework for Amharic sentiment classification. The proposed framework combines a lexicon-based sentiment classification approach with character n-gram based machine learning algorithms. Finally, the performance of the framework is evaluated using annotated Amharic news comments. The system performs best among all models and baselines, with an accuracy of 98.0. The result is compared with the baselines (without negation handling and with a word-level n-gram model).
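The two building blocks named in this abstract can be sketched as follows: character n-gram extraction, which can pick up sub-word affixes that a word-level model misses, and lexicon-based scoring in which a negation cue flips the polarity of the next sentiment term. The abstract does not describe how negation is detected or how the two components are combined, so the function names, the `negators` set, the flipping rule, and the English placeholder tokens are assumptions made for illustration only.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Count character n-grams, padding with spaces to mark word edges."""
    padded = f" {text} "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def lexicon_score(tokens, lexicon, negators):
    """Sum lexicon polarities, flipping the first sentiment term after a negator.

    `negators` is a hypothetical set of standalone negation cues.
    """
    score = 0
    negate = False
    for tok in tokens:
        if tok in negators:
            negate = True
            continue
        polarity = lexicon.get(tok, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score


# English placeholders standing in for Amharic tokens.
toy_lexicon = {"good": 1, "bad": -1}
toy_negators = {"not"}
plain = lexicon_score(["good"], toy_lexicon, toy_negators)
negated = lexicon_score(["not", "good"], toy_lexicon, toy_negators)
features = char_ngrams("unhappy")
```

One possible design, again an assumption, is to pass the n-gram counts to a classifier as features alongside the lexicon score, so that negation expressed inside a word (as with the "un" prefix in the placeholder) can still influence the prediction.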