Corpus based Amharic sentiment lexicon generation

Girma Neshir Alemneh, Andreas Rauber, Solomon Atnafu


Abstract
Sentiment classification is an active research area with applications that include analyzing political opinions and classifying comments, movie reviews, news reviews, and product reviews. Rule-based sentiment classification requires sentiment lexicons. However, constructing a sentiment lexicon manually is time-consuming and costly for resource-limited languages. To avoid the time and cost of manual development, we build Amharic sentiment lexicons using a corpus-based approach. The aim of this approach is to capture sentiment terms specific to the Amharic language from an Amharic corpus. A small set of seed terms is manually prepared from three parts of speech: noun, adjective, and verb. We developed algorithms for constructing Amharic sentiment lexicons automatically from an Amharic news corpus. The proposed corpus-based approach relies on distributional embeddings of word co-occurrence, including a frequency-based embedding (i.e., Positive Point-wise Mutual Information, PPMI). Using PPMI with threshold values of 100 and 200, we obtained corpus-based Amharic sentiment lexicons of size 1811 and 3794, respectively, by expanding 519 seeds. Finally, the lexicon generated by the corpus-based approach is evaluated.
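As an illustration of the general technique named in the abstract, the sketch below computes PPMI from windowed co-occurrence counts, using the standard definition PPMI(w, c) = max(0, log2(P(w, c) / (P(w) P(c)))), and then labels non-seed words by the seed polarity they associate with most strongly. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function names, the `window` and `min_score` parameters, the polarity-voting rule, and the English placeholder tokens are all hypothetical. In particular, `min_score` is not the threshold of 100 or 200 reported in the abstract, whose exact definition is not given here.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count symmetric (word, context) pairs within a sliding window."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    return pair_counts


def ppmi(pair_counts):
    """Return {(word, context): PPMI} for pairs with positive PMI."""
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (w, c), n in pair_counts.items():
        word_totals[w] += n
        context_totals[c] += n
    scores = {}
    for (w, c), n in pair_counts.items():
        # PMI = log2( P(w, c) / (P(w) * P(c)) ), written with raw counts.
        pmi = math.log2(n * total / (word_totals[w] * context_totals[c]))
        if pmi > 0:  # PPMI clips non-positive values to zero; we omit them.
            scores[(w, c)] = pmi
    return scores


def expand_seeds(scores, seeds, min_score=0.0):
    """Label non-seed words with the seed polarity of highest summed PPMI.

    `seeds` maps a seed word to a polarity label. `min_score` is a
    hypothetical cutoff on the summed PPMI, assumed for illustration only.
    """
    totals = {}
    for (w, c), s in scores.items():
        if w in seeds or c not in seeds:
            continue
        totals.setdefault(w, Counter())[seeds[c]] += s
    lexicon = {}
    for w, by_label in totals.items():
        label, score = by_label.most_common(1)[0]
        if score > min_score:
            lexicon[w] = label
    return lexicon


# Toy usage with English placeholder tokens standing in for Amharic terms.
toy_corpus = [
    ["good", "nice", "day"],
    ["bad", "awful", "day"],
    ["good", "nice"],
    ["bad", "awful"],
]
toy_seeds = {"good": "positive", "bad": "negative"}
toy_scores = ppmi(cooccurrence_counts(toy_corpus, window=1))
toy_lexicon = expand_seeds(toy_scores, toy_seeds)
```

On the toy corpus, "nice" and "awful" each co-occur with only one seed, so they inherit that seed's polarity, while "day" never appears next to a seed and stays unlabeled. A real pipeline would also need Amharic-specific tokenization and normalization, which this sketch leaves out.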
Anthology ID:
2020.winlp-1.1
Volume:
Proceedings of the Fourth Widening Natural Language Processing Workshop
Month:
July
Year:
2020
Address:
Seattle, USA
Editors:
Rossana Cunha, Samira Shaikh, Erika Varis, Ryan Georgi, Alicia Tsai, Antonios Anastasopoulos, Khyathi Raghavi Chandu
Venue:
WiNLP
Publisher:
Association for Computational Linguistics
Pages:
1–3
URL:
https://aclanthology.org/2020.winlp-1.1
DOI:
10.18653/v1/2020.winlp-1.1
Cite (ACL):
Girma Neshir Alemneh, Andreas Rauber, and Solomon Atnafu. 2020. Corpus based Amharic sentiment lexicon generation. In Proceedings of the Fourth Widening Natural Language Processing Workshop, pages 1–3, Seattle, USA. Association for Computational Linguistics.
Cite (Informal):
Corpus based Amharic sentiment lexicon generation (Alemneh et al., WiNLP 2020)
Video:
 http://slideslive.com/38929537