Kidane Woldemariyam
2020
Embedding Oriented Adaptable Semantic Annotation Framework for Amharic Web Documents
Kidane Woldemariyam
|
Dr. Fekade Getahun
Proceedings of the Fourth Widening Natural Language Processing Workshop
The Web has become a source of information, where information is provided by humans for humans and its growth has increased necessity to get solutions that intelligently extract valuable knowledge from existing and newly added web documents with no (minimal) supervisions. However, due to the unstructured nature of existing data on the Web, effective extraction of this knowledge is limited for both human beings and software agents. Thus, this research work designed generic and embedding oriented framework that automatically annotates semantically Amharic web documents using ontology. This framework significantly reduces manual annotation and learning cost used for semantic annotation of Amharic web documents with its nature of adaptability with minimal modification. The results have also implied that neural network techniques are promising for semantic annotation, especially for less resourced languages like Amharic in comparison to language dependent techniques that have cost of speed and challenge of adaptation into new domains and languages. We experiment the feasibility of the proposed approach using Amharic news collected from WALTA news agency and Amharic Wikipedia. Our results show that the proposed solution exhibits 70.68% of precision, 66.89% of recall and 68.53% of f-measure in semantic annotation for a morphologically complex Amharic language with limited size dataset.