Teresa Rodríguez-Ferreira
2016
Improving Information Extraction from Wikipedia Texts using Basic English
Teresa Rodríguez-Ferreira | Adrián Rabadán | Raquel Hervás | Alberto Díaz
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The aim of this paper is to study the effect that the use of Basic English versus common English has on information extraction from online resources. The amount of online information available to the public grows exponentially and is potentially an excellent resource for information extraction. The problem is that this information often comes in an unstructured format, such as plain text. To retrieve knowledge from this type of text, it must first be analysed to find the relevant details, and the nature of the language used can greatly affect the quality of the extracted information. In this paper, we compare triplets that represent definitions or properties of concepts obtained from three online collaborative resources (English Wikipedia, Simple English Wikipedia and Simple English Wiktionary) and study how the results differ when Basic English is used instead of common English. The results show that resources written in Basic English produce fewer triplets, but of higher quality.