Improving Information Extraction from Wikipedia Texts using Basic English

Teresa Rodríguez-Ferreira, Adrián Rabadán, Raquel Hervás, Alberto Díaz


Abstract
The aim of this paper is to study the effect that the use of Basic English versus common English has on information extraction from online resources. The amount of online information available to the public grows exponentially, and is potentially an excellent resource for information extraction. The problem is that this information often comes in an unstructured format, such as plain text. In order to retrieve knowledge from this type of text, it must first be analysed to find the relevant details, and the nature of the language used can greatly impact the quality of the extracted information. In this paper, we compare triplets that represent definitions or properties of concepts obtained from three online collaborative resources (English Wikipedia, Simple English Wikipedia and Simple English Wiktionary) and study the differences in the results when Basic English is used instead of common English. The results show that resources written in Basic English produce fewer triplets, but of higher quality.
Anthology ID:
L16-1062
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
395–400
URL:
https://aclanthology.org/L16-1062
Cite (ACL):
Teresa Rodríguez-Ferreira, Adrián Rabadán, Raquel Hervás, and Alberto Díaz. 2016. Improving Information Extraction from Wikipedia Texts using Basic English. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 395–400, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Improving Information Extraction from Wikipedia Texts using Basic English (Rodríguez-Ferreira et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1062.pdf