Yasutaka Yokoi


Developing a Dataset of Overridden Information in Wikipedia
Masatoshi Tsuchiya | Yasutaka Yokoi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper proposes a new task: detecting information override. Because not all information on the Web is updated in a timely manner, information that has been overridden by another information source needs to be discarded. The task is formalized as a binary classification problem: determining whether a reference sentence has overridden a target sentence. To investigate this task, the paper describes a procedure for constructing a dataset of overridden information by collecting sentence pairs from the differences between two versions of Wikipedia. The dataset under development shows that the old version of Wikipedia contains much overridden information and that detecting information override is necessary.
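
The idea of collecting sentence pairs from the difference between two versions can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes each version has already been split into a list of sentences, and it uses Python's standard `difflib` to align them. The names `CandidatePair` and `candidate_pairs` are hypothetical, and the binary label (overridden or not) is left to a later annotation or classification step.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple


class CandidatePair(NamedTuple):
    """Hypothetical container for one candidate sentence pair."""
    target: str     # sentence from the older version (possibly overridden)
    reference: str  # sentence from the newer version (possibly overriding)


def candidate_pairs(old_sents: List[str], new_sents: List[str]) -> List[CandidatePair]:
    """Pair sentences that were replaced between two versions of a page.

    Assumption: a 'replace' region in the sentence-level diff is a
    reasonable place to look for override candidates. Insert-only and
    delete-only regions are ignored in this sketch.
    """
    matcher = SequenceMatcher(a=old_sents, b=new_sents, autojunk=False)
    pairs: List[CandidatePair] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue
        # Pair replaced sentences position by position; zip() drops any
        # unpaired surplus when the two regions differ in length.
        for old, new in zip(old_sents[i1:i2], new_sents[j1:j2]):
            pairs.append(CandidatePair(target=old, reference=new))
    return pairs
```

Each returned pair is only a candidate. Deciding whether the reference sentence actually overrides the target sentence is the binary classification problem described in the abstract.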