Jordan Schmidek
2019
KnowledgeNet: A Benchmark Dataset for Knowledge Base Population
Filipe Mesquita
|
Matteo Cannaviccio
|
Jordan Schmidek
|
Paramita Mirza
|
Denilson Barbosa
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
KnowledgeNet is a benchmark dataset for the task of automatically populating a knowledge base (Wikidata) with facts expressed in natural language text on the web. KnowledgeNet provides text exhaustively annotated with facts, thus enabling the holistic end-to-end evaluation of knowledge base population systems as a whole, unlike previous benchmarks that are more suitable for the evaluation of individual subcomponents (e.g., entity linking, relation extraction). We discuss five baseline approaches, where the best approach achieves an F1 score of 0.50, significantly outperforming a traditional approach by 79% (0.28). However, our best baseline is far from reaching human performance (0.82), indicating our dataset is challenging. The KnowledgeNet dataset and baselines are available at https://github.com/diffbot/knowledge-net
2014
Improving Open Relation Extraction via Sentence Re-Structuring
Jordan Schmidek
|
Denilson Barbosa
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Information Extraction is an important task in Natural Language Processing, consisting of finding a structured representation for the information expressed in natural language text. Two key steps in information extraction are identifying the entities mentioned in the text, and the relations among those entities. In the context of Information Extraction for the World Wide Web, unsupervised relation extraction methods, also called Open Relation Extraction (ORE) systems, have become prevalent, due to their effectiveness without domain-specific training data. In general, these systems exploit part-of-speech tags or semantic information from the sentences to determine whether or not a relation exists, and if so, its predicate. This paper discusses some of the issues that arise when even moderately complex sentences are fed into ORE systems. A process for re-structuring such sentences is discussed and evaluated. The proposed approach replaces complex sentences by several others that, together, convey the same meaning and are more amenable to extraction by current ORE systems. The results of an experimental evaluation show that this approach succeeds in reducing the processing time and increasing the accuracy of the state-of-the-art ORE systems.
2013
Effectiveness and Efficiency of Open Relation Extraction
Filipe Mesquita
|
Jordan Schmidek
|
Denilson Barbosa
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
Search