Jiaju Du


2021

pdf bib
CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild
Yuan Yao | Jiaju Du | Yankai Lin | Peng Li | Zhiyuan Liu | Jie Zhou | Maosong Sun
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Existing relation extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents. However, a large quantity of relational facts in knowledge bases can only be inferred across documents in practice. In this work, we present the problem of cross-document RE, making an initial step towards knowledge acquisition in the wild. To facilitate the research, we construct the first human-annotated cross-document RE dataset CodRED. Compared to existing RE datasets, CodRED presents two key challenges: Given two entities, (1) it requires finding the relevant documents that can provide clues for identifying their relations; (2) it requires reasoning over multiple documents to extract the relational facts. We conduct comprehensive experiments to show that CodRED is challenging to existing RE methods including strong BERT-based models.

2020

pdf bib
Coreferential Reasoning Learning for Language Representation
Deming Ye | Yankai Lin | Jiaju Du | Zhenghao Liu | Peng Li | Maosong Sun | Zhiyuan Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Language representation models such as BERT could effectively capture contextual semantic information from plain text, and have been proved to achieve promising results in lots of downstream NLP tasks with appropriate fine-tuning. However, most existing language representation models cannot explicitly handle coreference, which is essential to the coherent understanding of the whole discourse. To address this issue, we present CorefBERT, a novel language representation model that can capture the coreferential relations in context. The experimental results show that, compared with existing baseline models, CorefBERT can achieve significant improvements consistently on various downstream NLP tasks that require coreferential reasoning, while maintaining comparable performance to previous models on other common NLP tasks. The source code and experiment details of this paper can be obtained from https://github.com/thunlp/CorefBERT.