Implicit Discourse Relation Classification: We Need to Talk about Evaluation

Najoung Kim, Song Feng, Chulaka Gunasekara, Luis Lastras


Abstract
Implicit relation classification on the Penn Discourse TreeBank (PDTB) 2.0 is a common benchmark task for evaluating the understanding of discourse relations. However, the lack of consistency in preprocessing and evaluation poses challenges to fair comparison of results in the literature. In this work, we highlight these inconsistencies and propose an improved evaluation protocol. Paired with this protocol, we report strong baseline results from pretrained sentence encoders, which set a new state of the art for PDTB 2.0. Furthermore, this work is the first to explore fine-grained relation classification on PDTB 3.0. We expect our work to serve as a point of comparison for future work, and also as an initiative to discuss models of larger context and possible data augmentations for downstream transferability.
Anthology ID:
2020.acl-main.480
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5404–5414
URL:
https://aclanthology.org/2020.acl-main.480
DOI:
10.18653/v1/2020.acl-main.480
Cite (ACL):
Najoung Kim, Song Feng, Chulaka Gunasekara, and Luis Lastras. 2020. Implicit Discourse Relation Classification: We Need to Talk about Evaluation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5404–5414, Online. Association for Computational Linguistics.
Cite (Informal):
Implicit Discourse Relation Classification: We Need to Talk about Evaluation (Kim et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.480.pdf
Video:
http://slideslive.com/38929324