TV-AfD: An Imperative-Annotated Corpus from The Big Bang Theory and Wikipedia’s Articles for Deletion Discussions
Yimin Xiao | Zong-Ying Slaton | Lu Xiao
Proceedings of the Twelfth Language Resources and Evaluation Conference
In this study, we created an imperative corpus with speech conversations from dialogues in The Big Bang Theory and with the written comments in Wikipedia’s Articles for Deletion discussions. For the TV show data, 59 episodes containing 25,076 statements are used. We manually annotated imperatives based on the annotation guideline adapted from Condoravdi and Lauer’s study (2012) and used the retrieved data to assess the performance of syntax-based classification rules. For the Wikipedia AfD comments data, we first developed and leveraged a syntax-based classifier to extract 10,624 statements that may be imperative, and we manually examined the statements and then identified true positives. With this corpus, we also examined the performance of the rule-based imperative detection tool. Our result shows different outcomes for speech (dialogue) and written data. The rule-based classification performs better in the written data in precision (0.80) compared to the speech data (0.44). Also, the rule-based classification has a low-performance overall for speech data with the precision of 0.44, recall of 0.41, and f-1 measure of 0.42. This finding implies the syntax-based model may need to be adjusted for a speech dataset because imperatives in oral communication have greater syntactic varieties and are highly context-dependent.