Selective Annotation of Modal Readings: Delving into the Difficult Data
Lori Moon | Patricija Kirvaitis | Noreen Madden
Linguistic Issues in Language Technology, Volume 14, 2016 - Modality: Logic, Semantics, Annotation, and Machine Learning
Modal auxiliaries have different readings, depending on the context in which they occur (Kratzer, 1981). Several projects have attempted to classify uses of modal auxiliaries in corpora according to their reading using supervised machine learning techniques (e.g., Rubinstein et al., 2013; Ruppenhofer & Rehbein, 2012). In each study, human annotators label instances of modal auxiliaries in a corpus with traditional taxonomic labels, such as ‘epistemic’ and ‘deontic’. To achieve higher agreement among annotators, these previous studies report their results after collapsing some of the initial categories. The results show that human annotators agree fairly well on some distinctions, such as whether or not a use is epistemic, but poorly on others. They also show that annotators agree more on modals such as might than on modals such as could. In this study, we applied traditional taxonomic categories to sentences containing modal auxiliary verbs randomly extracted from the English Gigaword Fourth Edition corpus (Parker et al., 2009). The lowest inter-annotator agreement with traditional taxonomic labels occurred with uses of could, with raw agreement of 42–48% (κ = 0.196–0.259), compared with, for instance, 98% raw agreement for might. In response to these low numbers, rather than collapsing traditional categories, we tried a new method of classifying uses of could: by where the reading situates the described eventuality relative to the speech time. For example, if the sentence ‘Jess could swim.’ is read as an ability statement, it is about a swimming eventuality in the past leading up to the time of speech; if it is read as a statement about possibility, it is about a swimming eventuality in the future. The classification labels we propose are crucial for separating uses of could that have actuality inferences (Bhatt, 1999; Hacquard, 2006) from uses that do not.
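The relation between the raw agreement and κ figures can be illustrated with a short sketch. It assumes the reported κ is unweighted Cohen's kappa for two annotators (the abstract does not name the statistic). κ discounts the agreement expected by chance from each annotator's label distribution, which is how raw agreement in the 40s can correspond to κ near 0.2. The helper names `raw_agreement` and `cohen_kappa` and the labels below are hypothetical and are not drawn from the study's data.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def raw_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Proportion of items to which two annotators assigned the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e).

    Assumption: the abstract's kappa is Cohen's (two-annotator) kappa.
    """
    p_o = raw_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over labels (missing labels count as 0).
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical readings of eight uses of "could" (not from the study).
annotator_1 = ["epistemic", "ability", "deontic", "ability",
               "epistemic", "ability", "deontic", "ability"]
annotator_2 = ["epistemic", "ability", "ability", "deontic",
               "epistemic", "epistemic", "deontic", "ability"]

agreement = raw_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"raw agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```

This prints `raw agreement = 0.625, kappa = 0.429`: the annotators agree on 5 of 8 items, but their label distributions already predict about 34% agreement by chance. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic.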
Using four category labels for the temporal location of the event described by a use of could, we achieved 73–90% raw agreement (κ = 0.614–0.744). Sequence-of-tense contexts (Abusch, 1997) are a major factor in the difficulty of determining the temporal properties of uses of could. Among three annotators, we achieved raw agreement of 89–96% (κ = 0.779–0.919) on identifying sequence-of-tense contexts. We discuss the implications of our findings for textual entailment.
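The three-annotator figures are reported as ranges. One plausible reading, which the abstract does not confirm, is that each range spans the scores of the three annotator pairs. The sketch below computes pairwise raw agreement on a binary label (whether a sentence is a sequence-of-tense context); the function name `pairwise_raw_agreement`, the annotator names, and the data are all hypothetical.

```python
from itertools import combinations


def pairwise_raw_agreement(
    annotations: dict[str, list[bool]],
) -> dict[tuple[str, str], float]:
    """Raw agreement for every pair of annotators labelling the same items."""
    scores = {}
    for (name_a, labels_a), (name_b, labels_b) in combinations(annotations.items(), 2):
        if len(labels_a) != len(labels_b) or not labels_a:
            raise ValueError("annotators must label the same non-empty item set")
        agreed = sum(x == y for x, y in zip(labels_a, labels_b))
        scores[(name_a, name_b)] = agreed / len(labels_a)
    return scores


# Hypothetical data: True marks a sequence-of-tense context (not the study's data).
T, F = True, False
annotations = {
    "A": [T, T, F, F, T, F, T, F, F, T],
    "B": [T, T, F, F, T, F, T, F, T, T],
    "C": [T, F, F, F, T, F, T, F, F, F],
}

scores = pairwise_raw_agreement(annotations)
for pair, score in scores.items():
    print(pair, f"{score:.0%}")
print(f"range: {min(scores.values()):.0%}-{max(scores.values()):.0%}")
```

With this toy data the pairs score 90%, 80%, and 70%, so the reported range would be 70%-90%. A per-pair κ can be obtained the same way by swapping in a kappa function; if a single multi-rater figure were wanted instead, Fleiss' κ is the usual choice.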