Responding to the increasing need for automated writing evaluation (AWE) systems to assess language use beyond lexis and grammar (Burstein et al., 2016), we introduce a new approach to identify rhetorical features of stance in academic English writing. Drawing on the discourse-analytic framework of engagement in the Appraisal analysis (Martin & White, 2005), we manually annotated 4,688 sentences (126,411 tokens) for eight rhetorical stance categories (e.g., PROCLAIM, ATTRIBUTION) and additional discourse elements. We then report an experiment to train machine learning models to identify and categorize the spans of these stance expressions. The best-performing model (RoBERTa + LSTM) achieved macro-averaged F1 of .7208 in the span identification of stance-taking expressions, slightly outperforming the intercoder reliability estimates before adjudication (F1 = .6629).
In this paper, we introduce a dependency treebank of spoken second language (L2) English that is annotated with part of speech (Penn POS) tags and syntactic dependencies (Universal Dependencies). We then evaluate the degree to which the use of this treebank as training data affects POS and UD annotation accuracy for L1 web texts, L2 written texts, and L2 spoken texts as compared to models trained on L1 texts only.