A major challenge in the practical use of Machine Translation (MT) is that users lack information about translation quality and therefore cannot make informed decisions about how to rely on MT outputs. Progress in quality estimation (QE) research provides techniques to automatically assess MT quality, but these techniques have primarily been evaluated in vitro, by comparison against human judgments outside a specific context of use. This paper evaluates quality estimation feedback in vivo, with a human study in realistic high-stakes medical settings. Using Emergency Department discharge instructions, we study how interventions based on quality estimation versus backtranslation assist physicians in deciding whether to show MT outputs to a patient. We find that quality estimation improves appropriate reliance on MT, but that backtranslation helps physicians detect more clinically harmful errors that QE alone often misses.
It has been shown that anonymity affects various aspects of online communication, such as message credibility, trust among communicators, and participants' accountability and reputation. Through these mechanisms, anonymity shapes social interactions in online communities and can in turn affect opinion change and the persuasiveness of a message. Prior studies also suggest that the effect of anonymity varies across online communication contexts and communities. In this study, we focus on Wikipedia Articles for Deletion (AfD) discussions as an example of an online collaborative community and examine the relationship between anonymity and persuasiveness in this context. We find that in Wikipedia AfD discussions, more identifiable users tend to be more persuasive. This higher persuasiveness is associated with multiple factors, including linguistic features of the comments, the user's motivation to participate, persuasive skills the user learns over time, and the identity and credibility the user establishes in the community through participation.
In this study, we created an imperative corpus from two sources: speech conversations drawn from dialogues in The Big Bang Theory and written comments from Wikipedia's Articles for Deletion (AfD) discussions. For the TV show data, we used 59 episodes containing 25,076 statements. We manually annotated imperatives based on an annotation guideline adapted from Condoravdi and Lauer (2012) and used the annotated data to assess the performance of syntax-based classification rules. For the Wikipedia AfD comments, we first developed and applied a syntax-based classifier to extract 10,624 statements that may be imperative, then manually examined these statements to identify true positives. With this corpus, we also evaluated the performance of the rule-based imperative detection tool. Our results differ for speech (dialogue) and written data: the rule-based classifier achieves higher precision on the written data (0.80) than on the speech data (0.44). Overall performance on the speech data is low, with a precision of 0.44, a recall of 0.41, and an F1 score of 0.42. This finding suggests that the syntax-based model needs to be adapted for speech data, because imperatives in oral communication show greater syntactic variety and are highly context-dependent.
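To illustrate the kind of syntax-based rule such a classifier might use, the sketch below flags a sentence as a candidate imperative when its first content token is a base-form verb with no preceding subject. This is a minimal illustrative assumption, not the paper's actual rule set, and the spaCy model name is likewise an assumption.

```python
# Minimal sketch of a syntax-based imperative detection rule (illustrative only;
# the study's actual rules are not reproduced here). Assumes spaCy with the
# "en_core_web_sm" English model installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def looks_imperative(sentence: str) -> bool:
    """Flag a sentence as a candidate imperative if its first content token
    is a base-form verb (Penn tag VB), i.e., no explicit subject precedes it."""
    doc = nlp(sentence)
    for tok in doc:
        if tok.is_punct or tok.is_space:
            continue
        # First real token: an untensed verb suggests an imperative opening.
        return tok.tag_ == "VB"
    return False

print(looks_imperative("Delete this article per notability guidelines."))  # likely True
print(looks_imperative("I think we should keep it."))                      # likely False
```

Rules of this form tend to work better on edited written text than on conversational speech, where imperatives often appear with tags, vocatives, or elliptical structures that such surface-syntax cues miss.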