Dialogue & Discourse (2023)
Dialogue & Discourse, Volume 14 (7 papers)
The Conversational Discourse Unit: Identification and Its Role in Conversational Turn-taking Management
Junfei Hu | Liesbeth Degand
This study investigates how discourse segmentation and turn-taking interact. By mapping syntactic, prosodic, and pragmatic units, five types of conversational discourse units (CDUs) were identified. Based on this segmentation, we examined the associations between syntactic, prosodic, and pragmatic boundaries and turn-taking, as well as the transition speed after each type of CDU. Results show: 1) The relationships between the three linguistic boundaries and the occurrence of turn-taking were significant; the association was strongest for pragmatic boundaries, weaker for prosodic boundaries, and weakest for syntactic boundaries. 2) The type of CDU influenced transition speed, with the pragmatic-syntax-bound CDU yielding the fastest transitions. The study highlights the importance of meaning connection and the early emergence of the utterance gist for the timing of turn-taking.
Exploring the Sensitivity to Alternative Signals of Coherence Relations
Ekaterina Tskhovrebova | Sandrine Zufferey | Pascal Gygax
Coherence relations between elements of discourse can be signaled by linguistic devices such as connectives and/or alternative signals. While the use and comprehension of connectives have been studied in different categories of speakers, less is known about how alternative signals of coherence relations function, especially in younger populations. In the current study, we examine the sensitivity of French-speaking teenagers to alternative signals of the list relation (words such as plusieurs ‘several’ and différents ‘various’), combined with connectives varying in frequency and signaling two types of coherence relations (addition: en plus, en outre; consequence: donc, ainsi). Our results reveal that, as early as the teenage years, speakers are sensitive to alternative signals of the list relation (i.e., they produce list continuation sentences). Furthermore, the inference of the list relation is not significantly changed when an alternative signal is combined with the more frequent additive connective en plus. However, this inference is inhibited by the less frequent additive connective en outre, and is almost completely hindered by the consequence connectives donc and ainsi. Overall, these results show that alternative list signals are an important source for the inference of the list relation, even in the presence of more salient signals of coherence such as connectives.
Scoring Coreference Chains with Split-Antecedent Anaphors
Silviu Paun | Juntao Yu | Nafise Sadat Moosavi | Massimo Poesio
Anaphoric reference is an aspect of language interpretation covering a variety of types of interpretation beyond the simple case of identity reference to entities introduced via nominal expressions, which is what the traditional coreference task covers in its most recent incarnation in ONTONOTES and similar datasets. One case that goes beyond simple coreference is anaphoric reference to entities that must be added to the discourse model via accommodation, and in particular split-antecedent reference to entities constructed out of other entities, as in split-antecedent plurals and in some cases of discourse deixis. Although this type of anaphoric reference is now annotated in many datasets, systems interpreting such references cannot be evaluated using the Reference coreference scorer (Pradhan et al., 2014). As part of the work towards a new scorer for anaphoric reference able to evaluate all aspects of anaphoric interpretation in the coverage of the Universal Anaphora initiative, we propose in this paper a solution to the technical problem of generalizing existing metrics for identity anaphora so that they can also be used to score cases of split antecedents. This is the first such proposal in the literature on anaphora or coreference, and it has been successfully used to score both split-antecedent plural references and discourse deixis in the recent CODI/CRAC anaphora resolution in dialogue shared tasks.
Connectives convey discourse functions that provide textual and pragmatic information in speech communication on top of canonical, sentential use. This paper proposes an applicable scheme, with illustrative examples, for distinguishing Sentential, Conclusion, Disfluency, Elaboration, and Resumption uses of Mandarin connectives, including conjunctions and adverbs. Quantitative results of our annotation work are presented to give an overview of connectives in a Mandarin conversational speech corpus. A fine-grained taxonomy is also discussed, but more empirical data are required to confirm its applicability. Using a multinomial logistic regression model, we illustrate that connectives exhibit consistent patterns in positional, phonetic, and contextual features oriented to the associated discourse functions. Our results confirm that Conclusion and Resumption connectives orient more to positions in semantically, rather than prosodically, determined units. We also find that connectives used for all four discourse functions tend to have a higher initial F0 value than those in sentential use, with Resumption and Disfluency uses showing the largest increase in initial F0 value, followed by Conclusion and Elaboration uses. Durational cues of the preceding context make it possible to distinguish Sentential use from the discourse uses of Conclusion, Elaboration, and Resumption.
Fact checking and fake news detection have garnered increasing interest within the natural language processing (NLP) community in recent years, yet other aspects of misinformation remain unexplored. One such phenomenon is ‘bullshit’, which different disciplines have tried to define since it first entered academic discussion nearly four decades ago. Fact checking bullshitters is useless, because factual reality typically plays no part in their assertions: where liars deceive about content, bullshitters deceive about their goals. Bullshitting is misleading about language itself, which necessitates identifying the points at which pragmatic conventions are broken with deceptive intent. This paper aims to introduce bullshitology into the field of NLP by tying it to a QUD-based definition, providing two approaches to bullshit annotation, and finally outlining which combinations of NLP methods will be helpful for classifying which kinds of linguistic bullshit.
I propose a discourse-level analysis of report constructions. Indirect discourse, mixed and direct quotation, free indirect discourse, and attitude ascriptions are all analyzed in terms of a discourse relation of ATTRIBUTION, connecting two propositional discourse units corresponding to (i) a frame segment (he said, she dreamed) and a (possibly complex, multi-sentence) report (“I’m an idiot”, (that) she was president). I provide a unified semantics for the discourse relation of ATTRIBUTION that invokes a flexible notion of ‘characterization’. A discourse unit may characterize a speech event by reproducing its linguistic surface form (as in quotation) or its propositional content (as in indirect speech and attitude reports), or some mixture of both (as in mixed quotation or free indirect discourse). I formalize this unified discourse-level ATTRIBUTION approach to reporting within the general framework of SDRT, and apply it to direct, indirect, and free indirect reports that extend beyond the single embedded or quoted clause. The resulting account is the first to do justice to the complex internal dependencies within stretches of reported discourse.
Automatic Essay Scoring Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses
Yaman Kumar | Swapnil Parekh | Somesh Singh | Junyi Jessy Li | Rajiv Ratn Shah | Changyou Chen
Deep-learning-based Automatic Essay Scoring (AES) systems are being actively used in various high-stakes applications in education and testing. However, little research has been devoted to understanding and interpreting the black-box nature of deep-learning-based scoring algorithms. While previous studies indicate that scoring models can be easily fooled, in this paper we explore the reasons behind their surprising adversarial brittleness. We utilize recent advances in interpretability to find the extent to which features such as coherence, content, vocabulary, and relevance are important for automated scoring mechanisms, and use this to investigate the oversensitivity (i.e., large change in output score with a small change in input essay content) and overstability (i.e., little change in output score with large changes in input essay content) of AES. Our results indicate that autoscoring models, despite being trained as “end-to-end” models with rich contextual embeddings such as BERT, behave like bag-of-words models: a few words determine the essay score without requiring any context, making the models largely overstable. This is in stark contrast to recent probing studies on pre-trained representation learning models, which show that rich linguistic features such as parts of speech and morphology are encoded by them. Further, we also find that the models have learnt dataset biases, making them oversensitive: the presence of a few words with high co-occurrence with a certain score class makes the model associate the essay sample with that score, causing score changes in ∼95% of samples with the addition of only a few words. To deal with these issues, we propose detection-based protection models that can detect oversensitivity and samples causing overstability with high accuracy. We find that our proposed models are able to detect unusual attribution patterns and flag adversarial samples successfully.