2024
Strong hallucinations from negation and how to fix them
Swarnadeep Bhar | Nicholas Asher
Findings of the Association for Computational Linguistics: ACL 2024
Despite great performance on many tasks, language models (LMs) still struggle with reasoning, sometimes providing responses that cannot possibly be true because they stem from logical incoherence. We call such responses strong hallucinations and prove that they follow from an LM’s computation of its internal representations for logical operators and outputs from those representations. Focusing on negation, we provide a novel solution in which negation is treated not as another element of a latent representation, but as an operation over an LM’s latent representations that constrains how they may evolve. We show that our approach improves model performance in cloze prompting and natural language inference tasks with negation without requiring training on sparse negative data.
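The operator view of negation proposed here can be illustrated with a tiny PyTorch sketch. This is a hypothetical illustration of the idea, not the authors' implementation; the class name `NegationOp` and the linear parameterization are our assumptions.

```python
import torch
import torch.nn as nn

class NegationOp(nn.Module):
    """Hypothetical sketch: negation as a learned operation applied to an
    LM's latent representation, not as one more token embedding mixed in."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        # Learned map meant to send the latent state of "p" toward a
        # region of representation space consistent with "not p".
        self.op = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.op(h)

neg = NegationOp(hidden_dim=768)
h_pos = torch.randn(1, 768)   # latent state of, say, "the door is open"
h_neg = neg(h_pos)            # constrained state for "the door is not open"
```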
Llamipa: An Incremental Discourse Parser
Kate Thompson | Akshay Chaturvedi | Julie Hunter | Nicholas Asher
Findings of the Association for Computational Linguistics: EMNLP 2024
This paper provides the first discourse parsing experiments with a large language model (LLM) finetuned on corpora annotated in the style of SDRT (Segmented Discourse Representation Theory, Asher (1993), Asher and Lascarides (2003)). The result is a discourse parser, Llamipa (Llama Incremental Parser), that leverages discourse context, leading to substantial performance gains over approaches that use encoder-only models to provide local, context-sensitive representations of discourse units. Furthermore, it is able to process discourse data incrementally, which is essential for the eventual use of discourse information in downstream tasks.
Nebula: A discourse aware Minecraft Builder
Akshay Chaturvedi | Kate Thompson | Nicholas Asher
Findings of the Association for Computational Linguistics: EMNLP 2024
When engaging in collaborative tasks, humans efficiently exploit the semantic structure of a conversation to optimize verbal and nonverbal interactions. But in recent “language to code” or “language to action” models, this information is lacking. We show how incorporating the prior discourse and nonlinguistic context of a conversation situated in a nonlinguistic environment can improve the “language to action” component of such interactions. We finetune an LLM to predict actions based on prior context; our model, Nebula, doubles the net-action F1 score of the baseline on the task of Jayannavar et al. (2020). We also investigate our model’s ability to construct shapes and understand location descriptions using a synthetic dataset.
Learning Semantic Structure through First-Order-Logic Translation
Akshay Chaturvedi | Nicholas Asher
Findings of the Association for Computational Linguistics: EMNLP 2024
In this paper, we study whether transformer-based language models can extract predicate-argument structure from simple sentences. We first show that language models sometimes confuse which predicates apply to which objects. To mitigate this, we explore two tasks, question answering (Q/A) and first-order logic (FOL) translation, under two regimes, prompting and finetuning. For FOL translation, we finetune several large language models on synthetic datasets designed to gauge their generalization abilities. For Q/A, we finetune encoder models like BERT and RoBERTa and use prompting for LLMs. The results show that, for LLMs, FOL translation is better suited to learning predicate-argument structure.
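To make the target of the FOL-translation task concrete, here is the kind of predicate-argument mapping at stake, on an invented sentence rather than one from the paper's datasets:

```latex
% "The cat chased a dog": the translation must bind each predicate to
% the right argument, precisely what the probed models tend to confuse.
\[
  \exists x\, \exists y\, \bigl( \mathrm{cat}(x) \wedge \mathrm{dog}(y) \wedge \mathrm{chased}(x, y) \bigr)
\]
% Swapping the arguments, chased(y, x), yields the unintended reading
% "a dog chased the cat".
```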
In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models
Ayrton San Joaquin | Bin Wang | Zhengyuan Liu | Philippe Muller | Nicholas Asher | Brian Lim | Nancy F. Chen
Findings of the Association for Computational Linguistics: EMNLP 2024
Despite advancements, fine-tuning Large Language Models (LLMs) remains costly due to the extensive parameter count and substantial data requirements for model generalization. Accessibility to computing resources remains a barrier for the open-source community. To address this challenge, we propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and evaluation samples with a trained model. Notably, we assess the model’s internal gradients to estimate this relationship, aiming to rank the contribution of each training point. To enhance efficiency, we propose an optimization to compute influence functions with a reduced number of layers while achieving similar accuracy. By applying our algorithm to the instruction fine-tuning data of LLMs, we can achieve similar performance with just 50% of the training data. Meanwhile, using influence functions to analyze model coverage of certain testing samples could provide a reliable and interpretable signal on the training set’s coverage of those test points.
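A rough PyTorch sketch of the gradient-based scoring the abstract describes. The function names, the tail-of-parameters shortcut, and the toy model are our assumptions; the full influence function also involves an inverse-Hessian term that this first-order proxy drops.

```python
import torch

def tail_gradient(model, loss, n_tensors=2):
    # Gradient of the loss w.r.t. only the last few parameter tensors:
    # a cheap stand-in for the per-example gradient, in the spirit of
    # computing influence with a reduced number of layers.
    params = [p for p in list(model.parameters())[-n_tensors:] if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_score(train_grad, test_grad):
    # First-order proxy: training points whose gradient aligns with the
    # test gradient rank as more influential; a coreset keeps the top ones.
    return torch.dot(train_grad, test_grad).item()

# Toy demonstration on a linear model.
model = torch.nn.Linear(10, 1)
g_train = tail_gradient(model, model(torch.randn(4, 10)).pow(2).mean())
g_test = tail_gradient(model, model(torch.randn(4, 10)).pow(2).mean())
print(influence_score(g_train, g_test))
```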
Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering
Akshay Chaturvedi | Swarnadeep Bhar | Soumadeep Saha | Utpal Garain | Nicholas Asher
Computational Linguistics, Volume 50, Issue 1 - March 2024
Transformer-based language models have been shown to be highly effective for several NLP tasks. In this article, we consider three transformer models, BERT, RoBERTa, and XLNet, in both small and large versions, and investigate how faithful their representations are with respect to the semantic content of texts. We formalize a notion of semantic faithfulness, in which the semantic content of a text should causally figure in a model’s inferences in question answering. We then test this notion by observing a model’s behavior on answering questions about a story after performing two novel semantic interventions: deletion intervention and negation intervention. While transformer models achieve high performance on standard question answering tasks, we show that they fail to be semantically faithful once we perform these interventions for a significant number of cases (∼50% of cases for deletion intervention, and a ∼20% drop in accuracy for negation intervention). We then propose an intervention-based training regime that can mitigate the undesirable effects of deletion intervention by a significant margin (from ∼50% to ∼6%). We analyze the inner workings of the models to better understand the effectiveness of intervention-based training for deletion intervention. But we show that this training does not attenuate other aspects of semantic unfaithfulness, such as the models’ inability to deal with negation intervention or to capture the predicate–argument structure of texts. We also test InstructGPT, via prompting, for its ability to handle the two interventions and to capture predicate–argument structure. While the InstructGPT models do achieve very high performance on the predicate–argument structure task, they fail to respond adequately to our deletion and negation interventions.
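A deletion intervention of the kind described can be mimicked in a few lines with a standard extractive QA pipeline. This is illustrative only: the model, story, and question below are our placeholders, not the paper's CoQA-based protocol.

```python
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

story = ("Tom put the apple on the table. "
         "Later, Mary moved the apple to the shelf.")
question = "Where is the apple?"

before = qa(question=question, context=story)
# Deletion intervention: remove the sentence that licenses the answer and
# check whether the prediction changes the way the semantics demands.
after = qa(question=question, context="Tom put the apple on the table.")
print(before["answer"], "->", after["answer"])
```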
Discourse Structure for the Minecraft Corpus
Kate Thompson | Julie Hunter | Nicholas Asher
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
We provide a new linguistic resource: the Minecraft Structured Dialogue Corpus (MSDC), a discourse-annotated version of the Minecraft Dialogue Corpus (MDC; Narayan-Chen et al., 2019), with complete, situated discourse structures in the style of SDRT (Asher and Lascarides, 2003). Our structures feature both linguistic discourse moves and nonlinguistic actions. To show computational tractability, we train a discourse parser with a novel “2 pass architecture” on MSDC that gives excellent results on attachment prediction and relation labeling tasks, especially on long-distance attachments.
2023
COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP
Fanny Jourdan | Agustin Picard | Thomas Fel | Laurent Risser | Jean-Michel Loubes | Nicholas Asher
Findings of the Association for Computational Linguistics: ACL 2023
Transformer architectures are complex and their use in NLP, while it has engendered many successes, makes their interpretability or explainability challenging. Recent debates have shown that attention maps and attribution methods are unreliable (Pruthi et al., 2019; Brunner et al., 2019). In this paper, we present some of their limitations and introduce COCKATIEL, which successfully addresses some of them. COCKATIEL is a novel, post-hoc, concept-based, model-agnostic XAI technique that generates meaningful explanations from the last layer of a neural net model trained on an NLP classification task. It uses Non-Negative Matrix Factorization (NMF) to discover the concepts the model leverages to make predictions, and it exploits a sensitivity analysis to accurately estimate the importance of each of these concepts for the model. It does so without compromising the accuracy of the underlying model or requiring a new one to be trained. We conduct experiments on single- and multi-aspect sentiment analysis tasks and show COCKATIEL’s superior ability to discover concepts that align with human ones on Transformer models without any supervision; we objectively verify the faithfulness of its explanations through fidelity metrics, and we showcase its ability to provide meaningful explanations on two different datasets. Our code is freely available:
https://github.com/fanny-jourdan/cockatiel
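The NMF step at the heart of COCKATIEL can be sketched with scikit-learn. This is a minimal illustration on placeholder activations; the concept count is arbitrary and the crude ablation below stands in for the paper's proper sensitivity analysis.

```python
import numpy as np
from sklearn.decomposition import NMF

# A: non-negative last-layer activations, one row per example
# (random placeholder; in COCKATIEL these come from the trained classifier).
A = np.random.rand(200, 768)

nmf = NMF(n_components=10, init="nndsvd", max_iter=500)
U = nmf.fit_transform(A)   # per-example concept coefficients
W = nmf.components_        # concept directions in activation space

# Crude importance probe: zero out one concept and see how much the
# reconstruction degrades (the paper instead estimates importance with
# a sensitivity analysis over the model's predictions).
U_ablated = U.copy()
U_ablated[:, 0] = 0.0
print(np.linalg.norm(A - U_ablated @ W) - np.linalg.norm(A - U @ W))
```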
A simple but effective model for attachment in discourse parsing with multi-task learning for relation labeling
Zineb Bennis | Julie Hunter | Nicholas Asher
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
In this paper, we present a discourse parsing model for conversation trained on the STAC corpus. We fine-tune a BERT-based model to encode pairs of discourse units and use a simple linear layer to predict discourse attachments. We then exploit a multi-task setting to predict relation labels. The multi-task approach effectively aids in the difficult task of relation type prediction; our F1 score of 57 surpasses the state of the art with no loss in performance for attachment, confirming the intuitive interdependence of these two tasks. Our method also improves over previous discourse parsing models in allowing longer input sizes and in permitting attachments in which one node has multiple parents, an important feature of multiparty conversation.
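The setup described, a shared encoder with an attachment head and a relation head trained jointly, can be sketched as follows; the class name, head shapes, and use of the [CLS] vector are our assumptions rather than the paper's exact architecture.

```python
import torch.nn as nn
from transformers import AutoModel

class AttachRelateModel(nn.Module):
    # Multi-task sketch: a shared BERT encoder reads a pair of discourse
    # units; one linear head scores attachment, the other predicts the
    # relation label, and the two losses are summed during training.
    def __init__(self, n_relations, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.attach_head = nn.Linear(hidden, 2)
        self.relation_head = nn.Linear(hidden, n_relations)

    def forward(self, input_ids, attention_mask):
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.attach_head(cls), self.relation_head(cls)
```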
Are fairness metric scores enough to assess discrimination biases in machine learning?
Fanny Jourdan | Laurent Risser | Jean-Michel Loubes | Nicholas Asher
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
This paper presents novel experiments shedding light on the shortcomings of current metrics for assessing the gender discrimination biases of machine learning algorithms on textual data. We focus on the Bios dataset, where the learning task is to predict the occupation of individuals based on their biography. Such prediction tasks are common in commercial Natural Language Processing (NLP) applications such as automatic job recommendations. We address an important limitation of theoretical discussions of group-wise fairness metrics: they focus on large datasets, although the norm in many industrial NLP applications is to use small to reasonably large linguistic datasets for which the main practical constraint is to get a good prediction accuracy. We then ask how reliable different popular measures of bias are when the size of the training set is just sufficient to learn reasonably accurate predictions. Our experiments sample the Bios dataset and learn more than 200 models on different sample sizes. This allows us to study our results statistically and to confirm that common gender bias indices provide diverging and sometimes unreliable results when applied to relatively small training and test samples. This highlights the crucial importance of variance calculations for providing sound results in this field.
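The paper's central caution, that a bias score without a variance estimate can mislead on small samples, is easy to see with a bootstrap. A self-contained sketch on synthetic data (the metric choice and sample sizes are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def tpr_gap(y_true, y_pred, group):
    # Group-wise fairness metric: difference in true-positive rate
    # between two groups.
    tpr = lambda g: y_pred[(y_true == 1) & (group == g)].mean()
    return tpr(0) - tpr(1)

# Synthetic "small dataset" with a predictor that is noisy but unbiased.
n = 500
y_true = rng.integers(0, 2, n)
group = rng.integers(0, 2, n)
y_pred = (y_true ^ (rng.random(n) < 0.2)).astype(int)

# Bootstrap the metric: its spread across resamples is the variance the
# paper argues must be reported alongside the score itself.
gaps = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    gaps.append(tpr_gap(y_true[idx], y_pred[idx], group[idx]))
print(f"gap = {np.mean(gaps):.3f} +/- {np.std(gaps):.3f}")
```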
Limits for learning with language models
Nicholas Asher | Swarnadeep Bhar | Akshay Chaturvedi | Julie Hunter | Soumya Paul
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)
With the advent of large language models (LLMs), the trend in NLP has been to train LLMs on vast amounts of data to solve diverse language understanding and generation tasks. The list of LLM successes is long and varied. Nevertheless, several recent papers provide empirical evidence that LLMs fail to capture important aspects of linguistic meaning. Focusing on universal quantification, we provide a theoretical foundation for these empirical findings by proving that LLMs cannot learn certain fundamental semantic properties including semantic entailment and consistency as they are defined in formal semantics. More generally, we show that LLMs are unable to learn concepts beyond the first level of the Borel Hierarchy, which imposes severe limits on the ability of LMs, both large and small, to capture many aspects of linguistic meaning. This means that LLMs will operate without formal guarantees on tasks that require entailments and deep linguistic understanding.
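The role universal quantification plays in the argument can be glossed in one line of set-theoretic notation (a schematic gloss, not the paper's formal construction):

```latex
% Verifying a universally quantified claim over an infinite domain is a
% countable intersection of simpler verification conditions:
\[
  \llbracket \forall x\, \varphi(x) \rrbracket \;=\; \bigcap_{n \in \mathbb{N}} \llbracket \varphi(a_n) \rrbracket .
\]
% Such countable intersections in general sit above the first level of
% the Borel hierarchy, which is where the paper locates the ceiling on
% what LMs can learn.
```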
2019
Analyse faiblement supervisée de conversation en actes de dialogue (Weakly supervised dialog act analysis)
Catherine Thompson | Nicholas Asher | Philippe Muller | Jérémy Auguste
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts
We address the analysis of chat conversations in a task-oriented setting, where a technical advisor talks with a customer and the goal is to label utterances with dialogue acts in order to feed downstream analyses of the conversations. We propose a lightly supervised method built from simple heuristics and a few development annotations, plus an ensemble method over these rules, which automatically produces noisy annotations for a larger corpus that can then serve as training data for a supervised model. We compare this approach to a classical supervised approach and show that it achieves very similar results at a lower cost, while being easier to adapt to new data.
Apprentissage faiblement supervisé de la structure discursive (Learning discourse structure using weak supervision )
Sonia Badene | Catherine Thompson | Nicholas Asher | Jean-Pierre Lorré
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts
The advent of deep machine learning techniques has created an enormous need for training data. Such data are extremely costly to create, especially when domain expertise is required. One such task is learning the semantic structure of discourse, a very complex task involving recursive structures and sparse data, yet essential for extracting deep semantic information from text. We describe our experiments on attaching discourse units to form a structure, using the data programming paradigm in which few or no annotations are used to build a “noisy” training dataset. The dialogue corpus used exhibits interesting constraints, both linguistic and non-linguistic, that must be learned. We focus on the structure of the rules used to build a generative model and show that our approach is competitive with classical supervised learning.
Data Programming for Learning Discourse Structure
Sonia Badene | Kate Thompson | Jean-Pierre Lorré | Nicholas Asher
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
This paper investigates the advantages and limits of data programming for the task of learning discourse structure. The data programming paradigm implemented in the Snorkel framework allows a user to label training data using expert-composed heuristics, which are then transformed via the “generative step” into probability distributions of the class labels given the training candidates. These results are later generalized using a discriminative model. Snorkel’s attractive promise to create a large amount of annotated data from a smaller set of training data by unifying the output of a set of heuristics has yet to be used for computationally difficult tasks, such as that of discourse attachment, in which one must decide where a given discourse unit attaches to other units in a text in order to form a coherent discourse structure. Although approaching this problem using Snorkel requires significant modifications to the structure of the heuristics, we show that weak supervision methods can be more than competitive with classical supervised learning approaches to the attachment problem.
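With the current Snorkel API, the loop described above looks roughly like this; the labeling-function bodies and thresholds are invented for illustration, the paper's heuristics are far richer, and Snorkel's generative step is now packaged as LabelModel.

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ATTACH, NO_ATTACH, ABSTAIN = 1, 0, -1

@labeling_function()
def lf_adjacent(pair):
    # Heuristic: adjacent discourse units usually attach.
    return ATTACH if pair.distance == 1 else ABSTAIN

@labeling_function()
def lf_distant(pair):
    # Heuristic: units far apart in the dialogue rarely attach.
    return NO_ATTACH if pair.distance > 10 else ABSTAIN

# Candidate pairs of discourse units, reduced here to a toy feature.
df_train = pd.DataFrame({"distance": [1, 2, 12, 1, 15, 3]})

L_train = PandasLFApplier([lf_adjacent, lf_distant]).apply(df_train)
label_model = LabelModel(cardinality=2)
label_model.fit(L_train, n_epochs=500)        # the "generative step"
probs = label_model.predict_proba(L_train)    # noisy labels for training
                                              # a discriminative model
```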
Weak Supervision for Learning Discourse Structure
Sonia Badene | Kate Thompson | Jean-Pierre Lorré | Nicholas Asher
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
This paper provides a detailed comparison of a data programming approach with (i) off-the-shelf, state-of-the-art deep learning architectures that optimize their representations (BERT) and (ii) handcrafted-feature approaches previously used in the discourse analysis literature. We compare these approaches on the task of learning discourse structure for multi-party dialogue. The data programming paradigm offered by the Snorkel framework allows a user to label training data using expert-composed heuristics, which are then transformed via the “generative step” into probability distributions of the class labels given the data. We show that on our task the generative model outperforms both deep learning architectures as well as more traditional ML approaches when learning discourse structure—it even outperforms the combination of deep learning methods and hand-crafted features. We also implement several strategies for “decoding” our generative model output in order to improve our results. We conclude that weak supervision methods hold great promise as a means for creating and improving data sets for discourse structure.
2018
A Dependency Perspective on RST Discourse Parsing and Evaluation
Mathieu Morey | Philippe Muller | Nicholas Asher
Computational Linguistics, Volume 44, Issue 2 - June 2018
Computational text-level discourse analysis mostly happens within Rhetorical Structure Theory (RST), whose structures have classically been presented as constituency trees, and relies on data from the RST Discourse Treebank (RST-DT); as a result, the RST discourse parsing community has largely borrowed from the syntactic constituency parsing community. The standard evaluation procedure for RST discourse parsers is thus a simplified variant of PARSEVAL, and most RST discourse parsers use techniques that originated in syntactic constituency parsing. In this article, we isolate a number of conceptual and computational problems with the constituency hypothesis. We then examine the consequences, for the implementation and evaluation of RST discourse parsers, of adopting a dependency perspective on RST structures, a view advocated so far only by a few approaches to discourse parsing. While doing that, we show the importance of the notion of headedness of RST structures. We analyze RST discourse parsing as dependency parsing by adapting to RST a recent proposal in syntactic parsing that relies on head-ordered dependency trees, a representation isomorphic to headed constituency trees. We show how to convert the original trees from the RST corpus, RST-DT, and their binarized versions used by all existing RST parsers to head-ordered dependency trees. We also propose a way to convert existing simple dependency parser output to constituent trees. This allows us to evaluate and to compare approaches from both constituent-based and dependency-based perspectives in a unified framework, using constituency and dependency metrics. We thus propose an evaluation framework to compare extant approaches easily and uniformly, something the RST parsing community has lacked up to now. We can also compare parsers’ predictions to each other across frameworks. This allows us to characterize families of parsing strategies across the different frameworks, in particular with respect to the notion of headedness. Our experiments provide evidence for the conceptual similarities between dependency parsers and shift-reduce constituency parsers, and confirm that dependency parsing constitutes a viable approach to RST discourse parsing.
2017
How much progress have we made on RST discourse parsing? A replication study of recent results on the RST-DT
Mathieu Morey | Philippe Muller | Nicholas Asher
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
This article evaluates purported progress over the past years in RST discourse parsing. Several studies report a relative error reduction of 24 to 51% on all metrics, which their authors attribute to the introduction of distributed representations of discourse units. We replicate the standard evaluation of 9 parsers, 5 of which use distributed representations, from 8 studies published between 2013 and 2017, using their predictions on the test set of the RST-DT. Our main finding is that most recently reported increases in RST discourse parser performance are an artefact of differences in implementations of the evaluation procedure. We evaluate all these parsers with the standard Parseval procedure to provide a more accurate picture of actual RST discourse parser performance in standard evaluation settings. Under this more stringent procedure, the gains attributable to distributed representations represent at most a 16% relative error reduction on fully-labelled structures.
Proceedings of the IWCS workshop on Foundations of Situated and Multimodal Communication
Nicholas Asher | Julie Hunter | Alex Lascarides
Proceedings of the IWCS workshop on Foundations of Situated and Multimodal Communication
2016
Integer Linear Programming for Discourse Parsing
Jérémy Perret | Stergos Afantenos | Nicholas Asher | Mathieu Morey
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Parallel Discourse Annotations on a Corpus of Short Texts
Manfred Stede | Stergos Afantenos | Andreas Peldszus | Nicholas Asher | Jérémy Perret
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We present the first corpus of texts annotated with two alternative approaches to discourse structure, Rhetorical Structure Theory (Mann and Thompson, 1988) and Segmented Discourse Representation Theory (Asher and Lascarides, 2003). 112 short argumentative texts have been analyzed according to these two theories. Furthermore, in previous work, the same texts have already been annotated for their argumentation structure, according to the scheme of Peldszus and Stede (2013). This corpus therefore enables studies of correlations between the two accounts of discourse structure, and between discourse and argumentation. We converted the three annotation formats to a common dependency tree format that makes the structures comparable, and we describe some initial findings.
Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
Nicholas Asher | Julie Hunter | Mathieu Morey | Farah Benamara | Stergos Afantenos
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper describes the STAC resource, a corpus of multi-party chats annotated for discourse structure in the style of SDRT (Asher and Lascarides, 2003; Lascarides and Asher, 2009). The main goal of the STAC project is to study the discourse structure of multi-party dialogues in order to understand the linguistic strategies adopted by interlocutors to achieve their conversational goals, especially when these goals are opposed. The STAC corpus is not only a rich source of data on strategic conversation, but also the first corpus that we are aware of that provides full discourse structures for multi-party dialogues. It has other remarkable features that make it an interesting resource for other topics: interleaved threads, creative language, and interactions between linguistic and extra-linguistic contexts.
Integrating Type Theory and Distributional Semantics: A Case Study on Adjective–Noun Compositions
Nicholas Asher | Tim Van de Cruys | Antoine Bride | Márta Abrusán
Computational Linguistics, Volume 42, Issue 4 - December 2016
2015
Discourse parsing for multi-party chat dialogues
Stergos Afantenos | Eric Kow | Nicholas Asher | Jérémy Perret
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Integrating Non-Linguistic Events into Discourse Structure
Julie Hunter | Nicholas Asher | Alex Lascarides
Proceedings of the 11th International Conference on Computational Semantics
Dynamics of Public Commitments in Dialogue
Antoine Venant | Nicholas Asher
Proceedings of the 11th International Conference on Computational Semantics
A Generalisation of Lexical Functions for Composition in Distributional Semantics
Antoine Bride | Tim Van de Cruys | Nicholas Asher
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
2014
Unsupervised extraction of semantic relations using discourse cues
Juliette Conrath | Stergos Afantenos | Nicholas Asher | Philippe Muller
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
An evaluation of various methods for adjective-nouns composition (Une évaluation approfondie de différentes méthodes de compositionalité sémantique) [in French]
Antoine Bride | Tim Van de Cruys | Nicholas Asher
Proceedings of TALN 2014 (Volume 1: Long Papers)
Unsupervised extraction of semantic relations (Extraction non supervisée de relations sémantiques lexicales) [in French]
Juliette Conrath | Stergos Afantenos | Nicholas Asher | Philippe Muller
Proceedings of TALN 2014 (Volume 1: Long Papers)
2013
Grounding Strategic Conversation: Using Negotiation Dialogues to Predict Trades in a Win-Lose Game
Anaïs Cadilhac | Nicholas Asher | Farah Benamara | Alex Lascarides
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
Sentiment Composition Using a Parabolic Model
Baptiste Chardon | Farah Benamara | Yannick Mathieu | Vladimir Popescu | Nicholas Asher
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers
Expressivity and comparison of models of discourse structure
Antoine Venant | Nicholas Asher | Philippe Muller | Pascal Denis | Stergos Afantenos
Proceedings of the SIGDIAL 2013 Conference
2012
Extraction de préférences à partir de dialogues de négociation (Towards Preference Extraction From Negotiation Dialogues) [in French]
Anaïs Cadilhac | Farah Benamara | Vladimir Popescu | Nicholas Asher | Mohamadou Seck
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN
Annotating Preferences in Chats for Strategic Games
Anaïs Cadilhac | Nicholas Asher | Farah Benamara
Proceedings of the Sixth Linguistic Annotation Workshop
How do Negation and Modality Impact on Opinions?
Farah Benamara | Baptiste Chardon | Yannick Mathieu | Vladimir Popescu | Nicholas Asher
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics
Constrained Decoding for Text-Level Discourse Parsing
Philippe Muller | Stergos Afantenos | Pascal Denis | Nicholas Asher
Proceedings of COLING 2012
An empirical resource for discovering cognitive principles of discourse organisation: the ANNODIS corpus
Stergos Afantenos | Nicholas Asher | Farah Benamara | Myriam Bras | Cécile Fabre | Lydia-Mai Ho-Dac | Anne Le Draoulec | Philippe Muller | Marie-Paule Péry-Woodley | Laurent Prévot | Josette Rebeyrolle | Ludovic Tanguy | Marianne Vergez-Couret | Laure Vieu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper describes the ANNODIS resource, a discourse-level annotated corpus for French. The corpus combines two perspectives on discourse: a bottom-up approach and a top-down approach. The bottom-up view incrementally builds a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level discourse structures. The corpus is composed of texts that are diversified with respect to genre, length and type of discursive organisation. The methodology followed here involves an iterative design of annotation guidelines in order to reach satisfactory inter-annotator agreement levels. This allows us to raise a few issues relevant to the comparison of such complex objects as discourse structures. The corpus also serves as a source of empirical evidence for discourse theories. We present here two first analyses taking advantage of this new annotated corpus: one that tested hypotheses on constraints governing discourse structure, and another that studied the variations in composition and signalling of multi-level discourse structures.
Annotating Preferences in Negotiation Dialogues
Anaïs Cadilhac | Nicholas Asher | Farah Benamara
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)
2011
Théorie et Praxis : une optique sur les travaux en TAL sur le discours et le dialogue (Theory and Praxis: A View on NLP Work on Discourse and Dialogue)
Nicholas Asher
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Conférences invitées
Le corpus ANNODIS, un corpus enrichi d’annotations discursives [The ANNODIS corpus, a corpus enriched with discourse annotations]
Marie-Paule Péry-Woodley | Stergos D. Afantenos | Lydia-Mai Ho-Dac | Nicholas Asher
Traitement Automatique des Langues, Volume 52, Numéro 3 : Ressources linguistiques libres [Free Language Resources]
Commitments to Preferences in Dialogue
Anaïs Cadilhac | Nicholas Asher | Farah Benamara | Alex Lascarides
Proceedings of the SIGDIAL 2011 Conference
2010
Testing SDRT’s Right Frontier
Stergos Afantenos | Nicholas Asher
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
2009
ANNODIS: une approche outillée de l’annotation de structures discursives
Marie-Paule Péry-Woodley | Nicholas Asher | Patrice Enjalbert | Farah Benamara | Myriam Bras | Cécile Fabre | Stéphane Ferrari | Lydia-Mai Ho-Dac | Anne Le Draoulec | Yann Mathet | Philippe Muller | Laurent Prévot | Josette Rebeyrolle | Ludovic Tanguy | Marianne Vergez-Couret | Laure Vieu | Antoine Widlöcher
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts
The ANNODIS project aims to build a corpus of texts annotated at the discourse level, together with tools for annotating and exploiting corpora. The annotations adopt two complementary points of view: a bottom-up perspective starts from minimal discourse units and builds complex structures via a set of discourse relations, while a top-down perspective approaches the text as a whole and relies on pre-identified cues to detect high-level discourse structures. The construction of the corpus goes hand in hand with the creation of two interfaces: the first assists the manual annotation of discourse relations and structures by visualizing the markup produced by preprocessing; the second will be devoted to exploiting the annotations. We present the annotation models and protocols developed to carry out the annotation campaign through the dedicated interface.
2008
Agreement and Disputes in Dialogue
Alex Lascarides | Nicholas Asher
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue
Distilling Opinion in Discourse: A Preliminary Study
Nicholas Asher | Farah Benamara | Yvette Yannick Mathieu
Coling 2008: Companion volume: Posters
1994
Intentions and Information in Discourse
Nicholas Asher | Alex Lascarides
32nd Annual Meeting of the Association for Computational Linguistics
1993
A Semantics and Pragmatics for the Pluperfect
Alex Lascarides | Nicholas Asher
Sixth Conference of the European Chapter of the Association for Computational Linguistics
1992
Inferring Discourse Relations in Context
Alex Lascarides | Nicholas Asher | Jon Oberlander
30th Annual Meeting of the Association for Computational Linguistics
1991
Discourse Relations and Defeasible Knowledge
Alex Lascarides | Nicholas Asher
29th Annual Meeting of the Association for Computational Linguistics
1986
BUILDRS: An Implementation of DR Theory and LFG
Hajime Wada | Nicholas Asher
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics