Leila Kosseim - ACL Anthology

Leila Kosseim

2026

CLaC at SemEval-2026 Task 6: Response Clarity Detection in Political Discourse
Nawar Turk | Lucas Miquet-Westphal | Leila Kosseim
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

In this paper, we present our system for SemEval-2026 Task 6 (CLARITY) on response clarity and evasion detection in question-answer pairs from U.S. presidential interviews, comparing fine-tuned encoders with prompt-based LLMs. Our LLM ensemble achieves 80 macro-F1 on the 3-class Task 1 (9th/41) and 59 on the 9-class Task 2 (3rd/33). Across 8 transformer encoders optimized through a four-stage pipeline, partial encoder layer unfreezing outperforms full fine-tuning by a wide margin. Combining English and multilingual encoders further improves ensemble performance over either family alone, despite multilingual models being individually weaker. Prompt-based LLMs, without any task-specific parameter updates, outperform fine-tuned encoders, particularly on minority classes; among open-weight LLMs, parameter count does not predict performance. Enriched input, concatenating the full interviewer turn, improves LLM performance but not that of encoders, an effect that persists with Longformer’s extended context window, suggesting the divergence is not attributable to sequence-length capacity alone in our settings. The Clear Reply/Ambivalent boundary remains the dominant failure mode, mirroring the disagreement among human annotators. Our code, prompts, model configurations, and results are publicly available.

2025

Multi-Lingual Implicit Discourse Relation Recognition with Multi-Label Hierarchical Learning
Nelson Filipe Costa | Leila Kosseim
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

This paper introduces the first multi-lingual and multi-label classification model for implicit discourse relation recognition (IDRR). Our model, HArch, is evaluated on the recently released DiscoGeM 2.0 corpus and leverages hierarchical dependencies between discourse senses to predict probability distributions across all three sense levels in the PDTB 3.0 framework. We compare several pre-trained encoder backbones and find that RoBERTa-HArch achieves the best performance in English, while XLM-RoBERTa-HArch performs best in the multi-lingual setting. In addition, we compare our fine-tuned models against GPT-4o and Llama-4-Maverick using few-shot prompting across all language configurations. Our results show that our fine-tuned models consistently outperform these LLMs, highlighting the advantages of task-specific fine-tuning over prompting in IDRR. Finally, we report SOTA results on the DiscoGeM 1.0 corpus, further validating the effectiveness of our hierarchical approach.

A Multi-Task and Multi-Label Classification Model for Implicit Discourse Relation Recognition
Nelson Filipe Costa | Leila Kosseim
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

We propose a novel multi-label classification approach to implicit discourse relation recognition (IDRR). Our approach features a multi-task model that jointly learns multi-label representations of implicit discourse relations across all three sense levels in the PDTB 3.0 framework. The model can also be adapted to the traditional single-label IDRR setting by selecting the sense with the highest probability in the multi-label representation. We conduct extensive experiments to identify optimal model configurations and loss functions in both settings. Our approach establishes the first benchmark for multi-label IDRR and achieves SOTA results on single-label IDRR using DiscoGeM. Finally, we evaluate our model on the PDTB 3.0 corpus in the single-label setting, presenting the first analysis of transfer learning between the DiscoGeM and PDTB 3.0 corpora for IDRR.

CLaC at SemEval-2025 Task 6: A Multi-Architecture Approach for Corporate Environmental Promise Verification
Nawar Turk | Eeham Khan | Leila Kosseim
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents our approach to the PromiseEval task at SemEval-2025, which focuses on verifying promises in corporate ESG (Environmental, Social, and Governance) reports. We explore three model architectures to address the four subtasks of promise identification, supporting evidence assessment, clarity evaluation, and verification timing. Our first model utilizes ESG-BERT with task-specific classifier heads, while our second model enhances this architecture with linguistic features tailored for each subtask. Our third approach implements a combined subtask model with attention-based sequence pooling, transformer representations augmented with document metadata, and multi-objective learning. Experiments on the English portion of the ML-Promise dataset demonstrate progressive improvement across our models, with our combined subtask approach achieving a private leaderboard score of 0.5268, outperforming the provided baseline of 0.5227. Our work highlights the effectiveness of linguistic feature extraction, attention pooling, and multi-objective learning in promise verification tasks, despite challenges posed by class imbalance and limited training data.

From Posts to Predictions: A User-Aware Framework for Faithful and Transparent Detection of Mental Health Risks on Social Media
Hessam Amini | Leila Kosseim
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

We propose a user-aware attention-based framework for early detection of mental health risks from social media posts. Our model combines DisorBERT, a mental health–adapted transformer encoder, with a user-level attention mechanism that produces transparent post-level explanations. To assess whether these explanations are faithful, i.e., aligned with the model’s true decision process, we apply adversarial training and quantify attention faithfulness using the AtteFa metric. Experiments on four eRisk tasks (depression, anorexia, self-harm, and pathological gambling) show that our model achieves competitive latency-weighted F1 scores while relying on a sparse subset of posts per user. We also evaluate attention robustness and conduct ablations, confirming the model’s reliance on high-weighted posts. Our work extends prior explainability studies by integrating faithfulness assessment in a real-world high-stakes application. We argue that systems combining predictive accuracy with faithful and transparent explanations offer a promising path toward safe and trustworthy AI for mental health support.

CLaC at DISRPT 2025: Hierarchical Adapters for Cross-Framework Multi-lingual Discourse Relation Classification
Nawar Turk | Daniele Comitogianni | Leila Kosseim
Proceedings of the 4th Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2025)

We present our submission to Task 3 (Discourse Relation Classification) of the DISRPT 2025 shared task. Task 3 introduces a unified set of 17 discourse relation labels across 39 corpora in 16 languages and six discourse frameworks, posing significant multilingual and cross‐formalism challenges. We first benchmark the task by fine‐tuning multilingual BERT‐based models (mBERT, XLM‐RoBERTa‐Base, and XLM‐RoBERTa‐Large) with two argument‐ordering strategies and progressive unfreezing ratios to establish strong baselines. We then evaluate prompt‐based large language models (namely Claude Opus 4.0) in zero‐shot and few‐shot settings to understand how LLMs respond to the newly proposed unified labels. Finally, we introduce HiDAC, a Hierarchical Dual‐Adapter Contrastive learning model. Results show that while larger transformer models achieve higher accuracy, the improvements are modest, and that unfreezing the top 75% of encoder layers yields performance comparable to full fine‐tuning while training far fewer parameters. Prompt‐based models lag significantly behind fine‐tuned transformers, and HiDAC achieves the highest overall accuracy (67.5%) while remaining more parameter‐efficient than full fine‐tuning.

2024

CLaC at SemEval-2024 Task 4: Decoding Persuasion in Memes – An Ensemble of Language Models with Paraphrase Augmentation
Kota Shamanth Ramanath Nayak | Leila Kosseim
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes our approach to SemEval-2024 Task 4 subtask 1, focusing on hierarchical multi-label detection of persuasion techniques in meme texts. Our approach was based on fine-tuning individual language models (BERT, XLM-RoBERTa, and mBERT) and leveraging a mean-based ensemble model. Additional strategies included dataset augmentation through the TC dataset and paraphrase generation as well as the fine-tuning of individual classification thresholds for each class. During testing, our system outperformed the baseline in all languages except for Arabic, where no significant improvement was reached. Analysis of the results seems to indicate that our dataset augmentation strategy and per-class threshold fine-tuning may have introduced noise and exacerbated the dataset imbalance.
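
As an illustration of the mean-based ensemble and per-class thresholds mentioned in this abstract, a minimal sketch might look as follows. The function names, probabilities, and threshold values are hypothetical and are not taken from the paper; the authors' actual system may differ in its details.

```python
from statistics import mean


def mean_ensemble(model_probs):
    """Average per-class probabilities across models.

    model_probs holds one list per model, with one probability per class.
    """
    return [mean(class_probs) for class_probs in zip(*model_probs)]


def apply_thresholds(probs, thresholds):
    """Return the indices of classes whose probability meets its own threshold."""
    return [i for i, (p, t) in enumerate(zip(probs, thresholds)) if p >= t]


# Hypothetical outputs of three models over four classes (illustrative values).
model_probs = [
    [0.9, 0.2, 0.4, 0.1],
    [0.7, 0.4, 0.6, 0.1],
    [0.8, 0.3, 0.5, 0.4],
]
thresholds = [0.5, 0.5, 0.4, 0.3]  # assumed per-class thresholds

avg = mean_ensemble(model_probs)
predicted = apply_thresholds(avg, thresholds)
```

In this multi-label setting, each class is decided independently, so lowering a single class threshold only affects how often that class is predicted.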

CLaC at SemEval-2024 Task 2: Faithful Clinical Trial Inference
Jennifer Marks | Mohammadreza Davari | Leila Kosseim
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper presents the methodology used for our participation in SemEval 2024 Task 2 (Jullien et al., 2024) – Safe Biomedical Natural Language Inference for Clinical Trials. The task involved Natural Language Inference (NLI) on clinical trial data, where statements were provided regarding information within Clinical Trial Reports (CTRs). These statements could pertain to a single CTR or compare two CTRs, requiring the identification of the inference relation (entailment vs contradiction) between CTR-statement pairs. Evaluation was based on F1, Faithfulness, and Consistency metrics, with priority given to the latter two by the organizers. Our approach aims to maximize Faithfulness and Consistency, guided by intuitive definitions provided by the organizers, without detailed metric calculations. Experimentally, our approach yielded models achieving maximal Faithfulness (top rank) and average Consistency (mid rank) at the expense of F1 (low rank). Future work will focus on refining our approach to achieve a balance among all three metrics.

Exploring Soft-Label Training for Implicit Discourse Relation Recognition
Nelson Filipe Costa | Leila Kosseim
Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)

This paper proposes a classification model for single-label implicit discourse relation recognition trained on soft-label distributions. It follows the PDTB 3.0 framework and was trained and tested on the DiscoGeM corpus, where it achieves an F1-score of 51.38 on third-level sense classification of implicit discourse relations. We argue that training on soft-label distributions allows the model to better discern between more ambiguous discourse relations.
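
To make the idea of training on soft-label distributions concrete, the sketch below computes a cross-entropy loss against a full target distribution and then derives a single label by taking the most probable sense. It is an assumption-laden illustration: the function names are hypothetical, and the abstract does not state which loss function the paper uses.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(logits, target_dist):
    """Cross-entropy between a soft target distribution and the model output."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_dist, probs) if t > 0)


def predict_single_label(logits):
    """Single-label prediction: index of the most probable sense."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical example: annotators split evenly between the first two senses.
loss = soft_label_cross_entropy([0.0, 0.0, 0.0], [0.5, 0.5, 0.0])
label = predict_single_label([0.1, 2.0, -1.0])
```

With a one-hot target, the same function reduces to the usual hard-label cross-entropy, so the soft-label case is a strict generalization.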

2023

CLaC at SemEval-2023 Task 3: Language Potluck RoBERTa Detects Online Persuasion Techniques in a Multilingual Setup
Nelson Filipe Costa | Bryce Hamilton | Leila Kosseim
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents our approach to the SemEval-2023 Task 3 to detect online persuasion techniques in a multilingual setup. Our classification system is based on the RoBERTa-base model trained predominantly on English to label the persuasion techniques across 9 different languages. Our system was able to significantly surpass the baseline performance in 3 of the 9 languages: English, Georgian and Greek. However, our wrong assumption that a single classification system trained predominantly on English could generalize well to other languages negatively impacted our scores on the other 6 languages. In this paper, we provide a description of the reasoning behind the development of our final model and what conclusions may be drawn from its performance for future work.

Discourse Analysis of Argumentative Essays of English Learners Based on CEFR Level
Blaise Hanel | Leila Kosseim
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

In this paper, we investigate the relationship between the use of discourse relations and the CEFR level of argumentative English learner essays. Using both the Rhetorical Structure Theory (RST) and the Penn Discourse TreeBank (PDTB) frameworks, we analyze essays from The International Corpus Network of Asian Learners (ICNALE) and the Corpus and Repository of Writing (CROW). Results show that the use of the RST relations of Explanation and Background, as well as the first-level PDTB sense of Contingency, are influenced by the English proficiency level of the writer.

Mapping Explicit and Implicit Discourse Relations between the RST-DT and the PDTB 3.0
Nelson Filipe Costa | Nadia Sheikh | Leila Kosseim
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

In this paper, we propose a first empirical mapping between the RST-DT and the PDTB 3.0. We provide an original algorithm which allowed the mapping of 6,510 (80.0%) explicit and implicit discourse relations between the overlapping articles of the RST-DT and PDTB 3.0 discourse annotated corpora. Results of the mapping show that while it is easier to align segments of implicit discourse relations, the mapping obtained between the aligned explicit discourse relations is less ambiguous.

2022

Pre-training Language Models for Surface Realization
Farhood Farahnak | Leila Kosseim
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)

How (Un)Faithful is Attention?
Hessam Amini | Leila Kosseim
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Although attention weights have been commonly used as a means to provide explanations for deep learning models, the approach has been widely criticized due to its lack of faithfulness. In this work, we present a simple approach to compute the newly proposed metric AtteFa, which can quantitatively represent the degree of faithfulness of the attention weights. Using this metric, we further validate the effect of the frequency of informative input elements and the use of contextual vs. non-contextual encoders on the faithfulness of the attention mechanism. Finally, we apply the approach on several real-life binary classification datasets to measure the faithfulness of attention weights in real-life settings.

2020

Surface Realization Using Pretrained Language Models
Farhood Farahnak | Laya Rafiee | Leila Kosseim | Thomas Fevens
Proceedings of the Third Workshop on Multilingual Surface Realisation

In the context of Natural Language Generation, surface realization is the task of generating the linear form of a text following a given grammar. Surface realization models usually consist of a cascade of complex sub-modules, either rule-based or neural network-based, each responsible for a specific sub-task. In this work, we show that a single encoder-decoder language model can be used in an end-to-end fashion for all sub-tasks of surface realization. The model is designed based on the BART language model that receives a linear representation of unordered and non-inflected tokens in a sentence along with their corresponding Universal Dependency information and produces the linear sequence of inflected tokens along with the missing words. The model was evaluated on the shallow and deep tracks of the 2020 Surface Realization Shared Task (SR’20) using both human and automatic evaluation. The results indicate that despite its simplicity, our model achieves competitive results among all participants in the shared task.

Cooking Up a Neural-based Model for Recipe Classification
Elham Mohammadi | Nada Naji | Louis Marceau | Marc Queudot | Eric Charton | Leila Kosseim | Marie-Jean Meurs
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we propose a neural-based model to address the first task of the DEFT 2013 shared task, with the main challenge of a highly imbalanced dataset, using state-of-the-art embedding approaches and deep architectures. We report on our experiments on the use of linguistic features, extracted by Charton et al. (2014), in different neural models utilizing pretrained embeddings. Our results show that all of the models that use linguistic features outperform their counterpart models that only use pretrained embeddings. The best performing model uses pretrained CamemBERT embeddings as input and CNN as the hidden layer, and uses additional linguistic features. Adding the linguistic features to this model improves its performance by 4.5% and 11.4% in terms of micro and macro F1 scores, respectively, leading to state-of-the-art results and an improved classification of the rare classes.

On the Creation of a Corpus for Coherence Evaluation of Discursive Units
Elham Mohammadi | Timothe Beiko | Leila Kosseim
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we report on our experiments towards the creation of a corpus for coherence evaluation. Most corpora for textual coherence evaluation are composed of randomly shuffled sentences that focus on sentence ordering, regardless of whether the sentences were originally related by a discourse relation. To the best of our knowledge, no publicly available corpus has been designed specifically for the evaluation of coherence of known discursive units. In this paper, we focus on coherence modeling at the intra-discursive level and describe our approach to build a corpus of incoherent pairs of sentences. We experimented with a variety of corruption strategies to create synthetic incoherent pairs of discourse arguments from coherent ones. Using discourse argument pairs from the Penn Discourse Tree Bank, we generate incoherent discourse argument pairs, by swapping either their discourse connective or a discourse argument. To evaluate how incoherent the generated corpora are, we use a convolutional neural network to try to distinguish the original pairs from the corrupted ones. Results of the classifier as well as a manual inspection of the corpora show that generating such corpora is still a challenge as the generated instances are clearly not “incoherent enough”, indicating that more effort should be spent on developing more robust ways of generating incoherent corpora.

Du bon usage d’ingrédients linguistiques spéciaux pour classer des recettes exceptionnelles (Using Special Linguistic Ingredients to Classify Exceptional Recipes)
Elham Mohammadi | Louis Marceau | Eric Charton | Leila Kosseim | Luka Nerima | Marie-Jean Meurs
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

We present a machine learning model that combines neural and linguistic models to handle classification tasks in which the label distribution of the instances is imbalanced. The performance of this model is measured through experiments on the recipe classification tasks of the DEFT 2013 campaign (Grouin et al., 2013). We show that word embeddings combined with deep learning methods outperform all the algorithms deployed during the DEFT campaign. We also show that these same embedding-based classifiers can gain in performance when a linguistic model is added to the neural model. We observe that adding a linguistic model to the neural model improves classification performance on the rare classes.

TIMBERT: Toponym Identifier For The Medical Domain Based on BERT
MohammadReza Davari | Leila Kosseim | Tien Bui
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we propose an approach to automate the process of place name detection in the medical domain to enable epidemiologists to better study and model the spread of viruses. We created a family of Toponym Identification Models based on BERT (TIMBERT), in order to learn in an end-to-end fashion the mapping from an input sentence to the associated sentence labeled with toponyms. When evaluated with the SemEval 2019 task 12 test set (Weissenbacher et al., 2019), our best TIMBERT model achieves an F1 score of 90.85%, a significant improvement compared to the state-of-the-art of 89.13% (Wang et al., 2019).

2019

CLaC at CLPsych 2019: Fusion of Neural Features and Predicted Class Probabilities for Suicide Risk Assessment Based on Online Posts
Elham Mohammadi | Hessam Amini | Leila Kosseim
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

This paper summarizes our participation in the CLPsych 2019 shared task, under the name CLaC. The goal of the shared task was to detect and assess suicide risk based on a collection of online posts. For our participation, we used an ensemble method which utilizes 8 neural sub-models to extract neural features and predict class probabilities, which are then used by an SVM classifier. Our team ranked first in 2 out of the 3 tasks (tasks A and C).

CLaC Lab at SemEval-2019 Task 3: Contextual Emotion Detection Using a Combination of Neural Networks and SVM
Elham Mohammadi | Hessam Amini | Leila Kosseim
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our system at SemEval 2019, Task 3 (EmoContext), which focused on the contextual detection of emotions in a dataset of 3-round dialogues. For our final system, we used a neural network with pretrained ELMo word embeddings and POS tags as input, GRUs as hidden units, an attention mechanism to capture representations of the dialogues, and an SVM classifier which used the learned network representations to perform the task of multi-class classification. This system yielded a micro-averaged F1 score of 0.7072 for the three emotion classes, improving on the baseline by approximately 12%.

Neural Feature Extraction for Contextual Emotion Detection
Elham Mohammadi | Hessam Amini | Leila Kosseim
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper describes a new approach for the task of contextual emotion detection. The approach is based on a neural feature extractor, composed of a recurrent neural network with an attention mechanism, followed by a classifier, that can be neural or SVM-based. We evaluated the model with the dataset of Task 3 of SemEval 2019 (EmoContext), which includes short 3-turn conversations, tagged with 4 emotion classes. The best performing setup was achieved using ELMo word embeddings and POS tags as input, bidirectional GRU as hidden units, and an SVM as the final classifier. This configuration reached 69.93% in terms of micro-average F1 score on the main 3 emotion classes, a score that outperformed the baseline system by 11.25%.
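
The micro-average F1 score restricted to the three main emotion classes can be sketched as below: true positives, false positives, and false negatives are pooled over those classes only, so the fourth class contributes only through errors. The label names and the toy gold/predicted sequences are illustrative assumptions, not data from the paper.

```python
def micro_f1(gold, pred, classes):
    """Micro-averaged F1 computed over a chosen subset of classes."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == p and g in classes)
    fp = sum(1 for g, p in pairs if p in classes and g != p)
    fn = sum(1 for g, p in pairs if g in classes and g != p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Assumed label set: three emotion classes plus a catch-all "others" class.
emotions = {"happy", "sad", "angry"}
gold = ["happy", "sad", "angry", "others", "others", "happy"]
pred = ["happy", "others", "angry", "sad", "others", "angry"]

score = micro_f1(gold, pred, emotions)
```

Correct predictions of the catch-all class are ignored entirely, which keeps a large majority class from inflating the score.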

The Concordia NLG Surface Realizer at SRST 2019
Farhood Farahnak | Laya Rafiee | Leila Kosseim | Thomas Fevens
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

This paper presents the model we developed for the shallow track of the 2019 NLG Surface Realization Shared Task. The model reconstructs sentences whose word order and word inflections were removed. We divided the problem into two sub-problems: reordering and inflecting. For the purpose of reordering, we used a pointer network integrated with a transformer model as its encoder-decoder modules. In order to generate the inflected forms of tokens, a Feed Forward Neural Network was employed.

2018

Attention for Implicit Discourse Relation Recognition
Andre Cianflone | Leila Kosseim
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

CLaC @ DEFT 2018: Sentiment analysis of tweets on transport from Île-de-France
Simon Jacques | Farhood Farahnak | Leila Kosseim
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT

This paper describes the system deployed by the CLaC lab at Concordia University in Montreal for the DEFT 2018 shared task. The competition consisted of four different tasks; however, due to lack of time, we only participated in the first two. We participated with a system based on conventional supervised learning methods: a support vector machine classifier and an artificial neural network. For task 1, our best approach achieved an F-measure of 87.61%, while for task 2 it achieved 51.03%, placing our system below the average of the other participants.

2017

Automatic Mapping of French Discourse Connectives to PDTB Discourse Relations
Majid Laali | Leila Kosseim
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

In this paper, we present an approach to exploit phrase tables generated by statistical machine translation in order to map French discourse connectives to discourse relations. Using this approach, we created DisCoRel, a lexicon of French discourse connectives and their PDTB relations. When evaluated against LEXCONN, DisCoRel achieves a recall of 0.81 and an Average Precision of 0.68 for the Concession and Condition relations.

Improving Discourse Relation Projection to Build Discourse Annotated Corpora
Majid Laali | Leila Kosseim
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

The naive approach to annotation projection is not effective for projecting discourse annotations from one language to another because implicit discourse relations are often changed to explicit ones and vice versa in translation. In this paper, we propose a novel approach based on the intersection between statistical word-alignment models to identify unsupported discourse annotations. This approach identified 65% of the unsupported annotations in the English-French parallel sentences from Europarl. By filtering out these unsupported annotations, we induced the first PDTB-style discourse annotated corpus for French from Europarl. We then used this corpus to train a classifier to identify the discourse usage of French discourse connectives and show a 15% improvement of F1-score compared to the classifier trained on the non-filtered annotations.

Argument Labeling of Explicit Discourse Relations using LSTM Neural Networks
Sohail Hooda | Leila Kosseim
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Argument labeling of explicit discourse relations is a challenging task. State-of-the-art systems achieve slightly above 55% F-measure but require hand-crafted features. In this paper, we propose a Long Short-Term Memory (LSTM)-based model for argument labeling. We experimented with multiple configurations of our model. Using the PDTB dataset, our best model achieved an F1 measure of 23.05% without any feature engineering. This is significantly higher than the 20.52% achieved by the state-of-the-art RNN approach, but significantly lower than the feature-based state-of-the-art systems. On the other hand, because our approach learns only from the raw dataset, it is more widely applicable to multiple textual genres and languages.

Automatic Identification of AltLexes using Monolingual Parallel Corpora
Elnaz Davoodi | Leila Kosseim
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

The automatic identification of discourse relations is still a challenging task in natural language processing. Discourse connectives, such as since or but, are the most informative cues to identify explicit relations; however, discourse parsers typically use a closed inventory of such connectives. As a result, discourse relations signalled by markers outside these inventories (i.e. AltLexes) are not detected as effectively. In this paper, we propose a novel method to leverage parallel corpora in text simplification and lexical resources to automatically identify alternative lexicalizations that signal discourse relations. When applied to the Simple Wikipedia and Newsela corpora along with WordNet and the PPDB, the method allowed the automatic discovery of 91 AltLexes.

2016

N-gram and Neural Language Models for Discriminating Similar Languages
Andre Cianflone | Leila Kosseim
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

This paper describes our submission to the 2016 Discriminating Similar Languages (DSL) Shared Task. We participated in the closed Sub-task 1 with two separate machine learning techniques. The first approach is a character-based Convolutional Neural Network with an LSTM layer (CLSTM), which achieved an accuracy of 78.45% with minimal tuning. The second approach is a character-based n-gram model of size 7. It achieved an accuracy of 88.45%, which is close to the accuracy of 89.38% achieved by the best submission.
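
For readers unfamiliar with character-based n-gram models, the sketch below shows one simple way such a language identifier could work: it counts character n-grams per language and scores new text with add-one smoothing. This is an assumed, simplified formulation rather than the model from the paper; the class name, the smoothing choice, and the toy training sentences are hypothetical, and the toy example uses n=3 instead of the paper's size 7 because the texts are very short.

```python
import math
from collections import Counter


def char_ngrams(text, n=7):
    """Return all character n-grams of text, left-padded with spaces."""
    padded = " " * (n - 1) + text
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramClassifier:
    """Per-language n-gram counts with add-one smoothing (illustrative only)."""

    def __init__(self, n=7):
        self.n = n
        self.counts = {}  # label -> Counter of n-grams
        self.totals = {}  # label -> total number of n-grams seen
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}

    def score(self, text, label):
        """Smoothed log-likelihood of text under one language's n-gram counts."""
        size = len(self.vocab) + 1  # +1 reserves mass for unseen n-grams
        counts = self.counts[label]
        total = self.totals[label]
        return sum(
            math.log((counts[g] + 1) / (total + size))
            for g in char_ngrams(text, self.n)
        )

    def predict(self, text):
        return max(self.counts, key=lambda lab: self.score(text, lab))


clf = CharNgramClassifier(n=3)
clf.fit(["the cat sat on the mat", "le chat est sur le tapis"], ["en", "fr"])
guess_en = clf.predict("the mat")
guess_fr = clf.predict("le chat")
```

Longer n-grams such as size 7 capture whole short words and their boundaries, which is useful when the languages to separate share most of their character inventory.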

On the Contribution of Discourse Structure on Text Complexity Assessment
Elnaz Davoodi | Leila Kosseim
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

CLaC at SemEval-2016 Task 11: Exploring linguistic and psycho-linguistic Features for Complex Word Identification
Elnaz Davoodi | Leila Kosseim
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

The CLaC Discourse Parser at CoNLL-2016
Majid Laali | Andre Cianflone | Leila Kosseim
Proceedings of the CoNLL-16 shared task

2015

The CLaC Discourse Parser at CoNLL-2015
Majid Laali | Elnaz Davoodi | Leila Kosseim
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task

2014

Inducing Discourse Connectives from Parallel Texts
Majid Laali | Leila Kosseim
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

ClaC: Semantic Relatedness of Words and Phrases
Reda Siblini | Leila Kosseim
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

Using a Weighted Semantic Network for Lexical Semantic Relatedness
Reda Siblini | Leila Kosseim
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Measuring the Effect of Discourse Relations on Blog Summarization
Shamima Mithun | Leila Kosseim
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Discrepancy Between Automatic and Manual Evaluation of Summaries
Shamima Mithun | Leila Kosseim | Prasad Perera
Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization

2011

Discourse Structures to Reduce Discourse Incoherence in Blog Summarization
Shamima Mithun | Leila Kosseim
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

A Hybrid Approach to Utilize Rhetorical Relations for Blog Summarization
Shamima Mithun | Leila Kosseim
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

The availability of huge amounts of online opinions has created a need for effective query-based opinion summarizers that analyze this information to facilitate decision making at every level. To develop an effective opinion summarization approach, we specifically target the Question Irrelevancy and Discourse Incoherency problems, which have been found to be the most frequently occurring problems in opinion summarization. To address these problems, we introduce a hybrid approach that combines text schema and rhetorical relations to exploit intra-sentential rhetorical relations. To evaluate our approach, we built a system called BlogSum and compared the summaries it generates after applying rhetorical structuring with its candidate sentences generated without rhetorical relations, using the Text Analysis Conference (TAC) 2008 data to assess summary content. Evaluation results show that our approach improves summary content by reducing question-irrelevant sentences.

2009

Summarizing Blog Entries versus News Texts
Shamima Mithun | Leila Kosseim
Proceedings of the Workshop on Events in Emerging Text Types

2008

Answering List Questions using Co-occurrence and Clustering
Majid Razmara | Leila Kosseim
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Although answering list questions is not a new research area, answering them automatically remains a challenge. The median F-score of systems that participated in the TREC 2007 Question Answering track is still very low (0.085), and 74% of the questions had a median F-score of 0. In this paper, we propose a novel approach to answering list questions. This approach is based on the hypothesis that answer instances of a list question co-occur in the documents and sentences related to the topic of the question. We use a clustering method to group the candidate answers that co-occur more often. To pinpoint the right cluster, we use the target and the question keywords as spies and return the cluster that contains these keywords.

RoDEO: Reasoning over Dependencies Extracted Online
Reda Siblini | Leila Kosseim
Proceedings of the 4th Web as Corpus Workshop

The web is the largest available corpus and could be enormously valuable to many natural language processing applications. However, it is becoming very difficult to identify relevant information on the web. We present a system for querying dependency tree collocations from the web. We show its usefulness in identifying relevant information by evaluating its accuracy on the task of extracting classes of named entities. On this task, the system achieved an overall accuracy of 70%.

2004

Simple features for statistical Word Sense Disambiguation
Abolfazl Lamjiri | Osama El Demerdash | Leila Kosseim
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

The Problem of Precision in Restricted-Domain Question Answering. Some Proposed Methods of Improvement
Hai Doan-Nguyen | Leila Kosseim
Proceedings of the Conference on Question Answering in Restricted Domains

2003

Generation of natural responses through syntactic patterns
Glenda B. Anaya | Leila Kosseim
Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

The goal of Question-Answering (QA) systems is to find short and factual answers to open-domain questions by searching a large collection of documents. This research aims to formulate complete and natural answer-sentences to questions, given the short answer. The answer-sentences are meant to be self-sufficient; that is, they should contain enough context to be understood without needing the original question. Generating such sentences is important in question answering, as they can be used to enhance existing QA systems, both to provide answers to the user in a more natural way and to provide a pattern for extracting the answer from the document collection.

2001

Extraction de noms propres à partir de textes variés: problématique et enjeux
Leila Kosseim | Thierry Poibeau
Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

This article deals with the identification of proper names in written texts. Rule-based strategies developed for journalistic texts generally prove insufficient for corpora composed of texts that do not meet strict editorial criteria. After a brief review of work carried out on corpora of journalistic texts, we present the problem of analyzing varied texts, based on two corpora composed of e-mails and manual transcriptions of telephone conversations. Once the sources of error have been presented, we describe the approach used to adapt a proper name extraction system developed for journalistic texts to the analysis of e-mail messages.

Critères de sélection d’une approche pour le suivi automatique du courriel
Leila Kosseim | Guy Lapalme
Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

This article discusses different approaches to the automatic follow-up of e-mail. We first present the natural language processing (NLP) methods most commonly used for this task, then a set of criteria that influence the choice of an approach. These criteria were developed through a case study on a corpus provided by Bell Canada Entreprises. With our corpus, it appeared that while no method is completely satisfactory on its own, a combined approach seems much more promising.

1994

Content and Rhetorical Status Selection in Instructional Texts
Leila Kosseim | Guy Lapalme
Proceedings of the Seventh International Workshop on Natural Language Generation

Venues