Shiva Taslimipoor

2025

A Survey on Automated Distractor Evaluation in Multiple-Choice Tasks
Luca Benedetto | Shiva Taslimipoor | Paula Buttery
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

Multiple-Choice Tasks are one of the most common types of assessment item, because they are easy to grade automatically and objectively. A key component of Multiple-Choice Tasks is the set of distractors (i.e., the wrong answer options), since poor distractors affect the overall quality of the item: e.g., if they are obviously wrong, they are never selected. Thus, previous research has focused extensively on techniques for automatically generating distractors, which can be especially helpful in settings where large pools of questions are desirable or needed. However, there is no agreement within the community about the techniques that are most suited to evaluating generated distractors, and the ones used in the literature are sometimes not aligned with how distractors perform in real exams. In this review paper, we perform a comprehensive study of the approaches used in the literature for evaluating generated distractors, propose a taxonomy to categorise them, and discuss if and how they are aligned with distractors' performance in exam settings and how they differ across question types and educational domains.

2024

Thanks to recent advances in generative AI, we are able to prompt large language models (LLMs) to produce texts which are fluent and grammatical. In addition, it has been shown that we can elicit attempts at grammatical error correction (GEC) from LLMs when prompted with ungrammatical input sentences. We evaluate how well LLMs can perform at GEC by measuring their performance on established benchmark datasets. We go beyond previous studies, which only examined GPT* models on a selection of English GEC datasets, by evaluating seven open-source and three commercial LLMs on four established GEC benchmarks. We investigate model performance and report results against individual error types. Our results indicate that LLMs do not always outperform supervised English GEC models except in specific contexts – namely commercial LLMs on benchmarks annotated with fluency corrections as opposed to minimal edits. We find that several open-source models outperform commercial ones on minimal edit benchmarks, and that in some settings zero-shot prompting is just as competitive as few-shot prompting.

Distractor Generation Using Generative and Discriminative Capabilities of Transformer-based Models
Shiva Taslimipoor | Luca Benedetto | Mariano Felice | Paula Buttery
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Multiple Choice Questions (MCQs) are very common in both high-stakes and low-stakes examinations, and their effectiveness in assessing students relies on the quality and diversity of distractors, which are the incorrect answer options provided alongside the correct answer. Motivated by the progress in generative language models, we propose a two-step automatic distractor generation approach based on text-to-text transfer transformer models. Unlike most previous methods for distractor generation, our approach does not rely on the correct answer options. Instead, it first generates both correct and incorrect answer options, and then discriminates potential correct options from distractors. Identified distractors are finally grouped into separate clusters based on semantic similarity scores, and the cluster heads are selected as our final distinct distractors. Experiments on two publicly available datasets show that our approach outperforms previous models both in the case of single-word answer options and longer-sequence reading comprehension questions.

2023

A Survey of MWE Identification Experiments: The Devil is in the Details
Carlos Ramisch | Abigail Walsh | Thomas Blanchard | Shiva Taslimipoor
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)

Multiword expression (MWE) identification has been the focus of numerous research papers, especially in the context of the DiMSUM and PARSEME Shared Tasks (STs). This survey analyses 40 MWE identification papers with experiments on data from these STs. We look at corpus selection, pre- and post-processing, MWE encoding, evaluation metrics, statistical significance, and error analyses. We find that these aspects are usually considered minor and/or omitted in the literature. However, they may considerably impact the results and the conclusions drawn from them. Therefore, we advocate for more systematic descriptions of experimental conditions to reduce the risk of misleading conclusions drawn from poorly designed experimental setups.

2022

Constructing Open Cloze Tests Using Generation and Discrimination Capabilities of Transformers
Mariano Felice | Shiva Taslimipoor | Paula Buttery
Findings of the Association for Computational Linguistics: ACL 2022

This paper presents the first multi-objective transformer model for generating open cloze tests that exploits generation and discrimination capabilities to improve performance. Our model is further enhanced by tweaking its loss function and applying a post-processing re-ranking algorithm that improves overall test structure. Experiments using automatic and human evaluation show that our approach can achieve up to 82% accuracy according to experts, outperforming previous work and baselines. We also release a collection of high-quality open cloze tests along with sample system output and human annotations that can serve as a future benchmark.

Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
Archna Bhatia | Paul Cook | Shiva Taslimipoor | Marcos Garcia | Carlos Ramisch
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022

CEPOC: The Cambridge Exams Publishing Open Cloze dataset
Mariano Felice | Shiva Taslimipoor | Øistein E. Andersen | Paula Buttery
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Open cloze tests are a standard type of exercise where examinees must complete a text by filling in the gaps without any given options to choose from. This paper presents the Cambridge Exams Publishing Open Cloze (CEPOC) dataset, a collection of open cloze tests from world-renowned English language proficiency examinations. The tests in CEPOC have been expertly designed and validated using standard principles in language research and assessment. They are prepared for language learners at different proficiency levels and hence classified into different CEFR levels (A2, B1, B2, C1, C2). This resource can be a valuable testbed for various NLP tasks. We perform a complete set of experiments on three tasks: gap filling, gap prediction, and CEFR text classification. We implement transformer-based systems based on pre-trained language models to model each task and use our dataset as a test set, providing promising benchmark results.

Improving Grammatical Error Correction for Multiword Expressions
Shiva Taslimipoor | Christopher Bryant | Zheng Yuan
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022

Grammatical error correction (GEC) is the task of automatically correcting errors in text. It has mainly been developed to assist language learning, but can also be applied to native text. This paper reports on preliminary work in improving GEC for multiword expression (MWE) error correction. We propose two systems which incorporate MWE information in two different ways: one is a multi-encoder decoder system which encodes MWE tags in a second encoder, and the other is a BART pre-trained transformer-based system that encodes MWE representations using special tokens. We show improvements in correcting specific types of verbal MWEs based on a modified version of a standard GEC evaluation approach.

2021

Multi-Class Grammatical Error Detection for Correction: A Tale of Two Systems
Zheng Yuan | Shiva Taslimipoor | Christopher Davis | Christopher Bryant
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper, we show how a multi-class grammatical error detection (GED) system can be used to improve grammatical error correction (GEC) for English. Specifically, we first develop a new state-of-the-art binary detection system based on pre-trained ELECTRA, and then extend it to multi-class detection using different error type tagsets derived from the ERRANT framework. Output from this detection system is used as auxiliary input to fine-tune a novel encoder-decoder GEC model, and we subsequently re-rank the N-best GEC output to find the hypothesis that most agrees with the GED output. Results show that fine-tuning the GEC system using 4-class GED produces the best model, but re-ranking using 55-class GED leads to the best performance overall. This suggests that different multi-class GED systems benefit GEC in different ways. Ultimately, our system outperforms all other previous work that combines GED and GEC, and achieves a new single-model NMT-based state of the art on the BEA-test benchmark.

2020

Incorporating Multiword Expressions in Phrase Complexity Estimation
Sian Gooding | Shiva Taslimipoor | Ekaterina Kochmar
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

Multiword expressions (MWEs) were shown to be useful in a number of NLP tasks. However, research on the use of MWEs in lexical complexity assessment and simplification is still an under-explored area. In this paper, we propose a text complexity assessment system for English, which incorporates MWE identification. We show that detecting MWEs using state-of-the-art systems improves predicting complexity on an established lexical complexity dataset.

SeCoDa: Sense Complexity Dataset
David Strohmaier | Sian Gooding | Shiva Taslimipoor | Ekaterina Kochmar
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Sense Complexity Dataset (SeCoDa) provides a corpus that is annotated jointly for complexity and word senses. It thus provides a valuable resource for both word sense disambiguation and the task of complex word identification. The intention is that this dataset will be used to identify complexity at the level of word senses rather than word tokens. For word sense annotation SeCoDa uses a hierarchical scheme that is based on information available in the Cambridge Advanced Learner’s Dictionary. This way we can offer more coarse-grained senses than directly available in WordNet.

MTLB-STRUCT @Parseme 2020: Capturing Unseen Multiword Expressions Using Multi-task Learning and Pre-trained Masked Language Models
Shiva Taslimipoor | Sara Bahaadini | Ekaterina Kochmar
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

This paper describes a semi-supervised system that jointly learns verbal multiword expressions (VMWEs) and dependency parse trees as an auxiliary task. The model benefits from pre-trained multilingual BERT. BERT hidden layers are shared among the two tasks and we introduce an additional linear layer to retrieve VMWE tags. The dependency parse tree prediction is modelled by a linear layer and a bilinear one plus a tree CRF architecture on top of the shared BERT. The system has participated in the open track of the PARSEME shared task 2020 and ranked first in terms of F1-score in identifying unseen VMWEs as well as VMWEs in general, averaged across all 14 languages.

Verbal Multiword Expressions for Identification of Metaphor
Omid Rohanian | Marek Rei | Shiva Taslimipoor | Le An Ha
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Metaphor is a linguistic device in which a concept is expressed by mentioning another. Identifying metaphorical expressions, therefore, requires a non-compositional understanding of semantics. Multiword Expressions (MWEs), on the other hand, are linguistic phenomena with varying degrees of semantic opacity and their identification poses a challenge to computational models. This work is the first attempt at analysing the interplay of metaphor and MWE processing through the design of a neural architecture whereby classification of metaphors is enhanced by informing the model of the presence of MWEs. To the best of our knowledge, this is the first “MWE-aware” metaphor identification system, paving the way for further experiments on the complex interactions of these phenomena. The results and analyses show that the proposed architecture reaches state-of-the-art performance on two different established metaphor datasets.

2019

Bridging the Gap: Attending to Discontinuity in Identification of Multiword Expressions
Omid Rohanian | Shiva Taslimipoor | Samaneh Kouchaki | Le An Ha | Ruslan Mitkov
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce a new method to tag Multiword Expressions (MWEs) using a linguistically interpretable language-independent deep learning architecture. We specifically target discontinuity, an under-explored aspect that poses a significant challenge to computational treatment of MWEs. Two neural architectures are explored: Graph Convolutional Network (GCN) and multi-head self-attention. GCN leverages dependency parse information, and self-attention attends to long-range relations. We finally propose a combined model that integrates complementary information from both, through a gating mechanism. The experiments on a standard multilingual dataset for verbal MWEs show that our model outperforms the baselines not only in the case of discontinuous MWEs but also in overall F-score.

GCN-Sem at SemEval-2019 Task 1: Semantic Parsing using Graph Convolutional and Recurrent Neural Networks
Shiva Taslimipoor | Omid Rohanian | Sara Može
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the system submitted to the SemEval 2019 shared task 1 ‘Cross-lingual Semantic Parsing with UCCA’. We rely on the semantic dependency parse trees provided in the shared task which are converted from the original UCCA files and model the task as tagging. The aim is to predict the graph structure of the output along with the types of relations among the nodes. Our proposed neural architecture is composed of Graph Convolution and BiLSTM components. The layers of the system share their weights while predicting dependency links and semantic labels. The system is applied to the CONLLU format of the input data and is best suited for semantic dependency parsing.

Cross-lingual Transfer Learning and Multitask Learning for Capturing Multiword Expressions
Shiva Taslimipoor | Omid Rohanian | Le An Ha
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

Recent developments in deep learning have prompted a surge of interest in the application of multitask and transfer learning to NLP problems. In this study, we explore, for the first time, the application of transfer learning (TRL) and multitask learning (MTL) to the identification of Multiword Expressions (MWEs). For MTL, we exploit the shared syntactic information between MWE and dependency parsing models to jointly train a single model on both tasks. We specifically predict two types of labels: MWE and dependency parse. Our neural MTL architecture utilises the supervision of dependency parsing in lower layers and predicts MWE tags in upper layers. In the TRL scenario, we overcome the scarcity of data by learning a model on a larger MWE dataset and transferring the knowledge to a resource-poor setting in another language. In both scenarios, the resulting models achieved higher performance compared to standard neural approaches.

2018

WLV at SemEval-2018 Task 3: Dissecting Tweets in Search of Irony
Omid Rohanian | Shiva Taslimipoor | Richard Evans | Ruslan Mitkov
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the systems submitted to SemEval 2018 Task 3 “Irony detection in English tweets” for both subtasks A and B. The first system, which leverages a combination of sentiment, distributional semantic, and text surface features, ranked third among 44 teams according to the official leaderboard of subtask A. The second system, with a slightly different representation of the features, ranked ninth in subtask B. We present a method that entails decomposing tweets into separate parts. Searching for contrast within the constituents of a tweet is an integral part of our system. We embrace an extensive definition of contrast which leads to a vast coverage in detecting ironic content.

Wolves at SemEval-2018 Task 10: Semantic Discrimination based on Knowledge and Association
Shiva Taslimipoor | Omid Rohanian | Le An Ha | Gloria Corpas Pastor | Ruslan Mitkov
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the system submitted to SemEval 2018 shared task 10 ‘Capturing Discriminative Attributes’. We use a combination of knowledge-based and co-occurrence features to capture the semantic difference between two words in relation to an attribute. We define scores based on association measures, ngram counts, word similarity, and ConceptNet relations. The system is ranked 4th (joint) on the official leaderboard of the task.

2017

Using Gaze Data to Predict Multiword Expressions
Omid Rohanian | Shiva Taslimipoor | Victoria Yaneva | Le An Ha
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In recent years, gaze data has been increasingly used to improve and evaluate NLP models because it carries information about the cognitive processing of linguistic phenomena. In this paper we conduct a preliminary study towards the automatic identification of multiword expressions based on gaze features from native and non-native speakers of English. We compare a part-of-speech (POS) and frequency baseline with: i) a prediction model based solely on gaze data and ii) a combined model of gaze data, POS and frequency. In spite of the challenging nature of the task, best performance was achieved by the latter. Furthermore, we explore how the type of gaze data (from native versus non-native speakers) affects the prediction, showing that data from the two groups is discriminative to an equal degree for the task. Finally, we show that late processing measures are more predictive than early ones, which is in line with previous research on idioms and other formulaic structures.

Investigating the Opacity of Verb-Noun Multiword Expression Usages in Context
Shiva Taslimipoor | Omid Rohanian | Ruslan Mitkov | Afsaneh Fazly
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

This study investigates the supervised token-based identification of Multiword Expressions (MWEs). It is part of ongoing research that exploits the information contained in the contexts in which different instances of an expression can occur. This information is used to determine whether a given usage of an expression is literal or an MWE. Lexical and syntactic context features derived from vector representations are shown to be more effective than traditional statistical measures at identifying tokens of MWEs.

2015

2012

Using Noun Similarity to Adapt an Acceptability Measure for Persian Light Verb Constructions
Shiva Taslimipoor | Afsaneh Fazly | Ali Hamzeh
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Light verb constructions (LVCs), such as take a walk and make a decision, are a common subclass of multiword expressions (MWEs), whose distinct syntactic and semantic properties call for a special treatment within a computational system. In particular, LVCs are formed semi-productively: often a semantically-general verb (such as take) combines with a number of semantically-similar nouns to form semantically-related LVCs, as in make a decision/choice/commitment. Nonetheless, there are restrictions as to which verbs combine with which class of nouns. A proper computational account of LVCs is even more important for languages such as Persian, in which most verbs take the form of LVCs. Recently, there has been some work on the automatic identification of MWEs (including LVCs) in resource-rich languages, such as English and Dutch. We adapt such existing techniques for the automatic identification of LVCs in Persian, an under-resourced language. Specifically, we extend an existing statistical measure of the acceptability of English LVCs (Fazly et al., 2007) to make explicit use of semantic classes of nouns, and show that such classes are particularly useful for determining the LVC acceptability of new combinations.
