Matthew Purver


2024

Comparing News Framing of Migration Crises using Zero-Shot Classification
Nikola Ivačič | Matthew Purver | Fabienne Lind | Senja Pollak | Hajo Boomgaarden | Veronika Bajt
Proceedings of the First Workshop on Reference, Framing, and Perspective @ LREC-COLING 2024

We present an experiment on classifying news frames in a language unseen by the learner, using zero-shot cross-lingual transfer learning. We used two pre-trained multilingual Transformer Encoder neural network models and tested with four specific news frames, investigating two approaches to the resulting multi-label task: Binary Relevance (treating each frame independently) and Label Power-set (predicting each possible combination of frames). We train our classifiers on an available annotated multilingual migration news dataset and test on an unseen Slovene language migration news corpus, first evaluating performance and then using the classifiers to analyse how media framed the news during the periods of Syria and Ukraine conflict-related migrations.
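
The two multi-label formulations named in the abstract differ mainly in how the targets are encoded. The following is a minimal sketch of that difference only, not the authors' implementation; the frame names are hypothetical placeholders, since the abstract does not name the four frames.

```python
# Hypothetical frame names, for illustration only (the abstract does not list them).
FRAMES = ["economic", "security", "humanitarian", "political"]


def binary_relevance_targets(labels):
    """Binary Relevance: one independent 0/1 target per frame."""
    return [1 if frame in labels else 0 for frame in FRAMES]


def label_powerset_class(labels):
    """Label Power-set: one class id per combination of frames (bitmask encoding)."""
    return sum(1 << i for i, frame in enumerate(FRAMES) if frame in labels)


article_frames = {"security", "political"}
br = binary_relevance_targets(article_frames)  # four separate binary decisions
lp = label_powerset_class(article_frames)      # one of 2**4 = 16 possible classes
print(br, lp)
```

With Binary Relevance, four binary classifiers (or four output units) are trained; with Label Power-set, a single multi-class classifier chooses among the frame combinations.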

Findings of the Association for Computational Linguistics: EACL 2024
Yvette Graham | Matthew Purver
Findings of the Association for Computational Linguistics: EACL 2024

Recent Trends in Linear Text Segmentation: A Survey
Iacopo Ghinassi | Lin Wang | Chris Newell | Matthew Purver
Findings of the Association for Computational Linguistics: EMNLP 2024

Linear Text Segmentation is the task of automatically tagging text documents with topic shifts, i.e. the places in the text where the topics change. A well-established area of research in Natural Language Processing, drawing from well-understood concepts in linguistic and computational linguistic research, the field has recently seen a lot of interest as a result of the surge of text, video, and audio available on the web, which in turn requires ways of summarising and categorising the mass of content, for which linear text segmentation is a fundamental step. In this survey, we provide an extensive overview of current advances in linear text segmentation, describing the state of the art in terms of resources and approaches for the task. Finally, we highlight the limitations of available resources and of the task itself, while indicating ways forward based on the most recent literature and under-explored research directions.

Analyzing and Enhancing Clarification Strategies for Ambiguous References in Consumer Service Interactions
Changling Li | Yujian Gan | Zhenrong Yang | Youyang Chen | Xinxuan Qiu | Yanni Lin | Matthew Purver | Massimo Poesio
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

When customers present ambiguous references, service staff typically need to clarify the customers’ specific intentions. To advance research in this area, we collected 1,000 real-world consumer dialogues with ambiguous references. This dataset will be used for subsequent studies to identify ambiguous references and generate responses. Our analysis of the dataset revealed common strategies employed by service staff, including directly asking clarification questions (CQ) and listing possible options before asking a clarification question (LCQ). However, we found that merely using CQ often fails to fully satisfy customers. In contrast, using LCQ, as well as recommending specific products after listing possible options, proved more effective in resolving ambiguous references and enhancing customer satisfaction.

Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Yvette Graham | Matthew Purver
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Yvette Graham | Matthew Purver
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media
Jaya Caporusso | Damar Hoogland | Mojca Brglez | Boshko Koloski | Matthew Purver | Senja Pollak
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Dehumanisation involves the perception and/or treatment of a social group’s members as less than human. This phenomenon is rarely addressed with computational linguistic techniques. We adapt a recently proposed approach for English, making it easier to transfer to other languages and to evaluate, introducing a new sentiment resource, the use of zero-shot cross-lingual valence and arousal detection, and a new method for statistical significance testing. We then apply it to study attitudes to migration expressed in Slovene newspapers, to examine changes in the Slovene discourse on migration between the 2015-16 migration crisis following the war in Syria and the 2022-23 period following the war in Ukraine. We find that while this discourse became more negative and more intense over time, it is less dehumanising when specifically addressing Ukrainian migrants compared to others.

Denoising Labeled Data for Comment Moderation Using Active Learning
Andraž Pelicon | Mladen Karan | Ravi Shekhar | Matthew Purver | Senja Pollak
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Noisily labeled textual data is ample on internet platforms that allow user-created content. Training models, such as offensive language detection models for comment moderation, on such data may prove difficult, as the noise in the labels prevents the model from converging. In this work, we propose to use active learning methods for the purposes of denoising training data for model training. The goal is to use active learning to sample the most informative examples with noisy labels and send them to the oracle for reannotation, thus reducing the overall cost of reannotation. In this setting we tested three existing active learning methods, namely DBAL, Variance of Gradients (VoG) and BADGE. The proposed approach to data denoising is tested on the problem of offensive language detection. We observe that active learning can be effectively used for the purposes of data denoising; however, care should be taken when choosing the algorithm for this purpose.

When Cohesion Lies in the Embedding Space: Embedding-Based Reference-Free Metrics for Topic Segmentation
Iacopo Ghinassi | Lin Wang | Chris Newell | Matthew Purver
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper we propose a new framework and new methods for the reference-free evaluation of topic segmentation systems directly in the embedding space. Specifically, we define a common framework for reference-free, embedding-based topic segmentation metrics, and show how this applies to an existing metric. We then define new metrics, based on a previously defined cohesion score, Average Relative Proximity. Using this approach, we show that Large Language Models (LLMs) yield features that, if used correctly, can strongly correlate with traditional topic segmentation metrics based on costly and rare human annotations, while outperforming existing reference-free metrics borrowed from clustering evaluation in most domains. We then show that smaller language models specifically fine-tuned for different sentence-level tasks can outperform LLMs several orders of magnitude larger. Via a thorough comparison of our metric’s performance across different datasets, we see that conversational data present the biggest challenge in this framework. Finally, we analyse the behaviour of our metrics in specific error cases, such as those of under-generation and moving of ground truth topic boundaries, and show that our metrics behave more consistently than other reference-free methods.

2023

Lexicools at SemEval-2023 Task 10: Sexism Lexicon Construction via XAI
Pakawat Nakwijit | Mahmoud Samir | Matthew Purver
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents our work on the SemEval-2023 Task 10 Explainable Detection of Online Sexism (EDOS) using lexicon-based models. Our approach consists of three main steps: lexicon construction based on Pointwise Mutual Information (PMI) and Shapley value, lexicon augmentation using an unannotated corpus and Large Language Models (LLMs), and, lastly, lexical incorporation for Bag-of-Words (BoW) logistic regression and fine-tuning LLMs. Our results demonstrate that our Shapley approach effectively produces a high-quality lexicon. We also show that simply counting the presence of certain words in our lexicons and comparing the counts can outperform a BoW logistic regression in tasks B/C and fine-tuning BERT in task C. In the end, our classifier achieved F1-scores of 53.34% and 27.31% on the official blind test sets for tasks B and C, respectively. We additionally provide in-depth analysis highlighting model limitations and bias. We also present our attempts to understand the model’s behaviour based on our constructed lexicons. Our code and the resulting lexicons are open-sourced in our GitHub repository https://github.com/SirBadr/SemEval2022-Task10.
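
As a rough illustration of PMI-based lexicon scoring, the sketch below ranks words by their pointwise mutual information with a target label, using document-level counts. It is an assumption-laden toy (the function name, the document-frequency estimate, and the placeholder tokens are all hypothetical) and is not taken from the authors' repository.

```python
import math
from collections import Counter


def pmi_scores(docs, target_label):
    """PMI(word, label) = log2(P(word, label) / (P(word) * P(label))),
    with probabilities estimated from document frequencies."""
    n_docs = len(docs)
    n_label = sum(1 for _, label in docs if label == target_label)
    word_df = Counter()        # documents containing the word
    word_label_df = Counter()  # documents with the target label containing the word
    for tokens, label in docs:
        for word in set(tokens):
            word_df[word] += 1
            if label == target_label:
                word_label_df[word] += 1
    p_label = n_label / n_docs
    return {
        word: math.log2((joint / n_docs) / ((word_df[word] / n_docs) * p_label))
        for word, joint in word_label_df.items()
    }


# Placeholder tokens and labels, purely illustrative.
toy_docs = [
    (["a", "b"], "pos"),
    (["a", "c"], "pos"),
    (["c", "d"], "neg"),
    (["d", "e"], "neg"),
]
scores = pmi_scores(toy_docs, "pos")
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Words that occur only with the target label receive a positive score, while a word spread evenly across labels scores zero; a lexicon could then be formed by keeping the top-scoring words.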

Lon-eå at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction
Peyman Hosseini | Mehran Hosseini | Sana Al-azzawi | Marcus Liwicki | Ignacio Castro | Matthew Purver
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We study the influence of different activation functions in the output layer of pre-trained transformer models for soft and hard label prediction in the learning with disagreement task. In this task, the goal is to quantify the amount of disagreement via predicting soft labels. To predict the soft labels, we use BERT-based preprocessors and encoders and vary the activation function used in the output layer, while keeping other parameters constant. The soft labels are then used for the hard label prediction. The activation functions considered are sigmoid as well as a step-function that is added to the model post-training and a sinusoidal activation function, which is introduced for the first time in this paper.

Tracing Linguistic Markers of Influence in a Large Online Organisation
Prashant Khare | Ravi Shekhar | Mladen Karan | Stephen McQuistin | Colin Perkins | Ignacio Castro | Gareth Tyson | Patrick Healey | Matthew Purver
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Social science and psycholinguistic research have shown that power and status affect how people use language in a range of domains. Here, we investigate a similar question in a large, distributed, consensus-driven community with little traditional power hierarchy – the Internet Engineering Task Force (IETF), a collaborative organisation that designs internet standards. Our analysis, based on lexical categories (LIWC) and BERT, shows that participants’ levels of influence can be predicted from their email text, and identifies key linguistic differences (e.g., certain LIWC categories, such as “WE”, are positively correlated with high influence). We also identify the differences in language use for the same person before and after becoming influential.

Re-appraising the Schema Linking for Text-to-SQL
Yujian Gan | Xinyun Chen | Matthew Purver
Findings of the Association for Computational Linguistics: ACL 2023

Most text-to-SQL models, even though based on the same grammar decoder, generate the SQL structure first and then fill in the SQL slots with the correct schema items. This second step depends on schema linking: aligning the entity references in the question with the schema columns or tables. This is generally approached via Exact Match based Schema Linking (EMSL) within a neural network-based schema linking module. EMSL has become standard in text-to-SQL: many state-of-the-art models employ EMSL, with performance dropping significantly when the EMSL component is removed. In this work, however, we show that EMSL reduces robustness, rendering models vulnerable to synonym substitution and typos. Instead of relying on EMSL to make up for deficiencies in question-schema encoding, we show that using a pre-trained language model as an encoder can improve performance without using EMSL, giving a more robust model. We also study the design choice of the schema linking module, finding that a suitable design benefits performance and interoperability. Finally, based on the above study of schema linking, we introduce grammar linking to help the model align grammar references in the question with the SQL keywords.
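
To make the robustness concern concrete, here is a toy sketch of exact-match linking between question tokens and single-word schema items. The function, schema, and questions are hypothetical; real EMSL components sit inside a neural model and are considerably more involved.

```python
def exact_match_links(question, schema_items):
    """Return the schema items whose name appears verbatim as a question token."""
    tokens = question.lower().replace("?", "").split()
    return {item for item in schema_items if item.lower() in tokens}


# Hypothetical schema and questions, for illustration only.
schema = ["name", "age", "country"]
original = exact_match_links("What is the age of each singer?", schema)
paraphrase = exact_match_links("How old is each singer?", schema)
print(original, paraphrase)
```

The paraphrase asks for the same column but shares no token with it, so exact matching yields no link; this is the kind of gap that synonym substitution exposes.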

LEDA: a Large-Organization Email-Based Decision-Dialogue-Act Analysis Dataset
Mladen Karan | Prashant Khare | Ravi Shekhar | Stephen McQuistin | Ignacio Castro | Gareth Tyson | Colin Perkins | Patrick Healey | Matthew Purver
Findings of the Association for Computational Linguistics: ACL 2023

Collaboration increasingly happens online. This is especially true for large groups working on global tasks, with collaborators all around the globe. The size and distributed nature of such groups makes decision-making challenging. This paper proposes a set of dialog acts for the study of decision-making mechanisms in such groups, and provides a new annotated dataset based on real-world data from the public mail-archives of one such organisation – the Internet Engineering Task Force (IETF). We provide an initial data analysis showing that this dataset can be used to better understand decision-making in such organisations. Finally, we experiment with a preliminary transformer-based dialog act tagging model.

Reformulating NLP tasks to Capture Longitudinal Manifestation of Language Disorders in People with Dementia.
Dimitris Gkoumas | Matthew Purver | Maria Liakata
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Dementia is associated with language disorders which impede communication. Here, we automatically learn linguistic disorder patterns by making use of a moderately-sized pre-trained language model and forcing it to focus on reformulated natural language processing (NLP) tasks and associated linguistic patterns. Our experiments show that NLP tasks that encapsulate contextual information and enhance the gradient signal with linguistic patterns benefit performance. We then use the probability estimates from the best model to construct digital linguistic markers measuring the overall quality in communication and the intensity of a variety of language disorders. We investigate how the digital markers characterize dementia speech from a longitudinal perspective. We find that our proposed communication marker is able to robustly and reliably characterize the language of people with dementia, outperforming existing linguistic approaches; and shows external validity via significant correlation with clinical markers of behaviour. Finally, our proposed linguistic disorder markers provide useful insights into gradual language impairment associated with disease progression.

Lessons Learnt from Linear Text Segmentation: a Fair Comparison of Architectural and Sentence Encoding Strategies for Successful Segmentation
Iacopo Ghinassi | Lin Wang | Chris Newell | Matthew Purver
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Recent works on linear text segmentation have shown new state-of-the-art results nearly every year. Most of the time, however, these recent advances include a variety of different elements, which makes it difficult to evaluate which individual components of the proposed methods bring about improvements for the task and, more generally, what actually works for linear text segmentation. Moreover, evaluating text segmentation is notoriously difficult, and the use of a metric such as Pk, which is widely used in existing literature, presents specific problems that complicate a fair comparison between segmentation models. In this work, then, we draw from a number of existing works to assess what the state of the art in linear text segmentation is, investigating what architectures and features work best for the task. To do so, we present three models representative of a variety of approaches, we compare them to existing methods and we inspect the elements composing them, so as to give a more complete picture of which technique is more successful and why that might be the case. At the same time, we highlight a specific feature of Pk which can bias the results and we report our results using different settings, so as to give future literature a more comprehensive set of baseline results for future developments. We then hope that this work can serve as a solid foundation to foster research in the area, overcoming task-specific difficulties such as evaluation setting and providing new state-of-the-art results.
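
For readers unfamiliar with Pk, the sketch below implements its standard definition: slide a window of width k over the document and count how often reference and hypothesis disagree on whether the two window ends fall in the same segment. The abstract does not say which feature of Pk it refers to, so this shows only the usual formulation; defaulting k to half the mean reference segment length is the common convention, and the helper names are hypothetical.

```python
def segment_ids(lengths):
    """Expand segment lengths, e.g. [4, 4], into one segment id per sentence."""
    ids = []
    for seg, length in enumerate(lengths):
        ids.extend([seg] * length)
    return ids


def pk(reference, hypothesis, k=None):
    """Pk: fraction of windows where reference and hypothesis disagree (lower is better)."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of units")
    n = len(reference)
    if k is None:
        # Common convention: half the mean reference segment length.
        k = max(1, round(n / (2 * len(set(reference)))))
    windows = n - k
    if windows <= 0:
        raise ValueError("document is too short for the chosen k")
    errors = sum(
        (reference[i] == reference[i + k]) != (hypothesis[i] == hypothesis[i + k])
        for i in range(windows)
    )
    return errors / windows


ref = segment_ids([4, 4])        # one boundary in the middle
hyp_none = segment_ids([8])      # hypothesis with no boundary at all
print(pk(ref, ref), pk(ref, hyp_none))
```

Because the score depends on the window width, the same hypothesis can receive different values when k is chosen differently, which is one reason reported Pk figures are hard to compare across papers.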

Analysis of Transfer Learning for Named Entity Recognition in South-Slavic Languages
Nikola Ivačič | Thi Hong Hanh Tran | Boshko Koloski | Senja Pollak | Matthew Purver
Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023)

This paper analyzes a Named Entity Recognition task for South-Slavic languages using pre-trained multilingual neural network models. We investigate whether the performance of the models for a target language can be improved by using data from closely related languages. We show that model performance is not influenced substantially when training on languages other than the target language. While for Slovene the monolingual setting generally performs better, for Croatian and Serbian the results are slightly better in selected cross-lingual settings, but the improvements are not large. The most significant performance improvement is shown for the Serbian language, which has the smallest corpora. Therefore, fine-tuning with other closely related languages may benefit only the “low resource” languages.

2022

Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology
Ayah Zirikly | Dana Atzil-Slonim | Maria Liakata | Steven Bedrick | Bart Desmet | Molly Ireland | Andrew Lee | Sean MacAvaney | Matthew Purver | Rebecca Resnik | Andrew Yates
Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology

Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment
Yujian Gan | Xinyun Chen | Qiuping Huang | Matthew Purver
Findings of the Association for Computational Linguistics: NAACL 2022

In text-to-SQL tasks — as in much of NLP — compositional generalization is a major challenge: neural networks struggle with compositional generalization where training and test distributions differ. However, most recent attempts to improve this are based on word-level synthetic data or specific dataset splits to generate compositional biases. In this work, we propose a clause-level compositional example generation method. We first split the sentences in the Spider text-to-SQL dataset into sub-sentences, annotating each sub-sentence with its corresponding SQL clause, resulting in a new dataset Spider-SS. We then construct a further dataset, Spider-CG, by composing Spider-SS sub-sentences in different combinations, to test the ability of models to generalize compositionally. Experiments show that existing models suffer significant performance degradation when evaluated on Spider-CG, even though every sub-sentence is seen during training. To deal with this problem, we modify a number of state-of-the-art models to train on the segmented data of Spider-SS, and we show that this method improves the generalization performance.

CoRAL: a Context-aware Croatian Abusive Language Dataset
Ravi Shekhar | Mladen Karan | Matthew Purver
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022

In light of unprecedented increases in the popularity of the internet and social media, comment moderation has never been a more relevant task. Semi-automated comment moderation systems greatly aid human moderators by either automatically classifying the examples or allowing the moderators to prioritize which comments to consider first. However, the concept of inappropriate content is often subjective, and such content can be conveyed in many subtle and indirect ways. In this work, we propose CoRAL – a language and culturally aware Croatian Abusive dataset covering phenomena of implicitness and reliance on local and global context. We show experimentally that current models degrade when comments are not explicit and further degrade when language skill and context knowledge are required to interpret the comment.

Knowledge informed sustainability detection from short financial texts
Boshko Koloski | Syrielle Montariol | Matthew Purver | Senja Pollak
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

There is a global trend for responsible investing, and the need for developing automated methods for analyzing Environmental, Social and Governance (ESG) related elements in financial texts is rising. In this work we propose a solution to the FinSim4-ESG task, consisting of binary classification of sentences into sustainable or unsustainable. We propose a novel knowledge-based latent heterogeneous representation that is based on knowledge from taxonomies and knowledge graphs and multiple contemporary document representations. We hypothesize that an approach based on a combination of knowledge and document representations can introduce significant improvement over conventional document representation approaches. We consider ensembles at the classifier level as well as at the representation level, with late fusion and early fusion. The proposed approaches achieve competitive accuracy of 89 and are 5.85 behind the best achieved score.

Misspelling Semantics in Thai
Pakawat Nakwijit | Matthew Purver
Proceedings of the Thirteenth Language Resources and Evaluation Conference

User-generated content is full of misspellings. Rather than being just random noise, we hypothesise that many misspellings contain hidden semantics that can be leveraged for language understanding tasks. This paper presents a fine-grained annotated corpus of misspelling in Thai, together with an analysis of misspelling intention and its possible semantics to get a better understanding of the misspelling patterns observed in the corpus. In addition, we introduce two approaches to incorporate the semantics of misspelling: Misspelling Average Embedding (MAE) and Misspelling Semantic Tokens (MST). Experiments on a sentiment analysis task confirm our overall hypothesis: additional semantics from misspelling can boost the micro F1 score by 0.4-2%, while blindly normalising misspelling is harmful and suboptimal.

Tracking Changes in ESG Representation: Initial Investigations in UK Annual Reports
Matthew Purver | Matej Martinc | Riste Ichev | Igor Lončarski | Katarina Sitar Šuštar | Aljoša Valentinčič | Senja Pollak
Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference

We describe initial work into analysing the language used around environmental, social and governance (ESG) issues in UK company annual reports. We collect a dataset of annual reports from UK FTSE350 companies over the years 2012-2019; separately, we define a categorized list of core ESG terms (single words and multi-word expressions) by combining existing lists with manual annotation. We then show that this list can be used to analyse the changes in ESG language in the dataset over time, via a combination of language modelling and distributional modelling via contextual word embeddings. Initial findings show that while ESG discussion in annual reports is becoming significantly more likely over time, the increase varies with category and with individual terms, and that some terms show noticeable changes in usage.

JSI at SemEval-2022 Task 1: CODWOE - Reverse Dictionary: Monolingual and cross-lingual approaches
Thi Hong Hanh Tran | Matej Martinc | Matthew Purver | Senja Pollak
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

The reverse dictionary task is a sequence-to-vector task in which a gloss is provided as input, and the output must be a semantically matching word vector. The reverse dictionary is useful in practical applications such as solving the tip-of-the-tongue problem, helping new language learners, etc. In this paper, we evaluate the effect of a Transformer-based model with cross-lingual zero-shot learning to improve the reverse dictionary performance. Our experiments are conducted in five languages in the CODWOE dataset, including English, French, Italian, Spanish, and Russian. Although we did not achieve a good ranking in the CODWOE competition, we show that our work partially improves the current baseline from the organizers with a hypothesis on the impact of LSTM in monolingual, multilingual, and zero-shot learning. All the code is available at https://github.com/honghanhh/codwoe2021.

2021

Towards Robustness of Text-to-SQL Models against Synonym Substitution
Yujian Gan | Xinyun Chen | Qiuping Huang | Matthew Purver | John R. Woodward | Jinxia Xie | Pengsheng Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Recently, there has been significant progress in studying neural networks to translate text descriptions into SQL queries. Despite achieving good performance on some public benchmarks, existing text-to-SQL models typically rely on the lexical matching between words in natural language (NL) questions and tokens in table schemas, which may render the models vulnerable to attacks that break the schema linking mechanism. In this work, we investigate the robustness of text-to-SQL models to synonym substitution. In particular, we introduce Spider-Syn, a human-curated dataset based on the Spider benchmark for text-to-SQL translation. NL questions in Spider-Syn are modified from Spider, by replacing their schema-related words with manually selected synonyms that reflect real-world question paraphrases. We observe that the accuracy dramatically drops by eliminating such explicit correspondence between NL questions and table schemas, even if the synonyms are not adversarially selected to conduct worst-case attacks. Finally, we present two categories of approaches to improve the model robustness. The first category of approaches utilizes additional synonym annotations for table schemas by modifying the model input, while the second category is based on adversarial training. We demonstrate that both categories of approaches significantly outperform their counterparts without the defense, and the first category of approaches is more effective.

Not All Comments Are Equal: Insights into Comment Moderation from a Topic-Aware Model
Elaine Zosa | Ravi Shekhar | Mladen Karan | Matthew Purver
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Moderation of reader comments is a significant problem for online news platforms. Here, we experiment with models for automatic moderation, using a dataset of comments from a popular Croatian newspaper. Our analysis shows that while comments that violate the moderation rules mostly share common linguistic and thematic features, their content varies across the different sections of the newspaper. We therefore make our models topic-aware, incorporating semantic features from a topic model into the classification decision. Our results show that topic information improves the performance of the model, increases its confidence in correct outputs, and helps us understand the model’s outputs.

Communicative Grounding of Analogical Explanations in Dialogue: A Corpus Study of Conversational Management Acts and Statistical Sequence Models for Tutoring through Analogy
Jorge Del-Bosque-Trevino | Julian Hough | Matthew Purver
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)

We present a conversational management act (CMA) annotation schema for one-to-one tutorial dialogue sessions where a tutor uses an analogy to teach a student a concept. CMAs are more fine-grained sub-utterance acts compared to traditional dialogue act mark-up. The schema achieves an inter-annotator agreement (IAA) Cohen Kappa score of at least 0.66 across all 10 classes. We annotate a corpus of analogical episodes with the schema and develop statistical sequence models from the corpus which predict tutor content related decisions, in terms of the selection of the analogical component (AC) and tutor conversational management act (TCMA) to deploy at the current utterance, given the student’s behaviour. CRF sequence classifiers perform well on AC selection and robustly on TCMA selection, achieving respective accuracies of 61.9% and 56.3% on a cross-validation experiment over the corpus.
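
The agreement figure quoted above is a Cohen's kappa score, which corrects raw agreement for the agreement expected by chance. A minimal sketch of the standard two-annotator computation follows; the labels are made up for illustration and are not the schema's classes.

```python
from collections import Counter


def cohen_kappa(annotator_a, annotator_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two annotators on the same items."""
    if len(annotator_a) != len(annotator_b) or not annotator_a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(annotator_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n
    # Chance agreement: from each annotator's marginal label distribution.
    counts_a, counts_b = Counter(annotator_a), Counter(annotator_b)
    labels = set(annotator_a) | set(annotator_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Made-up labels, for illustration only.
a = ["x", "x", "y", "y"]
b = ["x", "x", "y", "x"]
kappa = cohen_kappa(a, b)
print(kappa)
```

Here the annotators agree on three of four items (0.75), chance agreement is 0.5, and kappa is therefore 0.5.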

Rare-Class Dialogue Act Tagging for Alzheimer’s Disease Diagnosis
Shamila Nasreen | Julian Hough | Matthew Purver
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Alzheimer’s Disease (AD) is associated with many characteristic changes, not only in an individual’s language but also in the interactive patterns observed in dialogue. The most indicative changes of this latter kind tend to be associated with relatively rare dialogue acts (DAs), such as those involved in clarification exchanges and responses to particular kinds of questions. However, most existing work in DA tagging focuses on improving average performance, effectively prioritizing more frequent classes; it thus gives a poor performance on these rarer classes and is not suited for application to AD analysis. In this paper, we investigate tagging specifically for rare class DAs, using a hierarchical BiLSTM model with various ways of incorporating information from previous utterances and DA tags in context. We show that this can give good performance for rare DA classes on both the general Switchboard corpus (SwDA) and an AD-specific conversational dataset, the Carolinas Conversation Collection (CCC); and that the tagger outputs then contribute useful information for distinguishing patients with and without AD.

Mitigating Topic Bias when Detecting Decisions in Dialogue
Mladen Karan | Prashant Khare | Patrick Healey | Matthew Purver
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

This work revisits the task of detecting decision-related utterances in multi-party dialogue. We explore the performance of a traditional approach and a deep learning-based approach based on transformer language models, with the latter providing modest improvements. We then analyze topic bias in the models using topic information obtained by manual annotation. Our finding is that when detecting some types of decisions in our data, models rely more on topic-specific words that decisions are about rather than on words that more generally indicate decision making. We further explore this by removing topic information from the training data. We show that this resolves the bias issues to an extent and, surprisingly, sometimes even boosts performance.

pdf bib
Natural SQL: Making SQL Easier to Infer from Natural Language Specifications
Yujian Gan | Xinyun Chen | Jinxia Xie | Matthew Purver | John R. Woodward | John Drake | Qiaofu Zhang
Findings of the Association for Computational Linguistics: EMNLP 2021

Addressing the mismatch between natural language descriptions and the corresponding SQL queries is a key challenge for text-to-SQL translation. To bridge this gap, we propose an SQL intermediate representation (IR) called Natural SQL (NatSQL). Specifically, NatSQL preserves the core functionalities of SQL, while simplifying the queries as follows: (1) dispensing with operators and keywords such as GROUP BY, HAVING, FROM, JOIN ON, whose counterparts are usually hard to find in the text descriptions; (2) removing the need for nested subqueries and set operators; and (3) making schema linking easier by reducing the required number of schema items. On Spider, a challenging text-to-SQL benchmark that contains complex and nested SQL queries, we demonstrate that NatSQL outperforms other IRs, and significantly improves the performance of several previous SOTA models. Furthermore, for existing models that do not support executable SQL generation, NatSQL easily enables them to generate executable SQL queries, and achieves the new state-of-the-art execution accuracy.

pdf bib
Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization
Yujian Gan | Xinyun Chen | Matthew Purver
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Recently, there has been significant progress in studying neural networks for translating text descriptions into SQL queries under the zero-shot cross-domain setting. Despite achieving good performance on some public benchmarks, we observe that existing text-to-SQL models do not generalize when facing domain knowledge that does not frequently appear in the training data, which may result in worse prediction performance on unseen domains. In this work, we investigate the robustness of text-to-SQL models when the questions require rarely observed domain knowledge. In particular, we define five types of domain knowledge and introduce Spider-DK (DK is the abbreviation of domain knowledge), a human-curated dataset based on the Spider benchmark for text-to-SQL translation. NL questions in Spider-DK are selected from Spider, and we modify some samples by adding domain knowledge that reflects real-world question paraphrases. We demonstrate that the prediction accuracy drops dramatically on samples that require such domain knowledge, even if the domain knowledge appears in the training set and the model provides the correct predictions for related training samples.

pdf bib
Zero-shot Cross-lingual Content Filtering: Offensive Language and Hate Speech Detection
Andraž Pelicon | Ravi Shekhar | Matej Martinc | Blaž Škrlj | Matthew Purver | Senja Pollak
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

We present a system for zero-shot cross-lingual offensive language and hate speech classification. The system was trained on English datasets and tested on a task of detecting hate speech and offensive social media content in a number of languages without any additional training. Experiments show an impressive ability of both models to generalize from English to other languages. There is, however, an expected gap in performance between the tested cross-lingual models and the monolingual models. The best performing model (offensive content classifier) is available online as a REST API.

pdf bib
EMBEDDIA Tools, Datasets and Challenges: Resources and Hackathon Contributions
Senja Pollak | Marko Robnik-Šikonja | Matthew Purver | Michele Boggia | Ravi Shekhar | Marko Pranjić | Salla Salmela | Ivar Krustok | Tarmo Paju | Carl-Gustav Linden | Leo Leppänen | Elaine Zosa | Matej Ulčar | Linda Freienthal | Silver Traat | Luis Adrián Cabrera-Diego | Matej Martinc | Nada Lavrač | Blaž Škrlj | Martin Žnidaršič | Andraž Pelicon | Boshko Koloski | Vid Podpečan | Janez Kranjc | Shane Sheehan | Emanuela Boros | Jose G. Moreno | Antoine Doucet | Hannu Toivonen
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

This paper presents tools and data sources collected and released by the EMBEDDIA project, supported by the European Union’s Horizon 2020 research and innovation program. The collected resources were offered to participants of a hackathon organized as part of the EACL Hackashop on News Media Content Analysis and Automated Report Generation in February 2021. The hackathon had six participating teams who addressed different challenges, either from the list of proposed challenges or their own news-industry-related tasks. This paper goes beyond the scope of the hackathon, as it brings together in a coherent and compact form most of the resources developed, collected and released by the EMBEDDIA project. Moreover, it constitutes a handy source for news media industry and researchers in the fields of Natural Language Processing and Social Science.

2020

pdf bib
A Review of Cross-Domain Text-to-SQL Models
Yujian Gan | Matthew Purver | John R. Woodward
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop

WikiSQL and Spider, the large-scale cross-domain text-to-SQL datasets, have attracted much attention from the research community. The leaderboards of WikiSQL and Spider show that many researchers have proposed models to solve the text-to-SQL problem. This paper first divides the top models in these two leaderboards into two paradigms. We then present details not mentioned in their original papers by evaluating the key components, including schema linking, pretrained word embeddings, and reasoning assistance modules. Based on the analysis of these models, we aim to promote understanding of the text-to-SQL field and identify some interesting directions for future work: for example, it is worth studying the text-to-SQL problem in an environment where schema linking is more challenging to build, and also worth studying how to combine the advantages of each model.

pdf bib
SemEval-2020 Task 3: Graded Word Similarity in Context
Carlos Santos Armendariz | Matthew Purver | Senja Pollak | Nikola Ljubešić | Matej Ulčar | Ivan Vulić | Mohammad Taher Pilehvar
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents the Graded Word Similarity in Context (GWSC) task which asked participants to predict the effects of context on human perception of similarity in English, Croatian, Slovene and Finnish. We received 15 submissions and 11 system description papers. A new dataset (CoSimLex) was created for evaluation in this task: it contains pairs of words, each annotated within two different contexts. Systems beat the baselines by significant margins, but few did well in more than one language or subtask. Almost every system employed a Transformer model, but with many variations in the details: WordNet sense embeddings, translation of contexts, TF-IDF weightings, and the automatic creation of datasets for fine-tuning were all used to good effect.

pdf bib
CoSimLex: A Resource for Evaluating Graded Word Similarity in Context
Carlos Santos Armendariz | Matthew Purver | Matej Ulčar | Senja Pollak | Nikola Ljubešić | Mark Granroth-Wilding
Proceedings of the Twelfth Language Resources and Evaluation Conference

State of the art natural language processing tools are built on context-dependent word embeddings, but no direct method for evaluating these representations currently exists. Standard tasks and datasets for intrinsic evaluation of embeddings are based on judgements of similarity, but ignore context; standard tasks for word sense disambiguation take account of context but do not provide continuous measures of meaning similarity. This paper describes an effort to build a new dataset, CoSimLex, intended to fill this gap. Building on the standard pairwise similarity task of SimLex-999, it provides context-dependent similarity measures; covers not only discrete differences in word sense but more subtle, graded changes in meaning; and covers not only a well-resourced language (English) but a number of less-resourced languages. We define the task and evaluation metrics, outline the dataset collection methodology, and describe the status of the dataset so far.

pdf bib
Temporal Mental Health Dynamics on Social Media
Tom Tabak | Matthew Purver
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

We describe a set of experiments for building a temporal mental health dynamics system. We utilise a pre-existing methodology for distant supervision of mental health data mining from social media platforms and deploy the system during the global COVID-19 pandemic as a case study. Despite the challenging nature of the task, we produce encouraging results, both explicit to the global pandemic and implicit to a global phenomenon, Christmas Depression, supported by the literature. We propose a methodology for providing insight into temporal mental health dynamics to be utilised for strategic decision-making.

pdf bib
How Furiously Can Colorless Green Ideas Sleep? Sentence Acceptability in Context
Jey Han Lau | Carlos Armendariz | Shalom Lappin | Matthew Purver | Chang Shu
Transactions of the Association for Computational Linguistics, Volume 8

We study the influence of context on sentence acceptability. First we compare the acceptability ratings of sentences judged in isolation, with a relevant context, and with an irrelevant context. Our results show that context induces a cognitive load for humans, which compresses the distribution of ratings. Moreover, in relevant contexts we observe a discourse coherence effect that uniformly raises acceptability. Next, we test unidirectional and bidirectional language models in their ability to predict acceptability ratings. The bidirectional models show very promising results, with the best model achieving a new state-of-the-art for unsupervised acceptability prediction. The two sets of experiments provide insights into the cognitive aspects of sentence processing and central issues in the computational modeling of text and discourse.

2019

pdf bib
Proceedings of the IWCS Workshop Vector Semantics for Discourse and Dialogue
Mehrnoosh Sadrzadeh | Matthew Purver | Arash Eshghi | Julian Hough | Ruth Kempson | Patrick G. T. Healey
Proceedings of the IWCS Workshop Vector Semantics for Discourse and Dialogue

2017

pdf bib
Incongruent Headlines: Yet Another Way to Mislead Your Readers
Sophie Chesney | Maria Liakata | Massimo Poesio | Matthew Purver
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

This paper discusses the problem of incongruent headlines: those which do not accurately represent the information contained in the article with which they occur. We emphasise that this phenomenon should be considered separately from recognised problematic headline types such as clickbait and sensationalism, arguing that existing natural language processing (NLP) methods applied to these related concepts are not appropriate for the automatic detection of headline incongruence, as an analysis beyond stylistic traits is necessary. We therefore suggest a number of alternative methodologies that may be appropriate to the task at hand as a foundation for future work in this area. In addition, we provide an analysis of existing data sets which are related to this work, and motivate the need for a novel data set in this domain.

pdf bib
A Geometric Method for Detecting Semantic Coercion
Stephen McGregor | Elisabetta Jezek | Matthew Purver | Geraint Wiggins
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Long papers

2016

pdf bib
Proceedings of the INLG 2016 Workshop on Computational Creativity in Natural Language Generation
Matthew Purver | Pablo Gervás | Sascha Griffiths
Proceedings of the INLG 2016 Workshop on Computational Creativity in Natural Language Generation

pdf bib
Process Based Evaluation of Computer Generated Poetry
Stephen McGregor | Matthew Purver | Geraint Wiggins
Proceedings of the INLG 2016 Workshop on Computational Creativity in Natural Language Generation

pdf bib
Robust Co-occurrence Quantification for Lexical Distributional Semantics
Dmitrijs Milajevs | Mehrnoosh Sadrzadeh | Matthew Purver
Proceedings of the ACL 2016 Student Research Workshop

2015

pdf bib
Proceedings of the 11th International Conference on Computational Semantics
Matthew Purver | Mehrnoosh Sadrzadeh | Matthew Stone
Proceedings of the 11th International Conference on Computational Semantics

pdf bib
Feedback in Conversation as Incremental Semantic Update
Arash Eshghi | Christine Howes | Eleni Gregoromichelaki | Julian Hough | Matthew Purver
Proceedings of the 11th International Conference on Computational Semantics

pdf bib
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)
Anya Belz | Albert Gatt | François Portet | Matthew Purver
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)

2014

pdf bib
Probabilistic Type Theory for Incremental Dialogue Processing
Julian Hough | Matthew Purver
Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS)

pdf bib
Investigating the Contribution of Distributional Semantic Information for Dialogue Act Classification
Dmitrijs Milajevs | Matthew Purver
Proceedings of the 2nd Workshop on Continuous Vector Space Models and their Compositionality (CVSC)

pdf bib
Linguistic Indicators of Severity and Progress in Online Text-based Therapy for Depression
Christine Howes | Matthew Purver | Rose McCabe
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

pdf bib
A Simple Baseline for Discriminating Similar Languages
Matthew Purver
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects

pdf bib
Strongly Incremental Repair Detection
Julian Hough | Matthew Purver
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Evaluating Neural Word Representations in Tensor-Based Compositional Settings
Dmitrijs Milajevs | Dimitri Kartsaklis | Mehrnoosh Sadrzadeh | Matthew Purver
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Probabilistic induction for an incremental semantic grammar
Arash Eshghi | Matthew Purver | Julian Hough
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

pdf bib
Investigating Topic Modelling for Therapy Dialogue Analysis
Christine Howes | Matthew Purver | Rose McCabe
Proceedings of the IWCS 2013 Workshop on Computational Semantics in Clinical Text (CSCT 2013)

pdf bib
Incremental Grammar Induction from Child-Directed Dialogue Utterances
Arash Eshghi | Julian Hough | Matthew Purver
Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL)

2012

pdf bib
Predicting Adherence to Treatment for Schizophrenia from Dialogue Transcripts
Christine Howes | Matthew Purver | Rose McCabe | Patrick G. T. Healey | Mary Lavelle
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Experimenting with Distant Supervision for Emotion Classification
Matthew Purver | Stuart Battersby
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

pdf bib
Incremental Semantic Construction in a Dialogue System
Matthew Purver | Arash Eshghi | Julian Hough
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

2009

pdf bib
Proceedings of the SIGDIAL 2009 Conference
Patrick Healey | Roberto Pieraccini | Donna Byron | Steve Young | Matthew Purver
Proceedings of the SIGDIAL 2009 Conference

pdf bib
Split Utterances in Dialogue: a Corpus Study
Matthew Purver | Christine Howes | Eleni Gregoromichelaki | Patrick Healey
Proceedings of the SIGDIAL 2009 Conference

pdf bib
Cascaded Lexicalised Classifiers for Second-Person Reference Resolution
Matthew Purver | Raquel Fernández | Matthew Frampton | Stanley Peters
Proceedings of the SIGDIAL 2009 Conference

2008

pdf bib
Modelling and Detecting Decisions in Multi-party Dialogue
Raquel Fernández | Matthew Frampton | Patrick Ehlen | Matthew Purver | Stanley Peters
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

2007

pdf bib
A Conversational In-Car Dialog System
Baoshi Yan | Fuliang Weng | Zhe Feng | Florin Ratiu | Madhuri Raya | Yao Meng | Sebastian Varges | Matthew Purver | Annie Lien | Tobias Scheideck | Badri Raghunathan | Feng Lin | Rohit Mishra | Brian Lathrop | Zhaoxia Zhang | Harry Bratt | Stanley Peters
Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)

pdf bib
Disambiguating Between Generic and Referential “You” in Dialog
Surabhi Gupta | Matthew Purver | Dan Jurafsky
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf bib
Detecting and Summarizing Action Items in Multi-Party Dialogue
Matthew Purver | John Dowding | John Niekrasz | Patrick Ehlen | Sharareh Noorbaloochi | Stanley Peters
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

pdf bib
CHAT to Your Destination
Fuliang Weng | Baoshi Yan | Zhe Feng | Florin Ratiu | Madhuri Raya | Brian Lathrop | Annie Lien | Sebastian Varges | Rohit Mishra | Feng Lin | Matthew Purver | Harry Bratt | Yao Meng | Stanley Peters | Tobias Scheideck | Badri Raghunathan | Zhaoxia Zhang
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

pdf bib
Resolving “You” in Multi-Party Dialog
Surabhi Gupta | John Niekrasz | Matthew Purver | Dan Jurafsky
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

2006

pdf bib
Unsupervised Topic Modelling for Multi-Party Spoken Discourse
Matthew Purver | Konrad P. Körding | Thomas L. Griffiths | Joshua B. Tenenbaum
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Shallow Discourse Structure for Action Item Detection
Matthew Purver | Patrick Ehlen | John Niekrasz
Proceedings of the Analyzing Conversations in Text and Speech

2005

pdf bib
Meeting Structure Annotation: Data and Tools
Alexander Gruenstein | John Niekrasz | Matthew Purver
Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue

pdf bib
Combining Confidence Scores with Contextual Features for Robust Multi-Device Dialogue
Lawrence Cavedon | Matthew Purver | Florin Ratiu
Proceedings of the Australasian Language Technology Workshop 2005

2004

pdf bib
Incremental Parsing, or Incremental Grammar?
Matthew Purver | Ruth Kempson
Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together

2003

pdf bib
Answering Clarification Questions
Matthew Purver | Patrick G.T. Healey | James King | Jonathan Ginzburg | Greg J. Mills
Proceedings of the Fourth SIGdial Workshop of Discourse and Dialogue

pdf bib
Incremental Generation by Incremental Parsing: Tactical Generation in Dynamic Syntax
Matthew Purver | Masayuki Otsuka
Proceedings of the 9th European Workshop on Natural Language Generation (ENLG-2003) at EACL 2003

2002

pdf bib
Processing Unknown Words in a Dialogue System
Matthew Purver
Proceedings of the Third SIGdial Workshop on Discourse and Dialogue

2001

pdf bib
On the Means for Clarification in Dialogue
Matthew Purver | Jonathan Ginzburg | Patrick Healey
Proceedings of the Second SIGdial Workshop on Discourse and Dialogue
