Andreas Vlachos


2021

Proceedings of the 5th Workshop on Structured Prediction for NLP (SPNLP 2021)
Zornitsa Kozareva | Sujith Ravi | Andreas Vlachos | Priyanka Agrawal | André Martins
Proceedings of the 5th Workshop on Structured Prediction for NLP (SPNLP 2021)

Elastic weight consolidation for better bias inoculation
James Thorne | Andreas Vlachos
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The biases present in training datasets have been shown to affect models for sentence pair classification tasks such as natural language inference (NLI) and fact verification. While fine-tuning models on additional data has been used to mitigate them, a common issue is that of catastrophic forgetting of the original training dataset. In this paper, we show that elastic weight consolidation (EWC) allows fine-tuning of models to mitigate biases while being less susceptible to catastrophic forgetting. In our evaluation on fact verification and NLI stress tests, we show that fine-tuning with EWC dominates standard fine-tuning, yielding models with lower levels of forgetting on the original (biased) dataset for equivalent gains in accuracy on the fine-tuning (unbiased) dataset.

I Beg to Differ: A study of constructive disagreement in online conversations
Christine De Kock | Andreas Vlachos
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Disagreements are pervasive in human communication. In this paper we investigate what makes disagreement constructive. To this end, we construct WikiDisputes, a corpus of 7425 Wikipedia Talk page conversations that contain content disputes, and define the task of predicting whether disagreements will be escalated to mediation by a moderator. We evaluate feature-based models with linguistic markers from previous work, and demonstrate that their performance is improved by using features that capture changes in linguistic markers throughout the conversations, as opposed to averaged values. We develop a variety of neural models and show that taking into account the structure of the conversation improves predictive accuracy, exceeding that of feature-based models. We assess our best neural model in terms of both predictive accuracy and uncertainty by evaluating its behaviour when it is only exposed to the beginning of the conversation, finding that model accuracy improves and uncertainty reduces as models are exposed to more information.

Incremental Beam Manipulation for Natural Language Generation
James Hargreaves | Andreas Vlachos | Guy Emerson
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The performance of natural language generation systems has improved substantially with modern neural networks. At test time they typically employ beam search to avoid locally optimal but globally suboptimal predictions. However, due to model errors, a larger beam size can lead to deteriorating performance according to the evaluation metric. For this reason, it is common to rerank the output of beam search, but this relies on beam search to produce a good set of hypotheses, which limits the potential gains. Other alternatives to beam search require changes to the training of the model, which restricts their applicability compared to beam search. This paper proposes incremental beam manipulation, i.e. reranking the hypotheses in the beam during decoding instead of only at the end. This way, hypotheses that are unlikely to lead to a good final output are discarded, and in their place hypotheses that would have been ignored will be considered instead. Applying incremental beam manipulation leads to an improvement of 1.93 and 5.82 BLEU points over vanilla beam search for the test sets of the E2E and WebNLG challenges respectively. The proposed method also outperformed a strong reranker by 1.04 BLEU points on the E2E challenge, while being on par with it on the WebNLG dataset.
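
The idea of reranking the beam during decoding, rather than only at the end, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy model, the reranking scorer, and the choice to rerank a pool of twice the beam size at selected steps are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens so far, cumulative log-probability)


def beam_search_with_reranking(
    next_token_logprobs: Callable[[List[str]], Dict[str, float]],
    rerank_score: Callable[[List[str]], float],
    beam_size: int,
    max_len: int,
    rerank_steps: FrozenSet[int],
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Beam search that lets a (hypothetical) reranker pick the beam at chosen steps."""
    beam: List[Hypothesis] = [([], 0.0)]
    for step in range(1, max_len + 1):
        candidates: List[Hypothesis] = []
        for tokens, logp in beam:
            if tokens and tokens[-1] == eos:
                candidates.append((tokens, logp))  # finished hypotheses carry over
                continue
            for tok, tok_logp in next_token_logprobs(tokens).items():
                candidates.append((tokens + [tok], logp + tok_logp))
        # Order all expansions by model score first.
        candidates.sort(key=lambda h: h[1], reverse=True)
        if step in rerank_steps:
            # Assumption: keep a wider pool, then let the reranker choose from it.
            pool = candidates[: 2 * beam_size]
            pool.sort(key=lambda h: rerank_score(h[0]), reverse=True)
            beam = pool[:beam_size]
        else:
            beam = candidates[:beam_size]
        if all(t and t[-1] == eos for t, _ in beam):
            break
    return beam


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Toy next-token distribution that ignores the prefix (illustration only)."""
    return {"a": math.log(0.6), "b": math.log(0.3), "</s>": math.log(0.1)}


def prefers_b(tokens: List[str]) -> float:
    """Hypothetical reranker that rewards hypotheses containing 'b'."""
    return float(tokens.count("b"))


plain = beam_search_with_reranking(toy_model, prefers_b, 1, 2, frozenset())
reranked = beam_search_with_reranking(toy_model, prefers_b, 1, 2, frozenset({1}))
```

With a beam of size 1, the plain search never leaves the locally best token, whereas reranking at step 1 keeps a hypothesis that the model score alone would have discarded.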

Leveraging Type Descriptions for Zero-shot Named Entity Recognition and Classification
Rami Aly | Andreas Vlachos | Ryan McDonald
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A common issue in real-world applications of named entity recognition and classification (NERC) is the absence of annotated data for the target entity classes during training. Zero-shot learning approaches address this issue by learning models from classes with training data that can predict classes without it. This paper presents the first approach for zero-shot NERC, introducing novel architectures that leverage the fact that textual descriptions for many entity classes occur naturally. We address the challenge, specific to zero-shot NERC, that the not-an-entity class is not well defined, as different entity classes are considered in training and testing. For evaluation, we adapt two datasets, OntoNotes and MedMentions, emulating the difficulty of real-world zero-shot learning by testing models on the rarest entity classes. Our proposed approach outperforms baselines adapted from machine reading comprehension and zero-shot text classification. Furthermore, we assess the effect of different class descriptions for this task.

Evidence-based Factual Error Correction
James Thorne | Andreas Vlachos
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper introduces the task of factual error correction: performing edits to a claim so that the generated rewrite is better supported by evidence. This extends the well-studied task of fact verification by providing a mechanism to correct written texts that are refuted or only partially supported by evidence. We demonstrate that it is feasible to train factual error correction systems from existing fact checking datasets which only contain labeled claims accompanied by evidence, but not the correction. We achieve this by employing a two-stage distant supervision approach that incorporates evidence into masked claims when generating corrections. Our approach, based on the T5 transformer and using retrieved evidence, achieved better results than existing work which used a pointer copy network and gold evidence, producing accurate factual error corrections for 5x more instances in human evaluation and a .125 increase in SARI score. The evaluation is conducted on a dataset of 65,000 instances based on a recent fact verification shared task and we release it to enable further work on the task.

Trajectory-Based Meta-Learning for Out-Of-Vocabulary Word Embedding Learning
Gordon Buck | Andreas Vlachos
Proceedings of the Second Workshop on Domain Adaptation for NLP

Word embedding learning methods require a large number of occurrences of a word to accurately learn its embedding. However, out-of-vocabulary (OOV) words which do not appear in the training corpus emerge frequently in the smaller downstream data. Recent work formulated OOV embedding learning as a few-shot regression problem and demonstrated that meta-learning can improve results obtained. However, the algorithm used, model-agnostic meta-learning (MAML), is known to be unstable and to perform worse when a large number of gradient steps are used for parameter updates. In this work, we propose the use of Leap, a meta-learning algorithm which leverages the entire trajectory of the learning process instead of just the beginning and the end points, and thus ameliorates these two issues. In our experiments on a benchmark OOV embedding learning dataset and in an extrinsic evaluation, Leap performs comparably to or better than MAML. We go on to examine which contexts are most beneficial to learn an OOV embedding from, and propose that the choice of contexts may matter more than the meta-learning employed.

Survival text regression for time-to-event prediction in conversations
Christine De Kock | Andreas Vlachos
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

Proceedings of the Fourth Workshop on Structured Prediction for NLP
Priyanka Agrawal | Zornitsa Kozareva | Julia Kreutzer | Gerasimos Lampouras | André Martins | Sujith Ravi | Andreas Vlachos
Proceedings of the Fourth Workshop on Structured Prediction for NLP

Proceedings of the Third Workshop on Fact Extraction and VERification (FEVER)
Christos Christodoulopoulos | James Thorne | Andreas Vlachos | Oana Cocarascu | Arpit Mittal
Proceedings of the Third Workshop on Fact Extraction and VERification (FEVER)

Generating Fact Checking Briefs
Angela Fan | Aleksandra Piktus | Fabio Petroni | Guillaume Wenzek | Marzieh Saeidi | Andreas Vlachos | Antoine Bordes | Sebastian Riedel
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Fact checking at scale is difficult: while the number of active fact checking websites is growing, it remains too small for the needs of the contemporary media ecosystem. However, despite good intentions, contributions from volunteers are often error-prone, and thus in practice restricted to claim detection. We investigate how to increase the accuracy and efficiency of fact checking by providing information about the claim before performing the check, in the form of natural language briefs. We investigate passage-based briefs, containing a relevant passage from Wikipedia, entity-centric ones consisting of Wikipedia pages of mentioned entities, and Question-Answering Briefs, with questions decomposing the claim, and their answers. To produce QABriefs, we develop QABriefer, a model that generates a set of questions conditioned on the claim, searches the web for evidence, and generates answers. To train its components, we introduce QABriefDataset. We show that fact checking with briefs, in particular QABriefs, increases the accuracy of crowdworkers by 10% while slightly decreasing the time taken. For volunteer (unpaid) fact checkers, QABriefs slightly increase accuracy and reduce the time required by around 20%.

2019

Neural Generative Rhetorical Structure Parsing
Amandla Mabona | Laura Rimell | Stephen Clark | Andreas Vlachos
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Rhetorical structure trees have been shown to be useful for several document-level tasks including summarization and document classification. Previous approaches to RST parsing have used discriminative models; however, these are less sample efficient than generative models, and RST parsing datasets are typically small. In this paper, we present the first generative model for RST parsing. Our model is a document-level RNN grammar (RNNG) with a bottom-up traversal order. We show that, for our parser’s traversal order, previous beam search algorithms for RNNGs have a left-branching bias which is ill-suited for RST parsing. We develop a novel beam search algorithm that keeps track of both structure- and word-generating actions without exhibiting this branching bias and results in absolute improvements of 6.8 and 2.9 on unlabelled and labelled F1 over previous algorithms. Overall, our generative model outperforms a discriminative model with the same features by 2.6 F1 points and achieves performance comparable to the state-of-the-art, outperforming all published parsers from a recent replication study that do not use additional training data.

Evaluating adversarial attacks against multiple fact verification systems
James Thorne | Andreas Vlachos | Christos Christodoulopoulos | Arpit Mittal
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Automated fact verification has been progressing owing to advancements in modeling and availability of large datasets. Due to the nature of the task, it is critical to understand the vulnerabilities of these systems against adversarial instances designed to make them predict incorrectly. We introduce two novel scoring metrics, attack potency and system resilience which take into account the correctness of the adversarial instances, an aspect often ignored in adversarial evaluations. We consider six fact verification systems from the recent Fact Extraction and VERification (FEVER) challenge: the four best-scoring ones and two baselines. We evaluate adversarial instances generated by a recently proposed state-of-the-art method, a paraphrasing method, and rule-based attacks devised for fact verification. We find that our rule-based attacks have higher potency, and that while the rankings among the top systems changed, they exhibited higher resilience than the baselines.

Incorporating Label Dependencies in Multilabel Stance Detection
William Ferreira | Andreas Vlachos
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Stance detection in social media is a well-studied task in a variety of domains. Nevertheless, previous work has mostly focused on multiclass versions of the problem, where the labels are mutually exclusive, and typically positive, negative or neutral. In this paper, we address versions of the task in which an utterance can have multiple labels, thus corresponding to multilabel classification. We propose a method that explicitly incorporates label dependencies in the training objective and compare it against a variety of baselines, as well as a reduction of multilabel to multiclass learning. In experiments with three datasets, we find that our proposed method improves upon all baselines on two out of three datasets. We also show that the reduction of multilabel to multiclass classification can be very competitive, especially in cases where the output consists of a small number of labels and one can enumerate over all label combinations.
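
The reduction of multilabel to multiclass learning mentioned above can be sketched as follows: each combination of labels becomes one class. This is a minimal illustration, not the authors' code. The function names and the stance labels are hypothetical, and only the combinations observed in the training data are enumerated, which is a simplifying assumption.

```python
from typing import Dict, FrozenSet, List, Sequence, Set


def fit_label_powerset(label_sets: Sequence[Sequence[str]]) -> Dict[FrozenSet[str], int]:
    """Assign one class id to each distinct label combination seen in training."""
    combo_to_class: Dict[FrozenSet[str], int] = {}
    for labels in label_sets:
        combo = frozenset(labels)  # order of labels within a set does not matter
        if combo not in combo_to_class:
            combo_to_class[combo] = len(combo_to_class)
    return combo_to_class


def to_multiclass(
    label_sets: Sequence[Sequence[str]], combo_to_class: Dict[FrozenSet[str], int]
) -> List[int]:
    """Turn multilabel targets into single class ids for any multiclass learner."""
    return [combo_to_class[frozenset(labels)] for labels in label_sets]


def to_multilabel(
    class_ids: Sequence[int], combo_to_class: Dict[FrozenSet[str], int]
) -> List[Set[str]]:
    """Map predicted class ids back to label sets."""
    class_to_combo = {cls: combo for combo, cls in combo_to_class.items()}
    return [set(class_to_combo[cls]) for cls in class_ids]


# Hypothetical stance annotations; an utterance may carry zero, one, or two labels.
train_labels = [["favor"], ["favor", "against"], [], ["against", "favor"]]
mapping = fit_label_powerset(train_labels)
class_ids = to_multiclass(train_labels, mapping)
recovered = to_multilabel(class_ids, mapping)
```

The number of classes grows with the number of distinct label combinations, which is why this reduction is most practical when the output consists of a small number of labels.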

Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)
James Thorne | Andreas Vlachos | Oana Cocarascu | Christos Christodoulopoulos | Arpit Mittal
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

The FEVER2.0 Shared Task
James Thorne | Andreas Vlachos | Oana Cocarascu | Christos Christodoulopoulos | Arpit Mittal
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

We present the results of the second Fact Extraction and VERification (FEVER2.0) Shared Task. The task challenged participants both to build systems to verify factoid claims using evidence retrieved from Wikipedia and to generate adversarial attacks against other participants’ systems. The shared task had three phases: building, breaking and fixing. There were 8 systems in the builder’s round, three of which were new qualifying submissions for this shared task; 5 adversaries generated instances designed to induce classification errors; and one builder submitted a fixed system which had a higher FEVER score and resilience than their first submission. All but one of the newly submitted systems attained FEVER scores higher than the best performing system from the first shared task and, under adversarial evaluation, all systems exhibited losses in FEVER score. There was a great variety in adversarial attack types as well as the techniques used to generate the attacks. In this paper, we present the results of the shared task and a summary of the systems, highlighting commonalities and innovations among participating systems.

Generating Token-Level Explanations for Natural Language Inference
James Thorne | Andreas Vlachos | Christos Christodoulopoulos | Arpit Mittal
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The task of Natural Language Inference (NLI) is widely modeled as supervised sentence pair classification. While there has been a lot of work recently on generating explanations of the predictions of classifiers on a single piece of text, there have been no attempts to generate explanations of classifiers operating on pairs of sentences. In this paper, we show that it is possible to generate token-level explanations for NLI without the need for training data explicitly annotated for this purpose. We use a simple LSTM architecture and evaluate both LIME and Anchor explanations for this task. We compare these to a Multiple Instance Learning (MIL) method that uses thresholded attention to make token-level predictions. The approach we present in this paper is a novel extension of zero-shot single-sentence tagging to sentence pairs for NLI. We conduct our experiments on the well-studied SNLI dataset that was recently augmented with manual annotation of the tokens that explain the entailment relation. We find that our white-box MIL-based method, while orders of magnitude faster, does not reach the same accuracy as the black-box methods.

Strong Baselines for Complex Word Identification across Multiple Languages
Pierre Finnimore | Elisabeth Fritzsch | Daniel King | Alison Sneyd | Aneeq Ur Rehman | Fernando Alva-Manchego | Andreas Vlachos
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Complex Word Identification (CWI) is the task of identifying which words or phrases in a sentence are difficult to understand by a target audience. The latest CWI Shared Task released data for two settings: monolingual (i.e. train and test in the same language) and cross-lingual (i.e. test in a language not seen during training). The best monolingual models relied on language-dependent features, which do not generalise in the cross-lingual setting, while the best cross-lingual model used neural networks with multi-task learning. In this paper, we present monolingual and cross-lingual CWI models that perform as well as (or better than) most models submitted to the latest CWI Shared Task. We show that carefully selected features and simple learning models can achieve state-of-the-art performance, and result in strong baselines for future development in this area. Finally, we discuss how inconsistencies in the annotation of the data can explain some of the results obtained.

Proceedings of the Third Workshop on Structured Prediction for NLP
Andre Martins | Andreas Vlachos | Zornitsa Kozareva | Sujith Ravi | Gerasimos Lampouras | Vlad Niculae | Julia Kreutzer
Proceedings of the Third Workshop on Structured Prediction for NLP

Meta-Learning Improves Lifelong Relation Extraction
Abiola Obamuyide | Andreas Vlachos
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

Most existing relation extraction models assume a fixed set of relations and are unable to adapt to exploit newly available supervision data to extract new relations. In order to alleviate such problems, there is the need to develop approaches that make relation extraction models capable of continuous adaptation and learning. We investigate and present results for such an approach, based on a combination of ideas from lifelong learning and optimization-based meta-learning. We evaluate the proposed approach on two recent lifelong relation extraction benchmarks, and demonstrate that it markedly outperforms current state-of-the-art approaches.

HighRES: Highlight-based Reference-less Evaluation of Summarization
Hardy Hardy | Shashi Narayan | Andreas Vlachos
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

There has been substantial progress in summarization research enabled by the availability of novel, often large-scale, datasets and recent advances in neural network-based approaches. However, manual evaluation of the system-generated summaries is inconsistent due to the difficulty the task poses to human non-expert readers. To address this issue, we propose a novel approach for manual evaluation, Highlight-based Reference-less Evaluation of Summarization (HighRES), in which summaries are assessed by multiple annotators against the source document via manually highlighted salient content in the latter. Thus summary assessment on the source document by human judges is facilitated, while the highlights can be used for evaluating multiple systems. To validate our approach we employ crowd-workers to augment with highlights a recently proposed dataset and compare two state-of-the-art systems. We demonstrate that HighRES improves inter-annotator agreement in comparison to using the source document directly, and that the highlights help emphasize differences among systems that would be ignored under other evaluation approaches.

Merge and Label: A Novel Neural Network Architecture for Nested NER
Joseph Fisher | Andreas Vlachos
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Named entity recognition (NER) is one of the best studied tasks in natural language processing. However, most approaches are not capable of handling nested structures which are common in many applications. In this paper we introduce a novel neural network architecture that first merges tokens and/or entities into entities forming nested structures, and then labels each of them independently. Unlike previous work, our merge and label approach predicts real-valued instead of discrete segmentation structures, which allow it to combine word and nested entity embeddings while maintaining differentiability. We evaluate our approach using the ACE 2005 Corpus, where it achieves state-of-the-art F1 of 74.6, further improved with contextual embeddings (BERT) to 82.4, an overall improvement of close to 8 F1 points over previous approaches trained on the same data. Additionally we compare it against BiLSTM-CRFs, the dominant approach for flat NER structures, demonstrating that its ability to predict nested structures does not impact performance in simpler cases.

Model-Agnostic Meta-Learning for Relation Classification with Limited Supervision
Abiola Obamuyide | Andreas Vlachos
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper we frame the task of supervised relation classification as an instance of meta-learning. We propose a model-agnostic meta-learning protocol for training relation classifiers to achieve enhanced predictive performance in limited supervision settings. During training, we aim to not only learn good parameters for classifying relations with sufficient supervision, but also learn model parameters that can be fine-tuned to enhance predictive performance for relations with limited supervision. In experiments conducted on two relation classification datasets, we demonstrate that the proposed meta-learning approach improves the predictive performance of two state-of-the-art supervised relation classification models.

2018

Topic or Style? Exploring the Most Useful Features for Authorship Attribution
Yunita Sari | Mark Stevenson | Andreas Vlachos
Proceedings of the 27th International Conference on Computational Linguistics

Approaches to authorship attribution, the task of identifying the author of a document, are based on analysis of individuals’ writing style and/or preferred topics. Although the problem has been widely explored, no previous studies have analysed the relationship between dataset characteristics and effectiveness of different types of features. This study carries out an analysis of four widely used datasets to explore how different types of features affect authorship attribution accuracy under varying conditions. The results of the analysis are applied to authorship attribution models based on both discrete and continuous representations. We apply the conclusions from our analysis to an extension of an existing approach to authorship attribution and outperform the prior state-of-the-art on two out of the four datasets used.

Automated Fact Checking: Task Formulations, Methods and Future Directions
James Thorne | Andreas Vlachos
Proceedings of the 27th International Conference on Computational Linguistics

The recently increased focus on misinformation has stimulated research in fact checking, the task of assessing the truthfulness of a claim. Research in automating this task has been conducted in a variety of disciplines including natural language processing, machine learning, knowledge representation, databases, and journalism. While there has been substantial progress, relevant papers and articles have been published in research communities that are often unaware of each other and use inconsistent terminology, thus impeding understanding and further progress. In this paper we survey automated fact checking research stemming from natural language processing and related disciplines, unifying the task formulations and methodologies across papers and authors. Furthermore, we highlight the use of evidence as an important distinguishing factor among them, cutting across task formulations and methods. We conclude by proposing avenues for future NLP research on automated fact checking.

FEVER: a Large-scale Dataset for Fact Extraction and VERification
James Thorne | Andreas Vlachos | Christos Christodoulopoulos | Arpit Mittal
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as Supported, Refuted or NotEnoughInfo by annotators achieving 0.6841 in Fleiss kappa. For the first two classes, the annotators also recorded the sentence(s) forming the necessary evidence for their judgment. To characterize the challenge of the dataset presented, we develop a pipeline approach and compare it to suitably designed oracles. The best accuracy we achieve on labeling a claim accompanied by the correct evidence is 31.87%, while if we ignore the evidence we achieve 50.91%. Thus we believe that FEVER is a challenging testbed that will help stimulate progress on claim verification against textual sources.

Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)
James Thorne | Andreas Vlachos | Oana Cocarascu | Christos Christodoulopoulos | Arpit Mittal
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

The Fact Extraction and VERification (FEVER) Shared Task
James Thorne | Andreas Vlachos | Oana Cocarascu | Christos Christodoulopoulos | Arpit Mittal
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

We present the results of the first Fact Extraction and VERification (FEVER) Shared Task. The task challenged participants to classify whether human-written factoid claims could be SUPPORTED or REFUTED using evidence retrieved from Wikipedia. We received entries from 23 competing teams, 19 of which scored higher than the previously published baseline. The best performing system achieved a FEVER score of 64.21%. In this paper, we present the results of the shared task and a summary of the systems, highlighting commonalities and innovations among participating systems.

Zero-shot Relation Classification as Textual Entailment
Abiola Obamuyide | Andreas Vlachos
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

We consider the task of relation classification, and pose this task as one of textual entailment. We show that this formulation leads to several advantages, including the ability to (i) perform zero-shot relation classification by exploiting relation descriptions, (ii) utilize existing textual entailment models, and (iii) leverage readily available textual entailment datasets, to enhance the performance of relation classification systems. Our experiments show that the proposed approach achieves 20.16% and 61.32% in F1 zero-shot classification performance on two datasets, which are further improved to 22.80% and 64.78% respectively with the use of conditional encoding.

Guided Neural Language Generation for Abstractive Summarization using Abstract Meaning Representation
Hardy Hardy | Andreas Vlachos
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recent work on abstractive summarization has made progress with neural encoder-decoder architectures. However, such models are often challenged due to their lack of explicit semantic modeling of the source document and its summary. In this paper, we extend previous work on abstractive summarization using Abstract Meaning Representation (AMR) with a neural language generation stage which we guide using the source document. We demonstrate that this guidance improves summarization results by 7.4 and 10.5 points in ROUGE-2 using gold standard AMR parses and parses obtained from an off-the-shelf parser respectively. We also find that the summarization performance on the latter parses is 2 ROUGE-2 points higher than that of a well-established neural encoder-decoder approach trained on a larger dataset.

2017

Fake news stance detection using stacked ensemble of classifiers
James Thorne | Mingjie Chen | Giorgos Myrianthous | Jiashu Pu | Xiaoxuan Wang | Andreas Vlachos
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

Fake news has become a hotly debated topic in journalism. In this paper, we present our entry to the 2017 Fake News Challenge, which models the detection of fake news as a stance classification task and finished in 11th place on the leaderboard. Our entry is an ensemble system of classifiers developed by students in the context of their coursework. We show how we used the stacking ensemble method for this purpose and obtained improvements in classification accuracy exceeding each of the individual models’ performance on the development data. Finally, we discuss aspects of the experimental setup of the challenge.

Sheffield at SemEval-2017 Task 9: Transition-based language generation from AMR.
Gerasimos Lampouras | Andreas Vlachos
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the submission by the University of Sheffield to the SemEval 2017 Abstract Meaning Representation Parsing and Generation task (SemEval 2017 Task 9, Subtask 2). We cast language generation from AMR as a sequence of actions (e.g., insert/remove/rename edges and nodes) that progressively transform the AMR graph into a dependency parse tree. This transition-based approach relies on the fact that an AMR graph can be considered structurally similar to a dependency tree, with a focus on content rather than function words. An added benefit to this approach is the greater amount of data we can take advantage of to train the parse-to-text linearizer. Our submitted run on the test data achieved a BLEU score of 3.32 and a Trueskill score of -22.04 on automatic and human evaluation respectively.

Continuous N-gram Representations for Authorship Attribution
Yunita Sari | Andreas Vlachos | Mark Stevenson
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper presents work on using continuous representations for authorship attribution. In contrast to previous work, which uses discrete feature representations, our model learns continuous representations for n-gram features via a neural network jointly with the classification layer. Experimental results demonstrate that the proposed model outperforms the state-of-the-art on two datasets, while producing comparable results on the remaining two.

pdf bib
An Extensible Framework for Verification of Numerical Claims
James Thorne | Andreas Vlachos
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

In this paper we present our automated fact checking system demonstration which we developed in order to participate in the Fast and Furious Fact Check challenge. We focused on simple numerical claims such as “population of Germany in 2015 was 80 million” which comprised a quarter of the test instances in the challenge, achieving 68% accuracy. Our system extends previous work on semantic parsing and claim identification to handle temporal expressions and knowledge bases consisting of multiple tables, while relying solely on automatically generated training data. We demonstrate the extensible nature of our system by evaluating it on relations used in previous work. We make our system publicly available so that it can be used and extended by the community.

pdf bib
The SUMMA Platform Prototype
Renars Liepins | Ulrich Germann | Guntis Barzdins | Alexandra Birch | Steve Renals | Susanne Weber | Peggy van der Kreeft | Hervé Bourlard | João Prieto | Ondřej Klejch | Peter Bell | Alexandros Lazaridis | Alfonso Mendes | Sebastian Riedel | Mariana S. C. Almeida | Pedro Balage | Shay B. Cohen | Tomasz Dwojak | Philip N. Garner | Andreas Giefer | Marcin Junczys-Dowmunt | Hina Imran | David Nogueira | Ahmed Ali | Sebastião Miranda | Andrei Popescu-Belis | Lesly Miculicich Werlen | Nikos Papasarantopoulos | Abiola Obamuyide | Clive Jones | Fahim Dalvi | Andreas Vlachos | Yang Wang | Sibo Tong | Rico Sennrich | Nikolaos Pappas | Shashi Narayan | Marco Damonte | Nadir Durrani | Sameer Khurana | Ahmed Abdelali | Hassan Sajjad | Stephan Vogel | David Sheppey | Chris Hernon | Jeff Mitchell
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.

pdf bib
Imitation learning for structured prediction in natural language processing
Andreas Vlachos | Gerasimos Lampouras | Sebastian Riedel
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Imitation learning is a learning paradigm originally developed to learn robotic controllers from demonstrations by humans, e.g. autonomous flight from pilot demonstrations. Recently, algorithms for structured prediction were proposed under this paradigm and have been applied successfully to a number of tasks including syntactic dependency parsing, information extraction, coreference resolution, dynamic feature selection, semantic parsing and natural language generation. Key advantages are the ability to handle large output search spaces and to learn with non-decomposable loss functions. Our aim in this tutorial is to have a unified presentation of the various imitation algorithms for structured prediction, and show how they can be applied to a variety of NLP tasks. All material associated with the tutorial will be made available through https://sheffieldnlp.github.io/ImitationLearningTutorialEACL2017/.

2016

pdf bib
Stance Detection with Bidirectional Conditional Encoding
Isabelle Augenstein | Tim Rocktäschel | Andreas Vlachos | Kalina Bontcheva
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Timeline extraction using distant supervision and joint inference
Savelie Cornegruta | Andreas Vlachos
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Noise reduction and targeted exploration in imitation learning for Abstract Meaning Representation parsing
James Goodman | Andreas Vlachos | Jason Naradowsky
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
SHEF-MIME: Word-level Quality Estimation Using Imitation Learning
Daniel Beck | Andreas Vlachos | Gustavo Paetzold | Lucia Specia
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Emergent: a novel data-set for stance classification
William Ferreira | Andreas Vlachos
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Imitation learning for language generation from unaligned data
Gerasimos Lampouras | Andreas Vlachos
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Natural language generation (NLG) is the task of generating natural language from a meaning representation. Current rule-based approaches require domain-specific and manually constructed linguistic resources, while most machine-learning based approaches rely on aligned training data and/or phrase templates. The latter are needed to restrict the search space for the structured prediction task defined by the unaligned datasets. In this work we propose the use of imitation learning for structured prediction which learns an incremental model that handles the large search space by avoiding explicit enumeration of the outputs. We focus on the Locally Optimal Learning to Search framework which allows us to train against non-decomposable loss functions such as the BLEU or ROUGE scores while not assuming gold standard alignments. We evaluate our approach on three datasets using both automatic measures and human judgements and achieve results comparable to the state-of-the-art approaches developed for each of them.

pdf bib
USFD at SemEval-2016 Task 6: Any-Target Stance Detection on Twitter with Autoencoders
Isabelle Augenstein | Andreas Vlachos | Kalina Bontcheva
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
UCL+Sheffield at SemEval-2016 Task 8: Imitation learning for AMR parsing with an alpha-bound
James Goodman | Andreas Vlachos | Jason Naradowsky
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Extracting Relations between Non-Standard Entities using Distant Supervision and Imitation Learning
Isabelle Augenstein | Andreas Vlachos | Diana Maynard
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Strong Lexical Matching Method for the Machine Comprehension Test
Ellery Smith | Nicola Greco | Matko Bošnjak | Andreas Vlachos
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Identification and Verification of Simple Claims about Statistical Properties
Andreas Vlachos | Sebastian Riedel
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Using word embedding for bio-event extraction
Chen Li | Runqing Song | Maria Liakata | Andreas Vlachos | Stephanie Seneff | Xiangrong Zhang
Proceedings of BioNLP 15

pdf bib
Dependency Recurrent Neural Language Models for Sentence Completion
Piotr Mirowski | Andreas Vlachos
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Matrix and Tensor Factorization Methods for Natural Language Processing
Guillaume Bouchard | Jason Naradowsky | Sebastian Riedel | Tim Rocktäschel | Andreas Vlachos
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing: Tutorial Abstracts

2014

pdf bib
A New Corpus and Imitation Learning Framework for Context-Dependent Semantic Parsing
Andreas Vlachos | Stephen Clark
Transactions of the Association for Computational Linguistics, Volume 2

Semantic parsing is the task of translating natural language utterances into a machine-interpretable meaning representation. Most approaches to this task have been evaluated on a small number of existing corpora which assume that all utterances must be interpreted according to a database and typically ignore context. In this paper we present a new, publicly available corpus for context-dependent semantic parsing. The MRL used for the annotation was designed to support a portable, interactive tourist information system. We develop a semantic parser for this corpus by adapting the imitation learning algorithm DAgger without requiring alignment information during training. DAgger improves upon independently trained classifiers by 9.0 and 4.8 points in F-score on the development and test sets respectively.

pdf bib
Proceedings of the EACL 2014 Workshop on Dialogue in Motion
Tiphaine Dalmas | Jana Götze | Joakim Gustafson | Srinivasan Janarthanam | Jan Kleindienst | Christian Mueller | Amanda Stent | Andreas Vlachos
Proceedings of the EACL 2014 Workshop on Dialogue in Motion

pdf bib
Fact Checking: Task definition and dataset construction
Andreas Vlachos | Sebastian Riedel
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

pdf bib
Application-Driven Relation Extraction with Limited Distant Supervision
Andreas Vlachos | Stephen Clark
Proceedings of the First AHA!-Workshop on Information Discovery in Text

2013

pdf bib
Dependency Language Models for Sentence Completion
Joseph Gubbins | Andreas Vlachos
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Semantic Parsing as Machine Translation
Jacob Andreas | Andreas Vlachos | Stephen Clark
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

pdf bib
Search-based Structured Prediction applied to Biomedical Event Extraction
Andreas Vlachos | Mark Craven
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

pdf bib
Biomedical Event Extraction from Abstracts and Full Papers using Search-based Structured Prediction
Andreas Vlachos | Mark Craven
Proceedings of BioNLP Shared Task 2011 Workshop

pdf bib
Evaluating unsupervised learning for natural language processing tasks
Andreas Vlachos
Proceedings of the First workshop on Unsupervised Learning in NLP

2010

pdf bib
Two Strong Baselines for the BioNLP 2009 Event Extraction Task
Andreas Vlachos
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

pdf bib
Active Learning for Constrained Dirichlet Process Mixture Models
Andreas Vlachos | Zoubin Ghahramani | Ted Briscoe
Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics

pdf bib
Detecting Speculative Language Using Syntactic Dependencies and Logistic Regression
Andreas Vlachos | Mark Craven
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

2009

pdf bib
The infinite HMM for unsupervised PoS tagging
Jurgen Van Gael | Andreas Vlachos | Zoubin Ghahramani
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Unsupervised and Constrained Dirichlet Process Mixture Models for Verb Clustering
Andreas Vlachos | Anna Korhonen | Zoubin Ghahramani
Proceedings of the Workshop on Geometrical Models of Natural Language Semantics

pdf bib
Biomedical Event Extraction without Training Data
Andreas Vlachos | Paula Buttery | Diarmuid Ó Séaghdha | Ted Briscoe
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

2007

pdf bib
Evaluating and combining biomedical named entity recognition systems
Andreas Vlachos
Biological, translational, and clinical language processing

2006

pdf bib
Active Annotation
Andreas Vlachos
Proceedings of the Workshop on Adaptive Text Extraction and Mining (ATEM 2006)

pdf bib
Bootstrapping and Evaluating Named Entity Recognition in the Biomedical Domain
Andreas Vlachos | Caroline Gasperin
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology
