Simone Teufel - ACL Anthology

Simone Teufel

2025

Minimal Pair-Based Evaluation of Code-Switching
Igor Sterner | Simone Teufel
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

There is a lack of an evaluation methodology that estimates the extent to which large language models (LLMs) use code-switching (CS) in the same way as bilinguals. Existing methods do not have wide language coverage, fail to account for the diverse range of CS phenomena, or do not scale. We propose an intervention based on minimal pairs of CS. Each minimal pair contains one naturally occurring CS sentence and one minimally manipulated variant. We collect up to 1,000 such pairs each for 11 language pairs. Our human experiments show that, for every language pair, bilinguals consistently prefer the naturally occurring CS sentence. Meanwhile our experiments with current LLMs show that the larger the model, the more consistently it assigns higher probability to the naturally occurring CS sentence than to the variant. In accordance with theoretical claims, the largest probability differences arise in those pairs where the manipulated material consisted of closed-class words.

Code-Switching and Syntax: A Large-Scale Experiment
Igor Sterner | Simone Teufel
Findings of the Association for Computational Linguistics: ACL 2025

The theoretical code-switching (CS) literature provides numerous pointwise investigations that aim to explain patterns in CS, i.e. why bilinguals switch language in certain positions in a sentence more often than in others. A resulting consensus is that CS can be explained by the syntax of the contributing languages. There is however no large-scale, multi-language, cross-phenomena experiment that tests this claim. When designing such an experiment, we need to make sure that the system that is predicting where bilinguals tend to switch has access only to syntactic information. We provide such an experiment here. Results show that syntax alone is sufficient for an automatic system to distinguish between sentences in minimal pairs of CS, to the same degree as bilingual humans. Furthermore, the learnt syntactic patterns generalise well to unseen language pairs.

Misalignment of Semantic Relation Knowledge between WordNet and Human Intuition
Zhihan Cao | Hiroaki Yamada | Simone Teufel | Takenobu Tokunaga
Proceedings of the 13th Global Wordnet Conference

2024

Computational Modelling of Undercuts in Real-world Arguments
Yuxiao Ye | Simone Teufel
Proceedings of the 11th Workshop on Argument Mining (ArgMining 2024)

Argument Mining (AM) is the task of automatically analysing arguments, such that the unstructured information contained in them is converted into structured representations. Undercut is a unique structure in arguments, as it challenges the relationship between a premise and a claim, unlike direct attacks which challenge the claim or the premise itself. Undercut is also an important counterargument device as it often reflects the value of arguers. However, undercuts have not received the attention in the filed of AM they should have — there is neither much corpus data about undercuts, nor an existing AM model that can automatically recognise them. In this paper, we present a real-world dataset of arguments with explicitly annotated undercuts, and the first computational model that is able to recognise them. The dataset consists of 400 arguments, containing 326 undercuts. On this dataset, our approach beats a strong baseline in undercut recognition, with F₁ = 38.8%, which is comparable to the performance on recognising direct attacks. We also conduct experiments on a benchmark dataset containing no undercuts, and prove that our approach is as good as the state of the art in terms of recognising the overall structure of arguments. Our work pioneers the systematic analysis and computational modelling of undercuts in real-world arguments, setting a foundation for future research in the role of undercuts in the dynamics of argumentation.

ChainNet: Structured Metaphor and Metonymy in WordNet
Rowan Hall Maudslay | Simone Teufel | Francis Bond | James Pustejovsky
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The senses of a word exhibit rich internal structure. In a typical lexicon, this structure is overlooked: A word’s senses are encoded as a list, without inter-sense relations. We present ChainNet, a lexical resource which for the first time explicitly identifies these structures, by expressing how senses in the Open English Wordnet are derived from one another. In ChainNet, every nominal sense of a word is either connected to another sense by metaphor or metonymy, or is disconnected (in the case of homonymy). Because WordNet senses are linked to resources which capture information about their meaning, ChainNet represents the first dataset of grounded metaphor and metonymy.

Scansion-based Lyrics Generation
Yiwen Chen | Simone Teufel
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We aim to generate lyrics for Mandarin songs with a good match between the melody and the tonal contour of the lyrics. Our solution relies on mBart, treating lyrics generation as a translation problem, but rather than translating directly from the melody as is common, our novelty in this paper is that we generate from scansion as an intermediate contour representation that can fit a given melody. One of the advantages of our solution is that it does not require a parallel melody-lyrics dataset. We also present a thorough automatic evaluation of our system against competitors, using several new evaluation metrics. These measure intelligibility, fit to melody, and use proxies for quantifying creativity (variation to other songs created by the same system in different settings, semantic similarity to keywords given to the system, perplexity). When comparing different implementations of scansion to competitor systems, a varied picture emerges. Our best system outperforms all others in lyric-melody fit and is in the top group of systems for two of the creativity metrics (variation and perplexity), overshadowing two large language models (LLM) specialised to this task.

Semantic Map-based Generation of Navigation Instructions
Chengzu Li | Chao Zhang | Simone Teufel | Rama Sanand Doddipatla | Svetlana Stoyanchev
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We are interested in the generation of navigation instructions, either in their own right or as training material for robotic navigation task. In this paper, we propose a new approach to navigation instruction generation by framing the problem as an image captioning task using semantic maps as visual input. Conventional approaches employ a sequence of panorama images to generate navigation instructions. Semantic maps abstract away from visual details and fuse the information in multiple panorama images into a single top-down representation, thereby reducing computational complexity to process the input. We present a benchmark dataset for instruction generation using semantic maps, propose an initial model and ask human subjects to manually assess the quality of generated instructions. Our initial investigations show promise in using semantic maps for instruction generation instead of a sequence of panorama images, but there is vast scope for improvement. We release the code for data preparation and model training at https://github.com/chengzu-li/VLGen.

The Ethics of Automating Legal Actors
Josef Valvoda | Alec Thompson | Ryan Cotterell | Simone Teufel
Transactions of the Association for Computational Linguistics, Volume 12

The introduction of large public legal datasets has brought about a renaissance in legal NLP. Many of these datasets are composed of legal judgments—the product of judges deciding cases. Since ML algorithms learn to model the data they are trained on, several legal NLP models are models of judges. While some have argued for the automation of judges, in this position piece, we argue that automating the role of the judge raises difficult ethical challenges, in particular for common law legal systems. Our argument follows from the social role of the judge in actively shaping the law, rather than merely applying it. Since current NLP models are too far away from having the facilities necessary for this task, they should not be used to automate judges. Furthermore, even in the case that the models could achieve human-level capabilities, there would still be remaining ethical concerns inherent in the automation of the legal process.

2023

TongueSwitcher: Fine-Grained Identification of German-English Code-Switching
Igor Sterner | Simone Teufel
Proceedings of the 6th Workshop on Computational Approaches to Linguistic Code-Switching

This paper contributes to German-English code-switching research. We provide the largest corpus of naturally occurring German-English code-switching, where English is included in German text, and two methods for code-switching identification. The first method is rule-based, using wordlists and morphological processing. We use this method to compile a corpus of 25.6M tweets employing German-English code-switching. In our second method, we continue pretraining of a neural language model on this corpus and classify tokens based on embeddings from this language model. Our systems establish SoTA on our new corpus and an existing German-English code-switching benchmark. In particular, we systematically study code-switching for language-ambiguous words which can only be resolved in context, and morphologically mixed words consisting of both English and German morphemes. We distribute both corpora and systems to the research community.

On the Role of Negative Precedent in Legal Outcome Prediction
Josef Valvoda | Ryan Cotterell | Simone Teufel
Transactions of the Association for Computational Linguistics, Volume 11

Every legal case sets a precedent by developing the law in one of the following two ways. It either expands its scope, in which case it sets positive precedent, or it narrows it, in which case it sets negative precedent. Legal outcome prediction, the prediction of positive outcome, is an increasingly popular task in AI. In contrast, we turn our focus to negative outcomes here, and introduce a new task of negative outcome prediction. We discover an asymmetry in existing models’ ability to predict positive and negative outcomes. Where the state-of-the-art outcome prediction model we used predicts positive outcomes at 75.06 F1, it predicts negative outcomes at only 10.09 F1, worse than a random baseline. To address this performance gap, we develop two new models inspired by the dynamics of a court process. Our first model significantly improves positive outcome prediction score to 77.15 F1 and our second model more than doubles the negative outcome prediction performance to 24.01 F1. Despite this improvement, shifting focus to negative outcomes reveals that there is still much room for improvement for outcome prediction models. https://github.com/valvoda/Negative-Precedent-in-Legal-Outcome-Prediction

2022

Metaphorical Polysemy Detection: Conventional Metaphor Meets Word Sense Disambiguation
Rowan Hall Maudslay | Simone Teufel
Proceedings of the 29th International Conference on Computational Linguistics

Linguists distinguish between novel and conventional metaphor, a distinction which the metaphor detection task in NLP does not take into account. Instead, metaphoricity is formulated as a property of a token in a sentence, regardless of metaphor type. In this paper, we investigate the limitations of treating conventional metaphors in this way, and advocate for an alternative which we name ‘metaphorical polysemy detection’ (MPD). In MPD, only conventional metaphoricity is treated, and it is formulated as a property of word senses in a lexicon. We develop the first MPD model, which learns to identify conventional metaphors in the English WordNet. To train it, we present a novel training procedure that combines metaphor detection with ‘word sense disambiguation’ (WSD). For evaluation, we manually annotate metaphor in two subsets of WordNet. Our model significantly outperforms a strong baseline based on a state-of-the-art metaphor detection model, attaining an ROC-AUC score of .78 (compared to .65) on one of the sets. Additionally, when paired with a WSD model, our approach outperforms a state-of-the-art metaphor detection model at identifying conventional metaphors in text (.659 F1 compared to .626).

Identifying relevant common sense information in knowledge graphs
Guy Aglionby | Simone Teufel
Proceedings of the First Workshop on Commonsense Representation and Reasoning (CSRR 2022)

Knowledge graphs are often used to store common sense information that is useful for various tasks. However, the extraction of contextually-relevant knowledge is an unsolved problem, and current approaches are relatively simple. Here we introduce a triple selection method based on a ranking model and find that it improves question answering accuracy over existing methods. We additionally investigate methods to ensure that extracted triples form a connected graph. Graph connectivity is important for model interpretability, as paths are frequently used as explanations for the reasoning that connects question and answer.

Faithful Knowledge Graph Explanations in Commonsense Question Answering
Guy Aglionby | Simone Teufel
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Knowledge graphs are commonly used as sources of information in commonsense question answering, and can also be used to express explanations for the model’s answer choice. A common way of incorporating facts from the graph is to encode them separately from the question, and then combine the two representations to select an answer. In this paper, we argue that highly faithful graph-based explanations cannot be extracted from existing models of this type. Such explanations will not include reasoning done by the transformer encoding the question, so will be incomplete. We confirm this theory with a novel proxy measure for faithfulness and propose two architecture changes to address the problem. Our findings suggest a path forward for developing architectures for faithful graph-based explanations.

Homonymy Information for English WordNet
Rowan Hall Maudslay | Simone Teufel
Proceedings of Globalex Workshop on Linked Lexicography within the 13th Language Resources and Evaluation Conference

A widely acknowledged shortcoming of WordNet is that it lacks a distinction between word meanings which are systematically related (polysemy), and those which are coincidental (homonymy). Several previous works have attempted to fill this gap, by inferring this information using computational methods. We revisit this task, and exploit recent advances in language modelling to synthesise homonymy annotation for Princeton WordNet. Previous approaches treat the problem using clustering methods; by contrast, our method works by linking WordNet to the Oxford English Dictionary, which contains the information we need. To perform this alignment, we pair definitions based on their proximity in an embedding space produced by a Transformer model. Despite the simplicity of this approach, our best model attains an F1 of .97 on an evaluation set that we annotate. The outcome of our work is a high-quality homonymy annotation layer for Princeton WordNet, which we release.

Problem-solving Recognition in Scientific Text
Kevin Heffernan | Simone Teufel
Proceedings of the Thirteenth Language Resources and Evaluation Conference

As far back as Aristotle, problems and solutions have been recognised as a core pattern of thought, and in particular of the scientific method. In this work, we present the novel task of problem-solving recognition in scientific text. Previous work on problem-solving either is not computational, is not adapted to scientific text, or has been narrow in scope. This work provides a new annotation scheme of problem-solving tailored to the scientific domain. We validate the scheme with an annotation study, and model the task using state-of-the-art baselines such as a Neural Relational Topic Model. The agreement study indicates that our annotation is reliable, and results from modelling show that problem-solving expressions in text can be recognised to a high degree of accuracy.

2021

Multi-task and Multi-corpora Training Strategies to Enhance Argumentative Sentence Linking Performance
Jan Wira Gotama Putra | Simone Teufel | Takenobu Tokunaga
Proceedings of the 8th Workshop on Argument Mining

Argumentative structure prediction aims to establish links between textual units and label the relationship between them, forming a structured representation for a given input text. The former task, linking, has been identified by earlier works as particularly challenging, as it requires finding the most appropriate structure out of a very large search space of possible link combinations. In this paper, we improve a state-of-the-art linking model by using multi-task and multi-corpora training strategies. Our auxiliary tasks help the model to learn the role of each sentence in the argumentative structure. Combining multi-corpora training with a selective sampling strategy increases the training data size while ensuring that the model still learns the desired target distribution well. Experiments on essays written by English-as-a-foreign-language learners show that both strategies significantly improve the model’s performance; for instance, we observe a 15.8% increase in the F1-macro for individual link predictions.

Parsing Argumentative Structure in English-as-Foreign-Language Essays
Jan Wira Gotama Putra | Simone Teufel | Takenobu Tokunaga
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

This paper presents a study on parsing the argumentative structure in English-as-foreign-language (EFL) essays, which are inherently noisy. The parsing process consists of two steps, linking related sentences and then labelling their relations. We experiment with several deep learning architectures to address each task independently. In the sentence linking task, a biaffine model performed the best. In the relation labelling task, a fine-tuned BERT model performed the best. Two sentence encoders are employed, and we observed that non-fine-tuning models generally performed better when using Sentence-BERT as opposed to BERT encoder. We trained our models using two types of parallel texts: original noisy EFL essays and those improved by annotators, then evaluate them on the original essays. The experiment shows that an end-to-end in-domain system achieved an accuracy of .341. On the other hand, the cross-domain system achieved 94% performance of the in-domain system. This signals that well-written texts can also be useful to train argument mining system for noisy texts.

End-to-End Argument Mining as Biaffine Dependency Parsing
Yuxiao Ye | Simone Teufel
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Non-neural approaches to argument mining (AM) are often pipelined and require heavy feature-engineering. In this paper, we propose a neural end-to-end approach to AM which is based on dependency parsing, in contrast to the current state-of-the-art which relies on relation extraction. Our biaffine AM dependency parser significantly outperforms the state-of-the-art, performing at F1 = 73.5% for component identification and F1 = 46.4% for relation identification. One of the advantages of treating AM as biaffine dependency parsing is the simple neural architecture that results. The idea of treating AM as dependency parsing is not new, but has previously been abandoned as it was lagging far behind the state-of-the-art. In a thorough analysis, we investigate the factors that contribute to the success of our model: the biaffine model itself, our representation for the dependency structure of arguments, different encoders in the biaffine model, and syntactic information additionally fed to the model. Our work demonstrates that dependency parsing for AM, an overlooked idea from the past, deserves more attention in the future.

A surprisal–duration trade-off across and within the world’s languages
Tiago Pimentel | Clara Meister | Elizabeth Salesky | Simone Teufel | Damián Blasi | Ryan Cotterell
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

While there exist scores of natural languages, each with its unique features and idiosyncrasies, they all share a unifying theme: enabling human communication. We may thus reasonably predict that human cognition shapes how these languages evolve and are used. Assuming that the capacity to process information is roughly constant across human populations, we expect a surprisal–duration trade-off to arise both across and within languages. We analyse this trade-off using a corpus of 600 languages and, after controlling for several potential confounds, we find strong supporting evidence in both settings. Specifically, we find that, on average, phones are produced faster in languages where they are less surprising, and vice versa. Further, we confirm that more surprising phones are longer, on average, in 319 languages out of the 600. We thus conclude that there is strong evidence of a surprisal–duration trade-off in operation, both across and within the world’s languages.

On Homophony and Rényi Entropy
Tiago Pimentel | Clara Meister | Simone Teufel | Ryan Cotterell
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Homophony’s widespread presence in natural languages is a controversial topic. Recent theories of language optimality have tried to justify its prevalence, despite its negative effects on cognitive processing time, e.g., Piantadosi et al. (2012) argued homophony enables the reuse of efficient wordforms and is thus beneficial for languages. This hypothesis has recently been challenged by Trott and Bergen (2020), who posit that good wordforms are more often homophonous simply because they are more phonotactically probable. In this paper, we join in on the debate. We first propose a new information-theoretic quantification of a language’s homophony: the sample Rényi entropy. Then, we use this quantification to revisit Trott and Bergen’s claims. While their point is theoretically sound, a specific methodological issue in their experiments raises doubts about their results. After addressing this issue, we find no clear pressure either towards or against homophony—a much more nuanced result than either Piantadosi et al.’s or Trott and Bergen’s findings.

Synthetic Textual Features for the Large-Scale Detection of Basic-level Categories in English and Mandarin
Yiwen Chen | Simone Teufel
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Basic-level categories (BLC) are an important psycholinguistic concept introduced by Rosch et al. (1976); they are defined as the most inclusive categories for which a concrete mental image of the category as a whole can be formed, and also as those categories which are acquired early in life. Rosch’s original algorithm for detecting BLC (called cue-validity) is based on the availability of semantic features such as “has tail” for “cat”, and has remained untested at large. An at-scale algorithm for the automatic determination of BLC exists, but it operates without Rosch-style semantic features, and is thus unable to verify Rosch’s hypothesis. We present the first method for the detection of BLC at scale that makes use of Rosch-style semantic features. For both English and Mandarin, we test three methods of generating such features for any synset within Wordnet (WN): extraction of textual features from Wikipedia pages, Distributional Memory (DM) and BART. The best of our methods outperforms the current SoA in BLC detection, with an accuracy of English BLC detection of 75.0%, and of Mandarin BLC detection 80.7% on a test set. When applied to all of WordNet, our model predicts that 1,118 synsets in English Wordnet (1.4%) are BLC, far fewer than existing methods, and with a precision improvement of over 200% over these. As well as confirming the usefulness of Rosch’s cue validity algorithm, we also developed and evaluated our own new indicator for BLC, which models the fact that BLC features tend to be BLC themselves.

What About the Precedent: An Information-Theoretic Analysis of Common Law
Josef Valvoda | Tiago Pimentel | Niklas Stoehr | Ryan Cotterell | Simone Teufel
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In common law, the outcome of a new case is determined mostly by precedent cases, rather than by existing statutes. However, how exactly does the precedent influence the outcome of a new case? Answering this question is crucial for guaranteeing fair and consistent judicial decision-making. We are the first to approach this question computationally by comparing two longstanding jurisprudential views; Halsbury’s, who believes that the arguments of the precedent are the main determinant of the outcome, and Goodhart’s, who believes that what matters most is the precedent’s facts. We base our study on the corpus of legal cases from the European Court of Human Rights (ECtHR), which allows us to access not only the case itself, but also cases cited in the judges’ arguments (i.e. the precedent cases). Taking an information-theoretic view, and modelling the question as a case out-come classification task, we find that the precedent’s arguments share 0.38 nats of information with the case’s outcome, whereas precedent’s facts only share 0.18 nats of information (i.e.,58% less); suggesting Halsbury’s view may be more accurate in this specific court. We found however in a qualitative analysis that there are specific statues where Goodhart’s view dominates, and present some evidence these are the ones where the legal concept at hand is less straightforward.

2020

A Corpus of Very Short Scientific Summaries
Yifan Chen | Tamara Polajnar | Colin Batchelor | Simone Teufel
Proceedings of the 24th Conference on Computational Natural Language Learning

We present a new summarisation task, taking scientific articles and producing journal table-of-contents entries in the chemistry domain. These are one- or two-sentence author-written summaries that present the key findings of a paper. This is a first look at this summarisation task with an open access publication corpus consisting of titles and abstracts, as input texts, and short author-written advertising blurbs, as the ground truth. We introduce the dataset and evaluate it with state-of-the-art summarisation methods.

Metaphor Detection using Context and Concreteness
Rowan Hall Maudslay | Tiago Pimentel | Ryan Cotterell | Simone Teufel
Proceedings of the Second Workshop on Figurative Language Processing

We report the results of our system on the Metaphor Detection Shared Task at the Second Workshop on Figurative Language Processing 2020. Our model is an ensemble, utilising contextualised and static distributional semantic representations, along with word-type concreteness ratings. Using these features, it predicts word metaphoricity with a deep multi-layer perceptron. We are able to best the state-of-the-art from the 2018 Shared Task by an average of 8.0% F1, and finish fourth in both sub-tasks in which we participate.

TIARA: A Tool for Annotating Discourse Relations and Sentence Reordering
Jan Wira Gotama Putra | Simone Teufel | Kana Matsumura | Takenobu Tokunaga
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper introduces TIARA, a new publicly available web-based annotation tool for discourse relations and sentence reordering. Annotation tasks such as these, which are based on relations between large textual objects, are inherently hard to visualise without either cluttering the display and/or confusing the annotators. TIARA deals with the visual complexity during the annotation process by systematically simplifying the layout, and by offering interactive visualisation, including coloured links, indentation, and dual-view. TIARA’s text view allows annotators to focus on the analysis of logical sequencing between sentences. A separate tree view allows them to review their analysis in terms of the overall discourse structure. The dual-view gives it an edge over other discourse annotation tools and makes it particularly attractive as an educational tool (e.g., for teaching students how to argue more effectively). As it is based on standard web technologies and can be easily customised to other annotation schemes, it can be easily used by anybody. Apart from the project it was originally designed for, in which hundreds of texts were annotated by three annotators, TIARA has already been adopted by a second discourse annotation study, which uses it in the teaching of argumentation.

2019

It’s All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution
Rowan Hall Maudslay | Hila Gonen | Ryan Cotterell | Simone Teufel
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper treats gender bias latent in word embeddings. Previous mitigation attempts rely on the operationalisation of gender bias as a projection over a linear subspace. An alternative approach is Counterfactual Data Augmentation (CDA), in which a corpus is duplicated and augmented to remove bias, e.g. by swapping all inherently-gendered words in the copy. We perform an empirical comparison of these approaches on the English Gigaword and Wikipedia, and find that whilst both successfully reduce direct bias and perform well in tasks which quantify embedding quality, CDA variants outperform projection-based methods at the task of drawing non-biased gender analogies by an average of 19% across both corpora. We propose two improvements to CDA: Counterfactual Data Substitution (CDS), a variant of CDA in which potentially biased text is randomly substituted to avoid duplication, and the Names Intervention, a novel name-pairing technique that vastly increases the number of words being treated. CDA/S with the Names Intervention is the only approach which is able to mitigate indirect gender bias: following debiasing, previously biased words are significantly less clustered according to gender (cluster purity is reduced by 49%), thus improving on the state-of-the-art for bias mitigation.

2018

Variable Typing: Assigning Meaning to Variables in Mathematical Text
Yiannos Stathopoulos | Simon Baker | Marek Rei | Simone Teufel
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Information about the meaning of mathematical variables in text is useful in NLP/IR tasks such as symbol disambiguation, topic modeling and mathematical information retrieval (MIR). We introduce variable typing, the task of assigning one mathematical type (multi-word technical terms referring to mathematical concepts) to each variable in a sentence of mathematical text. As part of this work, we also introduce a new annotated data set composed of 33,524 data points extracted from scientific documents published on arXiv. Our intrinsic evaluation demonstrates that our data set is sufficient to successfully train and evaluate current classifiers from three different model architectures. The best performing model is evaluated on an extrinsic task: MIR, by producing a typed formula index. Our results show that the best performing MIR models make use of our typed index, compared to a formula index only containing raw symbols, thereby demonstrating the usefulness of variable typing.

2017

Annotation of argument structure in Japanese legal documents
Hiroaki Yamada | Simone Teufel | Takenobu Tokunaga
Proceedings of the 4th Workshop on Argument Mining

We propose a method for the annotation of Japanese civil judgment documents, with the purpose of creating flexible summaries of these. The first step, described in the current paper, concerns content selection, i.e., the question of which material should be extracted initially for the summary. In particular, we utilize the hierarchical argument structure of the judgment documents. Our main contributions are a) the design of an annotation scheme that stresses the connection between legal points (called issue topics) and argument structure, b) an adaptation of rhetorical status to suit the Japanese legal system and c) the definition of a linked argument structure based on legal sub-arguments. In this paper, we report agreement between two annotators on several aspects of the overall task.

2016

A Proposition-Based Abstractive Summariser
Yimai Fang | Haoyue Zhu | Ewa Muszyńska | Alexander Kuhnle | Simone Teufel
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Abstractive summarisation is not yet common amongst today’s deployed and research systems. Most existing systems either extract sentences or compress individual sentences. In this paper, we present a summariser that works by a different paradigm. It is a further development of an existing summariser that has an incremental, proposition-based content selection process but lacks a natural language (NL) generator for the final output. Using an NL generator, we can now produce the summary text to directly reflect the selected propositions. Our evaluation compares textual quality of our system to the earlier preliminary output method, and also uses ROUGE to compare to various summarisers that use the traditional method of sentence extraction, followed by compression. Our results suggest that cutting out the middle-man of sentence extraction can lead to better abstractive summaries.

Mathematical Information Retrieval based on Type Embeddings and Query Expansion
Yiannos Stathopoulos | Simone Teufel
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present an approach to mathematical information retrieval (MIR) that exploits a special kind of technical terminology, referred to as a mathematical type. In this paper, we present and evaluate a type detection mechanism and show its positive effect on the retrieval of research-level mathematics. Our best model, which performs query expansion with a type-aware embedding space, strongly outperforms standard IR models with state-of-the-art query expansion (vector space-based and language modelling-based), on a relatively new corpus of research-level queries.

Unsupervised Timeline Generation for Wikipedia History Articles
Sandro Bauer | Simone Teufel
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
Dain Kaplan | Neil Rubens | Simone Teufel | Takenobu Tokunaga
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Active learning (AL) is often used in corpus construction (CC) for selecting “informative” documents for annotation. This is ideal for focusing annotation efforts when all documents cannot be annotated, but has the limitation that it is carried out in a closed-loop, selecting points that will improve an existing model. For phenomena-driven and exploratory CC, the lack of existing-models and specific task(s) for using it make traditional AL inapplicable. In this paper we propose a novel method for model-free AL utilising characteristics of phenomena for applying AL to select documents for annotation. The method can also supplement traditional closed-loop AL-based CC to extend the utility of the corpus created beyond a single task. We introduce our tool, MOVE, and show its potential with a real world case-study.

Improving Argument Overlap for Proposition-Based Summarisation
Yimai Fang | Simone Teufel
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

Retrieval of Research-level Mathematical Information Needs: A Test Collection and Technical Terminology Experiment
Yiannos Stathopoulos | Simone Teufel
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

A Methodology for Evaluating Timeline Generation Algorithms based on Deep Semantic Units
Sandro Bauer | Simone Teufel
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Unsupervised learning of rhetorical structure with un-topic models
Diarmuid Ó Séaghdha | Simone Teufel
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Topical PageRank: A Model of Scientific Expertise for Bibliographic Search
James Jardine | Simone Teufel
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

A Summariser based on Human Memory Limitations and Lexical Competition
Yimai Fang | Simone Teufel
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Resolving Coreferent and Associative Noun Phrases in Scientific Text
Ina Roesiger | Simone Teufel
Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

Statistical Metaphor Processing
Ekaterina Shutova | Simone Teufel | Anna Korhonen
Computational Linguistics, Volume 39, Issue 2 - June 2013

2012

Context-Enhanced Citation Sentiment Detection
Awais Athar | Simone Teufel
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Detection of Implicit Citations for Sentiment Detection
Awais Athar | Simone Teufel
Proceedings of the Workshop on Detecting Structure in Scholarly Discourse

2010

Metaphor Corpus Annotated for Source - Target Domain Mappings
Ekaterina Shutova | Simone Teufel
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Besides making our thoughts more vivid and filling our communication with richer imagery, metaphor also plays an important structural role in our cognition. Although there is a consensus in the linguistics and NLP research communities that the phenomenon of metaphor is not restricted to similarity-based extensions of meanings of isolated words, but rather involves reconceptualization of a whole area of experience (target domain) in terms of another (source domain), there still has been no proposal for a comprehensive procedure for annotation of cross-domain mappings. However, a corpus annotated for conceptual mappings could provide a new starting point for both linguistic and cognitive experiments. The annotation scheme we present in this paper is a step towards filling this gap. We test our procedure in an experimental setting involving multiple annotators and estimate their agreement on the task. The associated corpus annotated for source ― target domain mappings will be publicly available.

Corpora for the Conceptualisation and Zoning of Scientific Papers
Maria Liakata | Simone Teufel | Advaith Siddharthan | Colin Batchelor
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present two complementary annotation schemes for sentence based annotation of full scientific papers, CoreSC and AZ-II, applied to primary research articles in chemistry. AZ-II is the extension of AZ for chemistry papers. AZ has been shown to have been reliably annotated by independent human coders and useful for various information access tasks. Like AZ, AZ-II follows the rhetorical structure of a scientific paper and the knowledge claims made by the authors. The CoreSC scheme takes a different view of scientific papers, treating them as the humanly readable representations of scientific investigations. It seeks to retrieve the structure of the investigation from the paper as generic high-level Core Scientific Concepts (CoreSC). CoreSCs have been annotated by 16 chemistry experts over a total of 265 full papers in physical chemistry and biochemistry. We describe the differences and similarities between the two schemes in detail and present the two corpora produced using each scheme. There are 36 shared papers in the corpora, which allows us to quantitatively compare aspects of the annotation schemes. We show the correlation between the two schemes, their strengths and weeknesses and discuss the benefits of combining a rhetorical based analysis of the papers with a content-based one.

2009

Towards Domain-Independent Argumentative Zoning: Evidence from Chemistry and Computational Linguistics
Simone Teufel | Advaith Siddharthan | Colin Batchelor
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries (NLPIR4DL)
Min-Yen Kan | Simone Teufel
Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries (NLPIR4DL)

2008

Language Resources and Chemical Informatics
C.J. Rupp | Ann Copestake | Peter Corbett | Peter Murray-Rust | Advaith Siddharthan | Simone Teufel | Benjamin Waldron
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Chemistry research papers are a primary source of information about chemistry, as in any scientific field. The presentation of the data is, predominantly, unstructured information, and so not immediately susceptible to processes developed within chemical informatics for carrying out chemistry research by information processing techniques. At one level, extracting the relevant information from research papers is a text mining task, requiring both extensive language resources and specialised knowledge of the subject domain. However, the papers also encode information about the way the research is conducted and the structure of the field itself. Applying language technology to research papers in chemistry can facilitate eScience on several different levels. The SciBorg project sets out to provide an extensive, analysed corpus of published chemistry research. This relies on the cooperation of several journal publishers to provide papers in an appropriate form. The work is carried out as a collaboration involving the Computer Laboratory, Chemistry Department and eScience Centre at Cambridge University, and is funded under the UK eScience programme.

Proceedings of ACL-08: HLT
Johanna D. Moore | Simone Teufel | James Allan | Sadaoki Furui
Proceedings of ACL-08: HLT

Proceedings of ACL-08: HLT, Short Papers
Johanna D. Moore | Simone Teufel | James Allan | Sadaoki Furui
Proceedings of ACL-08: HLT, Short Papers

2007

Whose Idea Was This, and Why Does it Matter? Attributing Scientific Work to Citations
Advaith Siddharthan | Simone Teufel
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Annotation of Chemical Named Entities
Peter Corbett | Colin Batchelor | Simone Teufel
Biological, translational, and clinical language processing

Panel Session: Discourse Annotation
Manfred Stede | Janyce Wiebe | Eva Hajičová | Brian Reese | Simone Teufel | Bonnie Webber | Theresa Wilson
Proceedings of the Linguistic Annotation Workshop

2006

Creating a Test Collection for Citation-based IR Experiments
Anna Ritchie | Simone Teufel | Stephen Robertson
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

A Bootstrapping Approach to Unsupervised Detection of Cue Phrase Variants
Rashid M. Abdalla | Simone Teufel
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Proceedings of the Workshop on Task-Focused Summarization and Question Answering
Tat-Seng Chua | Jade Goldstein | Simone Teufel | Lucy Vanderwende
Proceedings of the Workshop on Task-Focused Summarization and Question Answering

How to Find Better Index Terms Through Citations
Anna Ritchie | Simone Teufel | Stephen Robertson
Proceedings of the Workshop on How Can Computational Linguistics Improve Information Retrieval?

An annotation scheme for citation function
Simone Teufel | Advaith Siddharthan | Dan Tidhar
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue

Automatic classification of citation function
Simone Teufel | Advaith Siddharthan | Dan Tidhar
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2004

Agreement in Human Factoid Annotation for Summarization Evaluation
Simone Teufel | Hans van Halteren
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Evaluating Information Content by Factoid Analysis: Human annotation and stability
Simone Teufel | Hans van Halteren
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

Evaluation Challenges in Large-Scale Document Summarization
Dragomir R. Radev | Simone Teufel | Horacio Saggion | Wai Lam | John Blitzer | Hong Qi | Arda Çelebi | Danyu Liu | Elliott Drabek
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Examining the consensus between human summaries: initial experiments with factoid analysis
Hans van Halteren | Simone Teufel
Proceedings of the HLT-NAACL 03 Text Summarization Workshop

2002

Meta-evaluation of Summaries in a Cross-lingual Environment using Content-based Metrics
Horacio Saggion | Dragomir Radev | Simone Teufel | Wai Lam
COLING 2002: The 19th International Conference on Computational Linguistics

Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status
Simone Teufel | Marc Moens
Computational Linguistics, Volume 28, Number 4, December 2002

Collection and linguistic processing of a large-scale corpus of medical articles
Simone Teufel | Noemie Elhadad
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Developing Infrastructure for the Evaluation of Single and Multi-document Summarization Systems in a Cross-lingual Environment
Horacio Saggion | Dragomir Radev | Simone Teufel | Wai Lam | Stephanie M. Strassel
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

What’s Yours and What’s Mine: Determining Intellectual Attribution in Scientific Text
Simone Teufel | Marc Moens
2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1999

An annotation scheme for discourse-level argumentation in research articles
Simone Teufel | Jean Carletta | Marc Moens
Ninth Conference of the European Chapter of the Association for Computational Linguistics

Discourse-level argumentation in scientific articles: human and automatic annotation
Simone Teufel | Marc Moens
Towards Standards and Tools for Discourse Tagging

1998

Meta-discourse markers and problem-structuring in scientific articles
Simone Teufel
Discourse Relations and Discourse Markers

1997

Towards resolution of bridging descriptions
Renata Vieira | Simone Teufel
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

Sentence extraction as a classification task
Simone Teufel
Intelligent Scalable Text Summarization

Resolving bridging references in unrestricted text
Massimo Poesio | Renata Vieira | Simone Teufel
Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts

1995

Corpus-based Method for Automatic Identification of Support Verbs for Nominalizations
Simone Teufel | Gregory Grefenstette
Seventh Conference of the European Chapter of the Association for Computational Linguistics

Co-authors

Tiago Pimentel 4

Dragomir Radev 4

Horacio Saggion 4

Jan Wira Gotama Putra 3

Yiannos Stathopoulos 3

Josef Valvoda 3

Hans van Halteren 3

Peter Corbett 2

Elliott Franco Drabek 2

Sadaoki Furui 2

Clara Meister 2

Johanna D. Moore 2

Stephen Robertson 2

Ekaterina Shutova 2

Renata Vieira 2

Hiroaki Yamada 2

Rashid M. Abdalla 1

Timothy Allison 1

Sasha Blair-Goldensohn 1

Jean Carletta 1

Tat-Seng Chua 1

Ann Copestake 1

Stanko Dimitrov 1

Rama Sanand Doddipatla 1

Noémie Elhadad 1

Jade Goldstein 1

Gregory Grefenstette 1

Kevin Heffernan 1

James Jardine 1

Anna Korhonen 1

Alexander Kuhnle 1

Maria Liakata 1

Kana Matsumura 1

Peter Murray-Rust 1

Ewa Muszyńska 1

Jahna Otterbacher 1

Massimo Poesio 1

Tamara Polajnar 1

James Pustejovsky 1

Elizabeth Salesky 1

Manfred Stede 1

Niklas Stoehr 1

Svetlana Stoyanchev 1

Stephanie Strassel 1

Alec Thompson 1

Michael Topper 1

Lucy Vanderwende 1

Benjamin Waldron 1

Bonnie Webber 1

Theresa Wilson 1

Diarmuid Ó Séaghdha 1

Venues