Steven Schockaert


2021

pdf bib
Deriving Word Vectors from Contextualized Language Models using Topic-Aware Mention Selection
Yixiao Wang | Zied Bouraoui | Luis Espinosa Anke | Steven Schockaert
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

One of the long-standing challenges in lexical semantics consists in learning representations of words which reflect their semantic properties. The remarkable success of word embeddings for this purpose suggests that high-quality representations can be obtained by summarizing the sentence contexts of word mentions. In this paper, we propose a method for learning word representations that follows this basic strategy, but differs from standard word embeddings in two important ways. First, we take advantage of contextualized language models (CLMs) rather than bags of word vectors to encode contexts. Second, rather than learning a word vector directly, we use a topic model to partition the contexts in which words appear, and then learn different topic-specific vectors for each word. Finally, we use a task-specific supervision signal to make a soft selection of the resulting vectors. We show that this simple strategy leads to high-quality word vectors, which are more predictive of semantic properties than word embeddings and existing CLM-based strategies.

pdf bib
Distilling Relation Embeddings from Pretrained Language Models
Asahi Ushio | Jose Camacho-Collados | Steven Schockaert
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Pre-trained language models have been found to capture a surprisingly rich amount of lexical knowledge, ranging from commonsense properties of everyday concepts to detailed factual knowledge about named entities. Among others, this makes it possible to distill high-quality word vectors from pre-trained language models. However, it is currently unclear to what extent it is possible to distill relation embeddings, i.e. vectors that characterize the relationship between two words. Such relation embeddings are appealing because they can, in principle, encode relational knowledge in a more fine-grained way than is possible with knowledge graphs. To obtain relation embeddings from a pre-trained language model, we encode word pairs using a (manually or automatically generated) prompt, and we fine-tune the language model such that relationally similar word pairs yield similar output vectors. We find that the resulting relation embeddings are highly competitive on analogy (unsupervised) and relation classification (supervised) benchmarks, even without any task-specific fine-tuning. Source code to reproduce our experimental results and the model checkpoints are available in the following repository: https://github.com/asahi417/relbert

pdf bib
Probing Pre-Trained Language Models for Disease Knowledge
Israa Alghanmi | Luis Espinosa Anke | Steven Schockaert
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?
Asahi Ushio | Luis Espinosa Anke | Steven Schockaert | Jose Camacho-Collados
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Analogies play a central role in human commonsense reasoning. The ability to recognize analogies such as “eye is to seeing what ear is to hearing”, sometimes referred to as analogical proportions, shape how we structure knowledge and understand language. Surprisingly, however, the task of identifying such analogies has not yet received much attention in the language model era. In this paper, we analyze the capabilities of transformer-based language models on this unsupervised task, using benchmarks obtained from educational settings, as well as more commonly used datasets. We find that off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations, and results are highly sensitive to model architecture and hyperparameters. Overall the best results were obtained with GPT-2 and RoBERTa, while configurations using BERT were not able to outperform word embedding models. Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations.

2020

pdf bib
A Mixture-of-Experts Model for Learning Multi-Facet Entity Embeddings
Rana Alshaikh | Zied Bouraoui | Shelan Jeawak | Steven Schockaert
Proceedings of the 28th International Conference on Computational Linguistics

Various methods have already been proposed for learning entity embeddings from text descriptions. Such embeddings are commonly used for inferring properties of entities, for recommendation and entity-oriented search, and for injecting background knowledge into neural architectures, among others. Entity embeddings essentially serve as a compact encoding of a similarity relation, but similarity is an inherently multi-faceted notion. By representing entities as single vectors, existing methods leave it to downstream applications to identify these different facets, and to select the most relevant ones. In this paper, we propose a model that instead learns several vectors for each entity, each of which intuitively captures a different aspect of the considered domain. We use a mixture-of-experts formulation to jointly learn these facet-specific embeddings. The individual entity embeddings are learned using a variant of the GloVe model, which has the advantage that we can easily identify which properties are modelled well in which of the learned embeddings. This is exploited by an associated gating network, which uses pre-trained word vectors to encourage the properties that are modelled by a given embedding to be semantically coherent, i.e. to encourage each of the individual embeddings to capture a meaningful facet.

pdf bib
Don’t Patronize Me! An Annotated Dataset with Patronizing and Condescending Language towards Vulnerable Communities
Carla Perez Almendros | Luis Espinosa Anke | Steven Schockaert
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we introduce a new annotated dataset which is aimed at supporting the development of NLP models to identify and categorize language that is patronizing or condescending towards vulnerable communities (e.g. refugees, homeless people, poor families). While the prevalence of such language in the general media has long been shown to have harmful effects, it differs from other types of harmful language, in that it is generally used unconsciously and with good intentions. We furthermore believe that the often subtle nature of patronizing and condescending language (PCL) presents an interesting technical challenge for the NLP community. Our analysis of the proposed dataset shows that identifying PCL is hard for standard NLP models, with language models such as BERT achieving the best results.

pdf bib
Learning Company Embeddings from Annual Reports for Fine-grained Industry Characterization
Tomoki Ito | Jose Camacho Collados | Hiroki Sakaji | Steven Schockaert
Proceedings of the Second Workshop on Financial Technology and Natural Language Processing

pdf bib
Combining BERT with Static Word Embeddings for Categorizing Social Media
Israa Alghanmi | Luis Espinosa Anke | Steven Schockaert
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Pre-trained neural language models (LMs) have achieved impressive results in various natural language processing tasks, across different languages. Surprisingly, this extends to the social media genre, despite the fact that social media often has very different characteristics from the language that LMs have seen during training. A particularly striking example is the performance of AraBERT, an LM for the Arabic language, which is successful in categorizing social media posts in Arabic dialects, despite only having been trained on Modern Standard Arabic. Our hypothesis in this paper is that the performance of LMs for social media can nonetheless be improved by incorporating static word vectors that have been specifically trained on social media. We show that a simple method for incorporating such word vectors is indeed successful in several Arabic and English benchmarks. Curiously, however, we also find that similar improvements are possible with word vectors that have been trained on traditional text sources (e.g. Wikipedia).

pdf bib
Cardiff University at SemEval-2020 Task 6: Fine-tuning BERT for Domain-Specific Definition Classification
Shelan Jeawak | Luis Espinosa-Anke | Steven Schockaert
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We describe the system submitted to SemEval-2020 Task 6, Subtask 1. The aim of this subtask is to predict whether a given sentence contains a definition or not. Unsurprisingly, we found that strong results can be achieved by fine-tuning a pre-trained BERT language model. In this paper, we analyze the performance of this strategy. Among others, we show that results can be improved by using a two-step fine-tuning process, in which the BERT model is first fine-tuned on the full training set, and then further specialized towards a target domain.

pdf bib
On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning
Yerai Doval | Jose Camacho-Collados | Luis Espinosa Anke | Steven Schockaert
Proceedings of the 12th Language Resources and Evaluation Conference

Cross-lingual word embeddings are vector representations of words in different languages where words with similar meaning are represented by similar vectors, regardless of the language. Recent developments which construct these embeddings by aligning monolingual spaces have shown that accurate alignments can be obtained with little or no supervision, which usually comes in the form of bilingual dictionaries. However, the focus has been on a particular controlled scenario for evaluation, and there is no strong evidence on how current state-of-the-art systems would fare with noisy text or for language pairs with major linguistic differences. In this paper we present an extensive evaluation over multiple cross-lingual embedding models, analyzing their strengths and limitations with respect to different variables such as target language, training corpora and amount of supervision. Our conclusions put in doubt the view that high-quality cross-lingual embeddings can always be learned without much supervision.

2019

pdf bib
Learning Conceptual Spaces with Disentangled Facets
Rana Alshaikh | Zied Bouraoui | Steven Schockaert
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Conceptual spaces are geometric representations of meaning that were proposed by G ̈ardenfors (2000). They share many similarities with the vector space embeddings that are commonly used in natural language processing. However, rather than representing entities in a single vector space, conceptual spaces are usually decomposed into several facets, each of which is then modelled as a relatively low dimensional vector space. Unfortunately, the problem of learning such conceptual spaces has thus far only received limited attention. To address this gap, we analyze how, and to what extent, a given vector space embedding can be decomposed into meaningful facets in an unsupervised fashion. While this problem is highly challenging, we show that useful facets can be discovered by relying on word embeddings to group semantically related features.

pdf bib
Cardiff University at SemEval-2019 Task 4: Linguistic Features for Hyperpartisan News Detection
Carla Pérez-Almendros | Luis Espinosa-Anke | Steven Schockaert
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper summarizes our contribution to the Hyperpartisan News Detection task in SemEval 2019. We experiment with two different approaches: 1) an SVM classifier based on word vector averages and hand-crafted linguistic features, and 2) a BiLSTM-based neural text classifier trained on a filtered training set. Surprisingly, despite their different nature, both approaches achieve an accuracy of 0.74. The main focus of this paper is to further analyze the remarkable fact that a simple feature-based approach can perform on par with modern neural classifiers. We also highlight the effectiveness of our filtering strategy for training the neural network on a large but noisy training set.

pdf bib
Learning Household Task Knowledge from WikiHow Descriptions
Yilun Zhou | Julie Shah | Steven Schockaert
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)

pdf bib
Relational Word Embeddings
Jose Camacho-Collados | Luis Espinosa Anke | Steven Schockaert
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

While word embeddings have been shown to implicitly encode various forms of attributional knowledge, the extent to which they capture relational information is far more limited. In previous work, this limitation has been addressed by incorporating relational knowledge from external knowledge bases when learning the word embedding. Such strategies may not be optimal, however, as they are limited by the coverage of available resources and conflate similarity with other forms of relatedness. As an alternative, in this paper we propose to encode relational knowledge in a separate word embedding, which is aimed to be complementary to a given standard word embedding. This relational word embedding is still learned from co-occurrence statistics, and can thus be used even when no external knowledge base is available. Our analysis shows that relational word vectors do indeed capture information that is complementary to what is encoded in standard word embeddings.

pdf bib
Word and Document Embedding with vMF-Mixture Priors on Context Word Vectors
Shoaib Jameel | Steven Schockaert
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Word embedding models typically learn two types of vectors: target word vectors and context word vectors. These vectors are normally learned such that they are predictive of some word co-occurrence statistic, but they are otherwise unconstrained. However, the words from a given language can be organized in various natural groupings, such as syntactic word classes (e.g. nouns, adjectives, verbs) and semantic themes (e.g. sports, politics, sentiment). Our hypothesis in this paper is that embedding models can be improved by explicitly imposing a cluster structure on the set of context word vectors. To this end, our model relies on the assumption that context word vectors are drawn from a mixture of von Mises-Fisher (vMF) distributions, where the parameters of this mixture distribution are jointly optimized with the word vectors. We show that this results in word vectors which are qualitatively different from those obtained with existing word embedding models. We furthermore show that our embedding model can also be used to learn high-quality document representations.

pdf bib
Collocation Classification with Unsupervised Relation Vectors
Luis Espinosa Anke | Steven Schockaert | Leo Wanner
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Lexical relation classification is the task of predicting whether a certain relation holds between a given pair of words. In this paper, we explore to which extent the current distributional landscape based on word embeddings provides a suitable basis for classification of collocations, i.e., pairs of words between which idiosyncratic lexical relations hold. First, we introduce a novel dataset with collocations categorized according to lexical functions. Second, we conduct experiments on a subset of this benchmark, comparing it in particular to the well known DiffVec dataset. In these experiments, in addition to simple word vector arithmetic operations, we also investigate the role of unsupervised relation vectors as a complementary input. While these relation vectors indeed help, we also show that lexical function classification poses a greater challenge than the syntactic and semantic relations that are typically used for benchmarks in the literature.

2018

pdf bib
Unsupervised Learning of Distributional Relation Vectors
Shoaib Jameel | Zied Bouraoui | Steven Schockaert
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Word embedding models such as GloVe rely on co-occurrence statistics to learn vector representations of word meaning. While we may similarly expect that co-occurrence statistics can be used to capture rich information about the relationships between different words, existing approaches for modeling such relationships are based on manipulating pre-trained word vectors. In this paper, we introduce a novel method which directly learns relation vectors from co-occurrence statistics. To this end, we first introduce a variant of GloVe, in which there is an explicit connection between word vectors and PMI weighted co-occurrence vectors. We then show how relation vectors can be naturally embedded into the resulting vector space.

pdf bib
Modelling Salient Features as Directions in Fine-Tuned Semantic Spaces
Thomas Ager | Ondřej Kuželka | Steven Schockaert
Proceedings of the 22nd Conference on Computational Natural Language Learning

In this paper we consider semantic spaces consisting of objects from some particular domain (e.g. IMDB movie reviews). Various authors have observed that such semantic spaces often model salient features (e.g. how scary a movie is) as directions. These feature directions allow us to rank objects according to how much they have the corresponding feature, and can thus play an important role in interpretable classifiers, recommendation systems, or entity-oriented search engines, among others. Methods for learning semantic spaces, however, are mostly aimed at modelling similarity. In this paper, we argue that there is an inherent trade-off between capturing similarity and faithfully modelling features as directions. Following this observation, we propose a simple method to fine-tune existing semantic spaces, with the aim of improving the quality of their feature directions. Crucially, our method is fully unsupervised, requiring only a bag-of-words representation of the objects as input.

pdf bib
Knowledge Representation with Conceptual Spaces
Steven Schockaert
Proceedings of the Third Workshop on Semantic Deep Learning

Entity embeddings are vector space representations of a given domain of interest. They are typically learned from text corpora (possibly in combination with any available structured knowledge), based on the intuition that similar entities should be represented by similar vectors. The usefulness of such entity embeddings largely stems from the fact that they implicitly encode a rich amount of knowledge about the considered domain, beyond mere similarity. In an embedding of movies, for instance, we may expect all movies from a given genre to be located in some low-dimensional manifold. This is particularly useful in supervised learning settings, where it may e.g. allow neural movie recommenders to base predictions on the genre of a movie, without that genre having to be specified explicitly for each movie, or without even the need to specify that the genre of a movie is a property that may have predictive value for the considered task. In unsupervised settings, however, such implicitly encoded knowledge cannot be leveraged. Conceptual spaces, as proposed by Grdenfors, are similar to entity embeddings, but provide more structure. In conceptual spaces, among others, dimensions are interpretable and grouped into facets, and properties and concepts are explicitly modelled as (vague) regions. Thanks to this additional structure, conceptual spaces can be used as a knowledge representation framework, which can also be effectively exploited in unsupervised settings. Given a conceptual space of movies, for instance, we are able to answer queries that ask about similarity w.r.t. a particular facet (e.g. movies which are cinematographically similar to Jurassic Park), that refer to a given feature (e.g. movies which are scarier than Jurassic Park but otherwise similar), or that refer to particular properties or concepts (e.g. thriller from the 1990s with a dinosaur theme). Compared to standard entity embeddings, however, conceptual spaces are more challenging to learn in a purely data-driven fashion. In this talk, I will give an overview of some approaches for learning such representations that have recently been developed within the context of the FLEXILOG project.

pdf bib
Syntactically Aware Neural Architectures for Definition Extraction
Luis Espinosa-Anke | Steven Schockaert
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Automatically identifying definitional knowledge in text corpora (Definition Extraction or DE) is an important task with direct applications in, among others, Automatic Glossary Generation, Taxonomy Learning, Question Answering and Semantic Search. It is generally cast as a binary classification problem between definitional and non-definitional sentences. In this paper we present a set of neural architectures combining Convolutional and Recurrent Neural Networks, which are further enriched by incorporating linguistic information via syntactic dependencies. Our experimental results in the task of sentence classification, on two benchmarking DE datasets (one generic, one domain-specific), show that these models obtain consistent state of the art results. Furthermore, we demonstrate that models trained on clean Wikipedia-like definitions can successfully be applied to more noisy domain-specific corpora.

pdf bib
Relation Induction in Word Embeddings Revisited
Zied Bouraoui | Shoaib Jameel | Steven Schockaert
Proceedings of the 27th International Conference on Computational Linguistics

Given a set of instances of some relation, the relation induction task is to predict which other word pairs are likely to be related in the same way. While it is natural to use word embeddings for this task, standard approaches based on vector translations turn out to perform poorly. To address this issue, we propose two probabilistic relation induction models. The first model is based on translations, but uses Gaussians to explicitly model the variability of these translations and to encode soft constraints on the source and target words that may be chosen. In the second model, we use Bayesian linear regression to encode the assumption that there is a linear relationship between the vector representations of related words, which is considerably weaker than the assumption underlying translation based models.

pdf bib
SeVeN: Augmenting Word Embeddings with Unsupervised Relation Vectors
Luis Espinosa-Anke | Steven Schockaert
Proceedings of the 27th International Conference on Computational Linguistics

We present SeVeN (Semantic Vector Networks), a hybrid resource that encodes relationships between words in the form of a graph. Different from traditional semantic networks, these relations are represented as vectors in a continuous vector space. We propose a simple pipeline for learning such relation vectors, which is based on word vector averaging in combination with an ad hoc autoencoder. We show that by explicitly encoding relational information in a dedicated vector space we can capture aspects of word meaning that are complementary to what is captured by word embeddings. For example, by examining clusters of relation vectors, we observe that relational similarities can be identified at a more abstract level than with traditional word vector differences. Finally, we test the effectiveness of semantic vector networks in two tasks: measuring word similarity and neural text categorization. SeVeN is available at bitbucket.org/luisespinosa/seven.

pdf bib
Improving Cross-Lingual Word Embeddings by Meeting in the Middle
Yerai Doval | Jose Camacho-Collados | Luis Espinosa-Anke | Steven Schockaert
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Cross-lingual word embeddings are becoming increasingly important in multilingual NLP. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through linear transformations, using no more than a small bilingual dictionary as supervision. In this work, we propose to apply an additional transformation after the initial alignment step, which moves cross-lingual synonyms towards a middle point between them. By applying this transformation our aim is to obtain a better cross-lingual integration of the vector spaces. In addition, and perhaps surprisingly, the monolingual spaces also improve by this transformation. This is in contrast to the original alignment, which is typically learned such that the structure of the monolingual spaces is preserved. Our experiments confirm that the resulting cross-lingual embeddings outperform state-of-the-art models in both monolingual and cross-lingual evaluation tasks.

pdf bib
Interpretable Emoji Prediction via Label-Wise Attention LSTMs
Francesco Barbieri | Luis Espinosa-Anke | Jose Camacho-Collados | Steven Schockaert | Horacio Saggion
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Human language has evolved towards newer forms of communication such as social media, where emojis (i.e., ideograms bearing a visual meaning) play a key role. While there is an increasing body of work aimed at the computational modeling of emoji semantics, there is currently little understanding about what makes a computational model represent or predict a given emoji in a certain way. In this paper we propose a label-wise attention mechanism with which we attempt to better understand the nuances underlying emoji prediction. In addition to advantages in terms of interpretability, we show that our proposed architecture improves over standard baselines in emoji prediction, and does particularly well when predicting infrequent emojis.

2017

pdf bib
Modeling Context Words as Regions: An Ordinal Regression Approach to Word Embedding
Shoaib Jameel | Steven Schockaert
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Vector representations of word meaning have found many applications in the field of natural language processing. Word vectors intuitively represent the average context in which a given word tends to occur, but they cannot explicitly model the diversity of these contexts. Although region representations of word meaning offer a natural alternative to word vectors, only few methods have been proposed that can effectively learn word regions. In this paper, we propose a new word embedding model which is based on SVM regression. We show that the underlying ranking interpretation of word contexts is sufficient to match, and sometimes outperform, the performance of popular methods such as Skip-gram. Furthermore, we show that by using a quadratic kernel, we can effectively learn word regions, which outperform existing unsupervised models for the task of hypernym detection.

2016

pdf bib
D-GloVe: A Feasible Least Squares Model for Estimating Word Embedding Densities
Shoaib Jameel | Steven Schockaert
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We propose a new word embedding model, inspired by GloVe, which is formulated as a feasible least squares optimization problem. In contrast to existing models, we explicitly represent the uncertainty about the exact definition of each word vector. To this end, we estimate the error that results from using noisy co-occurrence counts in the formulation of the model, and we model the imprecision that results from including uninformative context words. Our experimental results demonstrate that this model compares favourably with existing word embedding models.

2006

pdf bib
Supporting temporal question answering: strategies for offline data collection
David Ahn | Steven Schockaert | Martine De Cock | Etienne Kerre
Proceedings of the Fifth International Workshop on Inference in Computational Semantics (ICoS-5)