Chakaveh Saedi

2020

Large Scale Author Obfuscation Using Siamese Variational Auto-Encoder: The SiamAO System
Chakaveh Saedi | Mark Dras
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

Author obfuscation is the task of masking the author of a piece of text, with applications in privacy. Recent advances in deep neural networks have boosted author identification performance, making author obfuscation more challenging. Existing approaches to author obfuscation are largely heuristic. Obfuscation can, however, be thought of as the construction of adversarial examples to attack author identification, suggesting that the deep learning architectures used for adversarial attacks could have application here. Current architectures are proposed to construct adversarial examples against classification-based models, which in author identification would exclude the high-performing similarity-based models employed when facing a large number of authorial classes. In this paper, we propose the first deep learning architecture for constructing adversarial examples against similarity-based learners, and explore its application to author obfuscation. We analyse the output in terms of both obfuscation success and language acceptability, compare the performance with some common baselines, and show promising results in finding a balance between safety and soundness of the perturbed texts.

Comparative Probing of Lexical Semantics Theories for Cognitive Plausibility and Technological Usefulness
António Branco | João António Rodrigues | Malgorzata Salawa | Ruben Branco | Chakaveh Saedi
Proceedings of the 28th International Conference on Computational Linguistics

Lexical semantics theories differ in advocating that the meaning of words is represented as an inference graph, a feature mapping or a cooccurrence vector, thus raising the question: is it the case that one of these approaches is superior to the others in representing lexical semantics appropriately? Or in its non-antagonistic counterpart: could there be a unified account of lexical semantics where these approaches seamlessly emerge as (partial) renderings of (different) aspects of a core semantic knowledge base? In this paper, we contribute to these research questions with a number of experiments that systematically probe different lexical semantics theories for their levels of cognitive plausibility and of technological usefulness. The empirical findings obtained from these experiments advance our insight into lexical semantics, as the feature-based approach emerges as superior to the other ones, and arguably also move us closer to finding answers to the research questions above.

The objective of the present paper is twofold: to present the MWN.PT WordNet and to report on its construction and on the lessons learned with it. The MWN.PT WordNet for Portuguese includes 41,000 concepts, expressed by 38,000 lexical units. Its synsets were manually validated and are linked to semantically equivalent synsets of the Princeton WordNet of English, and thus transitively to the many wordnets for other languages that are also linked to this English wordnet. To the best of our knowledge, it is the largest high-quality, manually validated and cross-lingually integrated wordnet of Portuguese distributed for reuse. Its construction was initiated more than a decade ago and its description is published for the first time in the present paper. It follows a three-step <projection, validation with alignment, completion> methodology consisting of the manual validation and expansion of the outcome of an automatic projection procedure of synsets and their hypernym relations, followed by another automatic procedure that transferred the relations of the remaining semantic types across wordnets of different languages.

2019

Whom to Learn From? Graph- vs. Text-based Word Embeddings
Małgorzata Salawa | António Branco | Ruben Branco | João António Rodrigues | Chakaveh Saedi
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Vectorial representations of meaning can be supported by empirical data from diverse sources and obtained with diverse embedding approaches. This paper aims at screening this experimental space and reports on an assessment of word embeddings supported (i) by data in raw texts vs. in lexical graphs, (ii) by lexical information encoded in association- vs. inference-based graphs, and obtained (iii) by edge reconstruction- vs. matrix factorisation- vs. random walk-based graph embedding methods. The results observed with these experiments indicate that the best solutions with graph-based word embeddings are very competitive, consistently outperforming mainstream text-based ones.

Assessing Wordnets with WordNet Embeddings
Ruben Branco | João Rodrigues | Chakaveh Saedi | António Branco
Proceedings of the 10th Global Wordnet Conference

An effective conversion method was proposed in the literature to obtain a lexical semantic space from a lexical semantic graph, thus making it possible to obtain WordNet embeddings from WordNets. In this paper, we propose the exploitation of this conversion methodology as the basis for the comparative assessment of WordNets: given two WordNets, their relative quality in terms of capturing the lexical semantics of a given language can be assessed by (i) converting each WordNet into the corresponding semantic space (i.e. into WordNet embeddings); (ii) evaluating the resulting WordNet embeddings under the typical semantic similarity prediction task used to evaluate word embeddings in general; and (iii) comparing the performance in that task of the two sets of word embeddings extracted from the two WordNets. A better performance in that evaluation task results from the word embeddings that are better at capturing the semantic similarity of words, which, in turn, result from the WordNet that is of higher quality at capturing the semantics of words.

2018

WordnetLoom – a Multilingual Wordnet Editing System Focused on Graph-based Presentation
Tomasz Naskręt | Agnieszka Dziob | Maciej Piasecki | Chakaveh Saedi | António Branco
Proceedings of the 9th Global Wordnet Conference

The paper presents a new, rebuilt and expanded version 2.0 of WordnetLoom – an open wordnet editor. It facilitates work on a multilingual system of wordnets, is based on an efficient thin-client software architecture, and offers more flexibility in enriching wordnet representation. This new version is built on the experience collected during the use of the previous one for more than 10 years of plWordNet development. We discuss its extensions motivated by the collected experience. A special focus is given to the development of a variant for the needs of MultiWordnet of Portuguese, which is based on a very different wordnet development model.

Semantic Equivalence Detection: Are Interrogatives Harder than Declaratives?
João Rodrigues | Chakaveh Saedi | António Branco | João Silva
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Predicting Brain Activation with WordNet Embeddings
João António Rodrigues | Ruben Branco | João Silva | Chakaveh Saedi | António Branco
Proceedings of the Eight Workshop on Cognitive Aspects of Computational Language Learning and Processing

The task of taking a semantic representation of a noun and predicting the brain activity triggered by it in terms of fMRI spatial patterns was pioneered by Mitchell et al. 2008. That seminal work used word co-occurrence features to represent the meaning of the nouns. Even though the task does not impose any specific type of semantic representation, the vast majority of subsequent approaches resort to feature-based models or to semantic spaces (aka word embeddings). We address this task, with competitive results, by using instead a semantic network to encode lexical semantics, thus providing further evidence for the cognitive plausibility of this approach to model lexical meaning.

WordNet Embeddings
Chakaveh Saedi | António Branco | João António Rodrigues | João Silva
Proceedings of the Third Workshop on Representation Learning for NLP

Semantic networks and semantic spaces have been two prominent approaches to represent lexical semantics. While a unified account of the lexical meaning relies on one being able to convert between these representations, in both directions, the conversion direction from semantic networks into semantic spaces has started to attract more attention recently. In this paper we present a methodology for this conversion and assess it with a case study. When it is applied over WordNet, the performance of the resulting embeddings in a mainstream semantic similarity task is very good, substantially superior to the performance of word embeddings based on very large collections of texts like word2vec.

Browsing and Supporting Pluricentric Global Wordnet, or just your Wordnet of Interest
António Branco | Ruben Branco | Chakaveh Saedi | João Silva
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Ways of Asking and Replying in Duplicate Question Detection
João António Rodrigues | Chakaveh Saedi | Vladislav Maraev | João Silva | António Branco
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

This paper presents the results of systematic experimentation on the impact that different types of questions have on duplicate question detection, across both a number of established approaches and a novel, superior one used to address this language processing task. This study offers novel insight into the different levels of robustness of the diverse detection methods with respect to different conditions of their application, including the ones that approximate real usage scenarios.
