Paola Merlo


2021

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Paola Merlo | Jörg Tiedemann | Reut Tsarfaty
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Multi-Adversarial Learning for Cross-Lingual Word Embeddings
Haozhou Wang | James Henderson | Paola Merlo
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Generative adversarial networks (GANs) have succeeded in inducing cross-lingual word embeddings - maps of matching words across languages - without supervision. Despite these successes, GANs’ performance for the difficult case of distant languages is still not satisfactory. These limitations have been explained by GANs’ incorrect assumption that source and target embedding spaces are related by a single linear mapping and are approximately isomorphic. We assume instead that, especially across distant languages, the mapping is only piece-wise linear, and propose a multi-adversarial learning method. This novel method induces the seed cross-lingual dictionary through multiple mappings, each induced to fit the mapping for one subspace. Our experiments on unsupervised bilingual lexicon induction and cross-lingual document classification show that this method improves performance over previous single-mapping methods, especially for distant languages.

2020

Word associations and the distance properties of context-aware word embeddings
Maria A. Rodriguez | Paola Merlo
Proceedings of the 24th Conference on Computational Natural Language Learning

What do people know when they know the meaning of words? Word associations have been widely used to tap into lexical representations and their structure, as a way of probing semantic knowledge in humans. We investigate whether current word embedding spaces (contextualized and uncontextualized) can be considered good models of human lexical knowledge by studying whether they have comparable characteristics to human association spaces. We study the three properties of association rank, asymmetry of similarity and triangle inequality. We find that word embeddings are good models of some word association properties. They replicate human associations between words well, and, like humans, their context-aware variants show violations of the triangle inequality. While they do show asymmetry of similarities, their asymmetries do not map those of human association norms.

Syntactic Parsing in Humans and Machines
Paola Merlo
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

To process the syntactic structures of a language in ways that are compatible with human expectations, we need computational representations of lexical and syntactic properties that form the basis of human knowledge of words and sentences. Recent neural-network-based and distributed semantics techniques have developed systems of considerable practical success and impressive performance. As has been advocated by many, however, such systems still lack human-like properties. In particular, linguistic, psycholinguistic and neuroscientific investigations have shown that human processing of sentences is sensitive to structure and unbounded relations. In the spirit of better understanding the structure building and long-distance properties of neural networks, I will present an overview of recent results on agreement and island effects in syntax in several languages. While certain sets of results in the literature indicate that neural language models exhibit long-distance agreement abilities, other finer-grained investigations of how these effects are calculated indicate that the similarity spaces they define do not correlate with human experimental results on intervention similarity in long-distance dependencies. This opens the way to reflections on how to better match the syntactic properties of natural languages in the representations of neural models.

2019

Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings
Haozhou Wang | James Henderson | Paola Merlo
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Distributed representations of words which map each word to a continuous vector have proven useful in capturing important linguistic information not only in a single language but also across different languages. Current unsupervised adversarial approaches show that it is possible to build a mapping matrix that aligns two sets of monolingual word embeddings without high quality parallel data, such as a dictionary or a sentence-aligned corpus. However, without an additional step of refinement, the preliminary mapping learnt by these methods is unsatisfactory, leading to poor performance for typologically distant languages. In this paper, we propose a weakly-supervised adversarial training method to overcome this limitation, based on the intuition that mapping across languages is better done at the concept level than at the word level. We propose a concept-based adversarial training method which improves the performance of previous unsupervised adversarial methods for most languages, and especially for typologically distant language pairs.

Cross-Lingual Word Embeddings and the Structure of the Human Bilingual Lexicon
Paola Merlo | Maria Andueza Rodriguez
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Research on the bilingual lexicon has uncovered fascinating interactions between the lexicons of the native language and of the second language in bilingual speakers. In particular, it has been found that the lexicon of the underlying native language affects the organisation of the second language. In the spirit of interpreting current distributed representations, this paper investigates two models of cross-lingual word embeddings, comparing them to the shared-translation effect and the cross-lingual coactivation effects of false and true friends (cognates) found in humans. We find that the similarity structure of the cross-lingual word embeddings space yields the same effects as the human bilingual lexicon.

Probing Word and Sentence Embeddings for Long-distance Dependencies Effects in French and English
Paola Merlo
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

The recent widespread and strong interest in RNNs has spurred detailed investigations of the distributed representations they generate, and specifically of whether they exhibit properties similar to those characterising human languages. Results are at present inconclusive. In this paper, we extend previous work on long-distance dependencies in three ways. We manipulate word embeddings to translate them into a space that is attuned to the linguistic properties under study. We extend the work to sentence embeddings and to new languages. We confirm previous negative results: word embeddings and sentence embeddings do not unequivocally encode fine-grained linguistic properties of long-distance dependencies.

SyntaxFest 2019 Invited talk - Quantitative Computational Syntax: dependencies, intervention effects and word embeddings
Paola Merlo
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

Intervention effects in object relatives in English and Italian: a study in quantitative computational syntax
Giuseppe Samo | Paola Merlo
Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019)

2018

Festina Lente: A Farewell from the Editor
Paola Merlo
Computational Linguistics, Volume 44, Issue 2 - June 2018

Vectorial Semantic Spaces Do Not Encode Human Judgments of Intervention Similarity
Paola Merlo | Francesco Ackermann
Proceedings of the 22nd Conference on Computational Natural Language Learning

Despite their practical success and impressive performances, neural-network-based and distributed semantics techniques have often been criticized as they remain fundamentally opaque and difficult to interpret. In a vein similar to recent pieces of work investigating the linguistic abilities of these representations, we study another core, defining property of language: the property of long-distance dependencies. Human languages exhibit the ability to interpret discontinuous elements distant from each other in the string as if they were adjacent. This ability is blocked if a similar, but extraneous, element intervenes between the discontinuous components. We present results that show, under exhaustive and precise conditions, that one kind of word embeddings and the similarity spaces they define do not encode the properties of intervention similarity in long-distance dependencies, and that therefore they fail to represent this core linguistic notion.

2017

CLCL (Geneva) DINN Parser: a Neural Network Dependency Parser Ten Years Later
Christophe Moor | Paola Merlo | James Henderson | Haozhou Wang
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

This paper describes the University of Geneva’s submission to the CoNLL 2017 shared task Multilingual Parsing from Raw Text to Universal Dependencies (listed as the CLCL (Geneva) entry). Our submitted parsing system is the grandchild of the first transition-based neural network dependency parser, which was the University of Geneva’s entry in the CoNLL 2007 multilingual dependency parsing shared task, with some improvements to speed and portability. These results provide a baseline for investigating how far we have come in the past ten years of work on neural network dependency parsing.

2016

Obituary: In Memoriam: Susan Armstrong
Pierrette Bouillon | Paola Merlo | Gertjan van Noord | Mike Rosner
Computational Linguistics, Volume 42, Issue 2 - June 2016

Modifications of Machine Translation Evaluation Metrics by Using Word Embeddings
Haozhou Wang | Paola Merlo
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

Traditional machine translation evaluation metrics such as BLEU and WER have been widely used, but these metrics correlate poorly with human judgements because they represent word similarity badly and impose strict identity matching. In this paper, we propose modifications to these two metrics based on word embeddings. The evaluation results show that our modifications significantly improve their correlation with human judgements.

Multi-lingual Dependency Parsing Evaluation: a Large-scale Analysis of Word Order Properties using Artificial Data
Kristina Gulordava | Paola Merlo
Transactions of the Association for Computational Linguistics, Volume 4

The growing work in multi-lingual parsing faces the challenge of fair comparative evaluation and performance analysis across languages and their treebanks. The difficulty lies in teasing apart the properties of treebanks, such as their size or average sentence length, from those of the annotation scheme, and from the linguistic properties of languages. We propose a method to evaluate the effects of word order of a language on dependency parsing performance, while controlling for confounding treebank properties. The method uses artificially-generated treebanks that are minimal permutations of actual treebanks with respect to two word order properties: word order variation and dependency lengths. Based on these artificial data on twelve languages, we show that longer dependencies and higher word order variability degrade parsing performance. Our method also extends to minimal pairs of individual sentences, leading to a finer-grained understanding of parsing errors.

2015

Diachronic Trends in Word Order Freedom and Dependency Length in Dependency-Annotated Corpora of Latin and Ancient Greek
Kristina Gulordava | Paola Merlo
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

Evaluation of Two-level Dependency Representations of Argument Structure in Long-Distance Dependencies
Paola Merlo
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

Dependency length minimisation effects in short spans: a large-scale analysis of adjective placement in complex noun phrases
Kristina Gulordava | Paola Merlo | Benoit Crabbé
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Structural and lexical factors in adjective placement in complex noun phrases across Romance languages
Kristina Gulordava | Paola Merlo
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

2014

Likelihood of External Causation in the Structure of Events
Tanja Samardžić | Paola Merlo
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)

2013

Multilingual Joint Parsing of Syntactic and Semantic Dependencies with a Latent Variable Model
James Henderson | Paola Merlo | Ivan Titov | Gabriele Musillo
Computational Linguistics, Volume 39, Issue 4 - December 2013

2011

Scaling up Automatic Cross-Lingual Semantic Role Annotation
Lonneke van der Plas | Paola Merlo | James Henderson
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Cross-Lingual Validity of PropBank in the Manual Annotation of French
Lonneke van der Plas | Tanja Samardžić | Paola Merlo
Proceedings of the Fourth Linguistic Annotation Workshop

Cross-Lingual Variation of Light Verb Constructions: Using Parallel Corpora and Automatic Alignment for Linguistic Research
Tanja Samardžić | Paola Merlo
Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground

2009

Abstraction and Generalisation in Semantic Role Labels: PropBank, VerbNet or both?
Paola Merlo | Lonneke Van Der Plas
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Domain Adaptation with Artificial Data for Semantic Parsing of Speech
Lonneke van der Plas | James Henderson | Paola Merlo
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

A Latent Variable Model of Synchronous Syntactic-Semantic Parsing for Multiple Languages
Andrea Gesmundo | James Henderson | Paola Merlo | Ivan Titov
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

2008

Semantic Parsing for High-Precision Semantic Role Labelling
Paola Merlo | Gabriele Musillo
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

A Latent Variable Model of Synchronous Parsing for Syntactic and Semantic Dependencies
James Henderson | Paola Merlo | Gabriele Musillo | Ivan Titov
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

Unlexicalised Hidden Variable Models of Split Dependency Grammars
Gabriele Antonio Musillo | Paola Merlo
Proceedings of ACL-08: HLT, Short Papers

2007

Proceedings of the Tenth International Conference on Parsing Technologies
Harry Bunt | Paola Merlo
Proceedings of the Tenth International Conference on Parsing Technologies

2006

The Notion of Argument in Prepositional Phrase Attachment
Paola Merlo | Eva Esteve Ferrer
Computational Linguistics, Volume 32, Number 3, September 2006

Robust Parsing of the Proposition Bank
Gabriele Musillo | Paola Merlo
Proceedings of the Workshop on ROMAND 2006: Robust Methods in Analysis of Natural Language Data

Accurate Parsing of the Proposition Bank
Gabriele Musillo | Paola Merlo
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

2005

Accurate Function Parsing
Paola Merlo | Gabriele Musillo
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Lexical and Structural Biases for Function Parsing
Gabriele Musillo | Paola Merlo
Proceedings of the Ninth International Workshop on Parsing Technology

2003

Generalised PP-attachment Disambiguation Using Corpus-based Linguistic Diagnostics
Paola Merlo
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

A Multilingual Paradigm for Automatic Verb Classification
Paola Merlo | Suzanne Stevenson | Vivian Tsang | Gianluca Allaria
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

Using Syntactic Analysis to Increase Efficiency in Visualizing Text Collections
James Henderson | Paola Merlo | Ivan Petroff | Gerold Schneider
COLING 2002: The 19th International Conference on Computational Linguistics

Crosslinguistic Transfer in Automatic Verb Classification
Vivian Tsang | Suzanne Stevenson | Paola Merlo
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Automatic distinction of arguments and modifiers: the case of prepositional phrases
Paola Merlo | Matthias Leybold
Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning (ConLL)

Automatic Verb Classification Based on Statistical Distributions of Argument Structure
Paola Merlo | Suzanne Stevenson
Computational Linguistics, Volume 27, Number 3, September 2001

2000

Establishing the Upper Bound and Inter-judge Agreement of a Verb Classification Task
Paola Merlo | Suzanne Stevenson
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Automatic Lexical Acquisition Based on Statistical Distributions
Suzanne Stevenson | Paola Merlo
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1999

Supervised Learning of Lexical Semantic Verb Classes Using Frequency Distributions
Suzanne Stevenson | Paola Merlo | Natalia Kariaeva
SIGLEX99: Standardizing Lexical Resources

Automatic Verb Classification Using Distributions of Grammatical Features
Suzanne Stevenson | Paola Merlo
Ninth Conference of the European Chapter of the Association for Computational Linguistics

1998

What grammars tell us about corpora: the case of reduced relative clauses
Paola Merlo | Suzanne Stevenson
Sixth Workshop on Very Large Corpora

1997

Attaching Multiple Prepositional Phrases: Backed-off Estimation Generalized
Paola Merlo
Second Conference on Empirical Methods in Natural Language Processing

1995

Modularity and Information Content Classes in Principle-Based Parsing
Paola Merlo
Computational Linguistics, Volume 21, Number 4, December 1995

1993

A Principle-based Parser for Foreign Language Training in German and Arabic
Joe Garman | Jeffery Martin | Paola Merlo | Amy Weinberg
Proceedings of the Third International Workshop on Parsing Technologies

In this paper we discuss the design and implementation of a parser for German and Arabic, which is currently being used in a tutoring system for foreign language training. Computer-aided language tutoring is a good application for testing the robustness and flexibility of a parsing system, since the input is usually ungrammatical in some way. Efficiency is also a concern, as tutoring applications typically run on personal computers, with the parser sharing memory with other components of the system. Our system is principle-based, which ensures a compact representation, and improves portability, needed in order to extend the initial design from German to Arabic and (eventually) Spanish. Currently, the parser diagnoses agreement errors, case errors, selection errors, and some word order errors. The parser can handle simple and complex declaratives and questions, topicalisations, verb movement, relative clauses — broad enough coverage to be useful in the design of real exercises and dialogues.

1992

An LR Category-Neutral Parser With Left Corner Prediction
Paola Merlo
30th Annual Meeting of the Association for Computational Linguistics