Yang Xu


2022

pdf bib
Neural reality of argument structure constructions
Bai Li | Zining Zhu | Guillaume Thomas | Frank Rudzicz | Yang Xu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In lexicalist linguistic theories, argument structure is assumed to be predictable from the meaning of verbs. As a result, the verb is the primary determinant of the meaning of a clause. In contrast, construction grammarians propose that argument structure is encoded in constructions (or form-meaning pairs) that are distinct from verbs. Two decades of psycholinguistic research have produced substantial empirical evidence in favor of the construction view. Here we adapt several psycholinguistic studies to probe for the existence of argument structure constructions (ASCs) in Transformer-based language models (LMs). First, using a sentence sorting experiment, we find that sentences sharing the same construction are closer in embedding space than sentences sharing the same verb. Furthermore, LMs increasingly prefer grouping by construction with more input data, mirroring the behavior of non-native language learners. Second, in a “Jabberwocky” priming-based experiment, we find that LMs associate ASCs with meaning, even in semantically nonsensical sentences. Our work offers the first evidence for ASCs in LMs and highlights the potential to devise novel probing methods grounded in psycholinguistic research.

2021

pdf bib
Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021
Nina Tahmasebi | Adam Jatowt | Yang Xu | Simon Hengchen | Syrielle Montariol | Haim Dubossarsky
Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021

pdf bib
A diachronic evaluation of gender asymmetry in euphemism
Anna Kapron-King | Yang Xu
Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021

The use of euphemisms is a known driver of language change. It has been proposed that women use euphemisms more than men. Although there have been several studies investigating gender differences in language, the claim about euphemism usage has not been tested comprehensively through time. If women do use euphemisms more, this could mean that women also lead the formation of new euphemisms and language change over time. Using four large diachronic text corpora of English, we evaluate the claim that women use euphemisms more than men through a quantitative analysis. We assembled a list of 106 euphemism-taboo pairs to analyze their relative use through time by each gender in the corpora. Contrary to the existing belief, our results show that women do not use euphemisms with a higher proportion than men. We repeated the analysis using different subsets of the euphemism-taboo pairs list and found that our result was robust. Our study indicates that in a broad range of settings involving both speech and writing, and with varying degrees of formality, women do not use or form euphemisms more than men.

pdf bib
Global divergence and local convergence of utterance semantic representations in dialogue
Yang Xu
Proceedings of the Society for Computation in Linguistics 2021

pdf bib
Predicting emergent linguistic compositions through time: Syntactic frame extension via multimodal chaining
Lei Yu | Yang Xu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Natural language relies on a finite lexicon to express an unbounded set of emerging ideas. One result of this tension is the formation of new compositions, such that existing linguistic units can be combined with emerging items into novel expressions. We develop a framework that exploits the cognitive mechanisms of chaining and multimodal knowledge to predict emergent compositional expressions through time. We present the syntactic frame extension model (SFEM) that draws on the theory of chaining and knowledge from “percept”, “concept”, and “language” to infer how verbs extend their frames to form new compositions with existing and novel nouns. We evaluate SFEM rigorously on the 1) modalities of knowledge and 2) categorization models of chaining, in a syntactically parsed English corpus over the past 150 years. We show that multimodal SFEM predicts newly emerged verb syntax and arguments substantially better than competing models using purely linguistic or unimodal knowledge. We find support for an exemplar view of chaining as opposed to a prototype view and reveal how the joint approach of multimodal chaining may be fundamental to the creation of literal and figurative language uses including metaphor and metonymy.

pdf bib
A Computational Framework for Slang Generation
Zhewei Sun | Richard Zemel | Yang Xu
Transactions of the Association for Computational Linguistics, Volume 9

Abstract Slang is a common type of informal language, but its flexible nature and paucity of data resources present challenges for existing natural language systems. We take an initial step toward machine generation of slang by developing a framework that models the speaker’s word choice in slang context. Our framework encodes novel slang meaning by relating the conventional and slang senses of a word while incorporating syntactic and contextual knowledge in slang usage. We construct the framework using a combination of probabilistic inference and neural contrastive learning. We perform rigorous evaluations on three slang dictionaries and show that our approach not only outperforms state-of-the-art language models, but also better predicts the historical emergence of slang word usages from 1960s to 2000s. We interpret the proposed models and find that the contrastively learned semantic space is sensitive to the similarities between slang and conventional senses of words. Our work creates opportunities for the automated generation and interpretation of informal language.

pdf bib
An unsupervised framework for tracing textual sources of moral change
Aida Ramezani | Zining Zhu | Frank Rudzicz | Yang Xu
Findings of the Association for Computational Linguistics: EMNLP 2021

Morality plays an important role in social well-being, but people’s moral perception is not stable and changes over time. Recent advances in natural language processing have shown that text is an effective medium for informing moral change, but no attempt has been made to quantify the origins of these changes. We present a novel unsupervised framework for tracing textual sources of moral change toward entities through time. We characterize moral change with probabilistic topical distributions and infer the source text that exerts prominent influence on the moral time course. We evaluate our framework on a diverse set of data ranging from social media to news articles. We show that our framework not only captures fine-grained human moral judgments, but also identifies coherent source topics of moral change triggered by historical events. We apply our methodology to analyze the news in the COVID-19 pandemic and demonstrate its utility in identifying sources of moral change in high-impact and real-time social events.

pdf bib
LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding
Yang Xu | Yiheng Xu | Tengchao Lv | Lei Cui | Furu Wei | Guoxin Wang | Yijuan Lu | Dinei Florencio | Cha Zhang | Wanxiang Che | Min Zhang | Lidong Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents. We propose LayoutLMv2 architecture with new pre-training tasks to model the interaction among text, layout, and image in a single multi-modal framework. Specifically, with a two-stream multi-modal Transformer encoder, LayoutLMv2 uses not only the existing masked visual-language modeling task but also the new text-image alignment and text-image matching tasks, which make it better capture the cross-modality interaction in the pre-training stage. Meanwhile, it also integrates a spatial-aware self-attention mechanism into the Transformer architecture so that the model can fully understand the relative positional relationship among different text blocks. Experiment results show that LayoutLMv2 outperforms LayoutLM by a large margin and achieves new state-of-the-art results on a wide variety of downstream visually-rich document understanding tasks, including FUNSD (0.7895 to 0.8420), CORD (0.9493 to 0.9601), SROIE (0.9524 to 0.9781), Kleister-NDA (0.8340 to 0.8520), RVL-CDIP (0.9443 to 0.9564), and DocVQA (0.7295 to 0.8672).

pdf bib
How is BERT surprised? Layerwise detection of linguistic anomalies
Bai Li | Zining Zhu | Guillaume Thomas | Yang Xu | Frank Rudzicz
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Transformer language models have shown remarkable ability in detecting when a word is anomalous in context, but likelihood scores offer no information about the cause of the anomaly. In this work, we use Gaussian models for density estimation at intermediate layers of three language models (BERT, RoBERTa, and XLNet), and evaluate our method on BLiMP, a grammaticality judgement benchmark. In lower layers, surprisal is highly correlated to low token frequency, but this correlation diminishes in upper layers. Next, we gather datasets of morphosyntactic, semantic, and commonsense anomalies from psycholinguistic studies; we find that the best performing model RoBERTa exhibits surprisal in earlier layers when the anomaly is morphosyntactic than when it is semantic, while commonsense anomalies do not exhibit surprisal at any intermediate layer. These results suggest that language models employ separate mechanisms to detect different types of linguistic anomalies.

2020

pdf bib
Interactively-Propagative Attention Learning for Implicit Discourse Relation Recognition
Huibin Ruan | Yu Hong | Yang Xu | Zhen Huang | Guodong Zhou | Min Zhang
Proceedings of the 28th International Conference on Computational Linguistics

We tackle implicit discourse relation recognition. Both self-attention and interactive-attention mechanisms have been applied for attention-aware representation learning, which improves the current discourse analysis models. To take advantages of the two attention mechanisms simultaneously, we develop a propagative attention learning model using a cross-coupled two-channel network. We experiment on Penn Discourse Treebank. The test results demonstrate that our model yields substantial improvements over the baselines (BiLSTM and BERT).

pdf bib
Word class flexibility: A deep contextualized approach
Bai Li | Guillaume Thomas | Yang Xu | Frank Rudzicz
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Word class flexibility refers to the phenomenon whereby a single word form is used across different grammatical categories. Extensive work in linguistic typology has sought to characterize word class flexibility across languages, but quantifying this phenomenon accurately and at scale has been fraught with difficulties. We propose a principled methodology to explore regularity in word class flexibility. Our method builds on recent work in contextualized word embeddings to quantify semantic shift between word classes (e.g., noun-to-verb, verb-to-noun), and we apply this method to 37 languages. We find that contextualized embeddings not only capture human judgment of class variation within words in English, but also uncover shared tendencies in class flexibility across languages. Specifically, we find greater semantic variation when flexible lemmas are used in their dominant word class, supporting the view that word class flexibility is a directional process. Our work highlights the utility of deep contextualized models in linguistic typology.

pdf bib
Inferring symmetry in natural language
Chelsea Tanchip | Lei Yu | Aotao Xu | Yang Xu
Findings of the Association for Computational Linguistics: EMNLP 2020

We present a methodological framework for inferring symmetry of verb predicates in natural language. Empirical work on predicate symmetry has taken two main approaches. The feature-based approach focuses on linguistic features pertaining to symmetry. The context-based approach denies the existence of absolute symmetry but instead argues that such inference is context dependent. We develop methods that formalize these approaches and evaluate them against a novel symmetry inference sentence (SIS) dataset comprised of 400 naturalistic usages of literature-informed verbs spanning the spectrum of symmetry-asymmetry. Our results show that a hybrid transfer learning model that integrates linguistic features with contextualized language models most faithfully predicts the empirical data. Our work integrates existing approaches to symmetry in natural language and suggests how symmetry inference can improve systematicity in state-of-the-art language models.

pdf bib
Corpus of Chinese Dynastic Histories: Gender Analysis over Two Millennia
Sergey Zinin | Yang Xu
Proceedings of the 12th Language Resources and Evaluation Conference

Chinese dynastic histories form a large continuous linguistic space of approximately 2000 years, from the 3rd century BCE to the 18th century CE. The histories are documented in Classical (Literary) Chinese in a corpus of over 20 million characters, suitable for the computational analysis of historical lexicon and semantic change. However, there is no freely available open-source corpus of these histories, making Classical Chinese low-resource. This project introduces a new open-source corpus of twenty-four dynastic histories covered by Creative Commons license. An original list of Classical Chinese gender-specific terms was developed as a case study for analyzing the historical linguistic use of male and female terms. The study demonstrates considerable stability in the usage of these terms, with dominance of male terms. Exploration of word meanings uses keyword analysis of focus corpora created for gender-specific terms. This method yields meaningful semantic representations that can be used for future studies of diachronic semantics.

2019

pdf bib
Slang Detection and Identification
Zhengqi Pei | Zhewei Sun | Yang Xu
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

The prevalence of informal language such as slang presents challenges for natural language systems, particularly in the automatic discovery of flexible word usages. Previous work has explored slang in terms of dictionary construction, sentiment analysis, word formation, and interpretation, but scarce research has attempted the basic problem of slang detection and identification. We examine the extent to which deep learning methods support automatic detection and identification of slang from natural sentences using a combination of bidirectional recurrent neural networks, conditional random field, and multilayer perceptron. We test these models based on a comprehensive set of linguistic features in sentence-level detection and token-level identification of slang. We found that a prominent feature of slang is the surprising use of words across syntactic categories or syntactic shift (e.g., verb-noun). Our best models detect the presence of slang at the sentence level with an F1-score of 0.80 and identify its exact position at the token level with an F1-Score of 0.50.

pdf bib
HIT-SCIR at MRP 2019: A Unified Pipeline for Meaning Representation Parsing via Efficient Training and Effective Encoding
Wanxiang Che | Longxu Dou | Yang Xu | Yuxuan Wang | Yijia Liu | Ting Liu
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

This paper describes our system (HIT-SCIR) for CoNLL 2019 shared task: Cross-Framework Meaning Representation Parsing. We extended the basic transition-based parser with two improvements: a) Efficient Training by realizing Stack LSTM parallel training; b) Effective Encoding via adopting deep contextualized word embeddings BERT. Generally, we proposed a unified pipeline to meaning representation parsing, including framework-specific transition-based parsers, BERT-enhanced word representation, and post-processing. In the final evaluation, our system was ranked first according to ALL-F1 (86.2%) and especially ranked first in UCCA framework (81.67%).

pdf bib
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
Nina Tahmasebi | Lars Borin | Adam Jatowt | Yang Xu
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

pdf bib
Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology
Yang Xu | Jiasheng Zhang | David Reitter
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

We use a variant of word embedding model that incorporates subword information to characterize the degree of compositionality in lexical semantics. Our models reveal some interesting yet contrastive patterns of long-term change in multiple languages: Indo-European languages put more weight on subword units in newer words, while conversely Chinese puts less weights on the subwords, but more weight on the word as a whole. Our method provides novel evidence and methodology that enriches existing theories in evolutionary linguistics. The resulting word vectors also has decent performance in NLP-related tasks.

pdf bib
Leveraging BERT to Improve the FEARS Index for Stock Forecasting
Linyi Yang | Ruihai Dong | Tin Lok James Ng | Yang Xu
Proceedings of the First Workshop on Financial Technology and Natural Language Processing

pdf bib
Text-based inference of moral sentiment change
Jing Yi Xie | Renato Ferreira Pinto Junior | Graeme Hirst | Yang Xu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We present a text-based framework for investigating moral sentiment change of the public via longitudinal corpora. Our framework is based on the premise that language use can inform people’s moral perception toward right or wrong, and we build our methodology by exploring moral biases learned from diachronic word embeddings. We demonstrate how a parameter-free model supports inference of historical shifts in moral sentiment toward concepts such as slavery and democracy over centuries at three incremental levels: moral relevance, moral polarity, and fine-grained moral dimensions. We apply this methodology to visualizing moral time courses of individual concepts and analyzing the relations between psycholinguistic variables and rates of moral sentiment change at scale. Our work offers opportunities for applying natural language processing toward characterizing moral sentiment change in society.

2018

pdf bib
Not that much power: Linguistic alignment is influenced more by low-level linguistic features rather than social power
Yang Xu | Jeremy Cole | David Reitter
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Linguistic alignment between dialogue partners has been claimed to be affected by their relative social power. A common finding has been that interlocutors of higher power tend to receive more alignment than those of lower power. However, these studies overlook some low-level linguistic features that can also affect alignment, which casts doubts on these findings. This work characterizes the effect of power on alignment with logistic regression models in two datasets, finding that the effect vanishes or is reversed after controlling for low-level features such as utterance length. Thus, linguistic alignment is explained better by low-level features than by social power. We argue that a wider range of factors, especially cognitive factors, need to be taken into account for future studies on observational data when social factors of language use are in question.

pdf bib
Incorporating Latent Meanings of Morphological Compositions to Enhance Word Embeddings
Yang Xu | Jiawei Liu | Wei Yang | Liusheng Huang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Traditional word embedding approaches learn semantic information at word level while ignoring the meaningful internal structures of words like morphemes. Furthermore, existing morphology-based models directly incorporate morphemes to train word embeddings, but still neglect the latent meanings of morphemes. In this paper, we explore to employ the latent meanings of morphological compositions of words to train and enhance word embeddings. Based on this purpose, we propose three Latent Meaning Models (LMMs), named LMM-A, LMM-S and LMM-M respectively, which adopt different strategies to incorporate the latent meanings of morphemes during the training process. Experiments on word similarity, syntactic analogy and text classification are conducted to validate the feasibility of our models. The results demonstrate that our models outperform the baselines on five word similarity datasets. On Wordsim-353 and RG-65 datasets, our models nearly achieve 5% and 7% gains over the classic CBOW model, respectively. For the syntactic analogy and text classification tasks, our models also surpass all the baselines including a morphology-based model.

pdf bib
Incorporating Image Matching Into Knowledge Acquisition for Event-Oriented Relation Recognition
Yu Hong | Yang Xu | Huibin Ruan | Bowei Zou | Jianmin Yao | Guodong Zhou
Proceedings of the 27th International Conference on Computational Linguistics

Event relation recognition is a challenging language processing task. It is required to determine the relation class of a pair of query events, such as causality, under the condition that there isn’t any reliable clue for use. We follow the traditional statistical approach in this paper, speculating the relation class of the target events based on the relation-class distributions on the similar events. There is minimal supervision used during the speculation process. In particular, we incorporate image processing into the acquisition of similar event instances, including the utilization of images for visually representing event scenes, and the use of the neural network based image matching for approximate calculation between events. We test our method on the ACE-R2 corpus and compared our model with the fully-supervised neural network models. Experimental results show that we achieve a comparable performance to CNN while slightly better than LSTM.

pdf bib
Using active learning to expand training data for implicit discourse relation recognition
Yang Xu | Yu Hong | Huibin Ruan | Jianmin Yao | Min Zhang | Guodong Zhou
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We tackle discourse-level relation recognition, a problem of determining semantic relations between text spans. Implicit relation recognition is challenging due to the lack of explicit relational clues. The increasingly popular neural network techniques have been proven effective for semantic encoding, whereby widely employed to boost semantic relation discrimination. However, learning to predict semantic relations at a deep level heavily relies on a great deal of training data, but the scale of the publicly available data in this field is limited. In this paper, we follow Rutherford and Xue (2015) to expand the training data set using the corpus of explicitly-related arguments, by arbitrarily dropping the overtly presented discourse connectives. On the basis, we carry out an experiment of sampling, in which a simple active learning approach is used, so as to take the informative instances for data expansion. The goal is to verify whether the selective use of external data not only reduces the time consumption of retraining but also ensures a better system performance. Using the expanded training data, we retrain a convolutional neural network (CNN) based classifer which is a simplified version of Qin et al. (2016)’s stacking gated relation recognizer. Experimental results show that expanding the training set with small-scale carefully-selected external data yields substantial performance gain, with the improvements of about 4% for accuracy and 3.6% for F-score. This allows a weak classifier to achieve a comparable performance against the state-of-the-art systems.

pdf bib
Is Nike female? Exploring the role of sound symbolism in predicting brand name gender
Sridhar Moorthy | Ruth Pogacar | Samin Khan | Yang Xu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Are brand names such as Nike female or male? Previous research suggests that the sound of a person’s first name is associated with the person’s gender, but no research has tried to use this knowledge to assess the gender of brand names. We present a simple computational approach that uses sound symbolism to address this open issue. Consistent with previous research, a model trained on various linguistic features of name endings predicts human gender with high accuracy. Applying this model to a data set of over a thousand commercially-traded brands in 17 product categories, our results reveal an overall bias toward male names, cutting across both male-oriented product categories as well as female-oriented categories. In addition, we find variation within categories, suggesting that firms might be seeking to imbue their brands with differentiating characteristics as part of their competitive strategy.

2017

pdf bib
Spectral Analysis of Information Density in Dialogue Predicts Collaborative Task Performance
Yang Xu | David Reitter
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a perspective on dialogue that focuses on relative information contributions of conversation partners as a key to successful communication. We predict the success of collaborative task in English and Danish corpora of task-oriented dialogue. Two features are extracted from the frequency domain representations of the lexical entropy series of each interlocutor, power spectrum overlap (PSO) and relative phase (RP). We find that PSO is a negative predictor of task success, while RP is a positive one. An SVM with these features significantly improved on previous task success prediction models. Our findings suggest that the strategic distribution of information density between interlocutors is relevant to task success.

2016

pdf bib
Entropy Converges Between Dialogue Participants: Explanations from an Information-Theoretic Perspective
Yang Xu | David Reitter
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Convergence of Syntactic Complexity in Conversation
Yang Xu | David Reitter
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

pdf bib
An Evaluation and Comparison of Linguistic Alignment Measures
Yang Xu | David Reitter
Proceedings of the 6th Workshop on Cognitive Modeling and Computational Linguistics