2022
The Aligned Multimodal Movie Treebank: An audio, video, dependency-parse treebank
Adam Yaari | Jan DeWitt | Henry Hu | Bennett Stankovits | Sue Felshin | Yevgeni Berzak | Helena Aparicio | Boris Katz | Ignacio Cases | Andrei Barbu
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Treebanks have traditionally included only text and were derived from written sources such as newspapers or the web. We introduce the Aligned Multimodal Movie Treebank (AMMT), an English-language treebank derived from dialog in Hollywood movies, which includes transcriptions of the audio-visual streams with word-level alignment, as well as part-of-speech tags and dependency parses in the Universal Dependencies formalism. AMMT consists of 31,264 sentences and 218,090 words, making it the third-largest UD English treebank and the only multimodal treebank in UD. To help with the web-based annotation effort, we also introduce the Efficient Audio Alignment Annotator (EAAA), a companion tool that enables annotators to significantly speed up their annotation process.
2019
Recursive Routing Networks: Learning to Compose Modules for Language Understanding
Ignacio Cases | Clemens Rosenbaum | Matthew Riemer | Atticus Geiger | Tim Klinger | Alex Tamkin | Olivia Li | Sandhini Agarwal | Joshua D. Greene | Dan Jurafsky | Christopher Potts | Lauri Karttunen
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
We introduce Recursive Routing Networks (RRNs), which are modular, adaptable models that learn effectively in diverse environments. RRNs consist of a set of functions, typically organized into a grid, and a meta-learner decision-making component called the router. The model jointly optimizes the parameters of the functions and the meta-learner’s policy for routing inputs through those functions. RRNs can be incorporated into existing architectures in a number of ways; we explore adding them to word representation layers, recurrent network hidden layers, and classifier layers. Our evaluation task is natural language inference (NLI). Using the MultiNLI corpus, we show that an RRN’s routing decisions reflect the high-level genre structure of that corpus. To show that RRNs can learn to specialize to more fine-grained semantic distinctions, we introduce a new corpus of NLI examples involving implicative predicates, and show that the model components become fine-tuned to the inferential signatures that are characteristic of these predicates.
Posing Fair Generalization Tasks for Natural Language Inference
Atticus Geiger | Ignacio Cases | Lauri Karttunen | Christopher Potts
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Deep learning models for semantics are generally evaluated using naturalistic corpora. Adversarial testing methods, in which models are evaluated on new examples with known semantic properties, have begun to reveal that good performance at these naturalistic tasks can hide serious shortcomings. However, we should insist that these evaluations be fair – that the models are given data sufficient to support the requisite kinds of generalization. In this paper, we define and motivate a formal notion of fairness in this sense. We then apply these ideas to natural language inference by constructing very challenging but provably fair artificial datasets and showing that standard neural models fail to generalize in the required ways; only task-specific models that jointly compose the premise and hypothesis are able to achieve high performance, and even these models do not solve the task perfectly.
2016
Distinguishing Past, On-going, and Future Events: The EventStatus Corpus
Ruihong Huang | Ignacio Cases | Dan Jurafsky | Cleo Condoravdi | Ellen Riloff
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
2014
Automatic Expansion of the MRC Psycholinguistic Database Imageability Ratings
Ting Liu | Kit Cho | G. Aaron Broadwell | Samira Shaikh | Tomek Strzalkowski | John Lien | Sarah Taylor | Laurie Feldman | Boris Yamrom | Nick Webb | Umit Boz | Ignacio Cases | Ching-sheng Lin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Recent studies in metaphor extraction across several languages (Broadwell et al., 2013; Strzalkowski et al., 2013) have shown that word imageability ratings are highly correlated with the presence of metaphors in text. Information about the imageability of words can be obtained from the MRC Psycholinguistic Database (MRCPD) for English words and the Léxico Informatizado del Español Programa (LEXESP) for Spanish words, which are collections of human ratings obtained in a series of controlled surveys. Unfortunately, word imageability ratings were collected for only a limited number of words: 9,240 in English and 6,233 in Spanish; they are entirely unavailable for the other two languages studied, Russian and Farsi. The present study describes an automated method for expanding the MRCPD by conferring imageability ratings on the synonyms and hyponyms of existing MRCPD words, as identified in WordNet. The result is an expanded MRCPD+ database with imageability scores for more than 100,000 words. The appropriateness of this expansion process is assessed by examining the structural coherence of the expanded set and by validating the expanded lexicon against human judgment. Finally, the performance of the metaphor extraction system is shown to improve significantly with the expanded database. This paper describes the process for the English MRCPD+ and the resulting lexical resource. The process is analogous for the other languages.
A Multi-Cultural Repository of Automatically Discovered Linguistic and Conceptual Metaphors
Samira Shaikh | Tomek Strzalkowski | Ting Liu | George Aaron Broadwell | Boris Yamrom | Sarah Taylor | Laurie Feldman | Kit Cho | Umit Boz | Ignacio Cases | Yuliya Peshkova | Ching-Sheng Lin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this article, we present details of our ongoing work toward building a repository of linguistic and conceptual metaphors. This resource is being developed as part of our research effort into the large-scale detection of metaphors in unrestricted text. We have stored a large number of automatically extracted metaphors in American English, Mexican Spanish, Russian, and Iranian Farsi in a relational database, along with pertinent metadata associated with these metaphors. A substantial subset of the contents of our repository has been systematically validated via rigorous social science experiments. Using information stored in the repository, we are able to posit certain claims, in a cross-cultural context, about how peoples in these cultures (America, Mexico, Russia, and Iran) view particular concepts related to governance and economic inequality through their use of metaphor. Researchers in the field can use this resource as a reference of typical metaphors used across these cultures. In addition, it can be used to recognize metaphors of the same form or pattern in other domains of research.
Computing Affect in Metaphors
Tomek Strzalkowski | Samira Shaikh | Kit Cho | George Aaron Broadwell | Laurie Feldman | Sarah Taylor | Boris Yamrom | Ting Liu | Ignacio Cases | Yuliya Peshkova | Kyle Elliot
Proceedings of the Second Workshop on Metaphor in NLP
Discovering Conceptual Metaphors using Source Domain Spaces
Samira Shaikh | Tomek Strzalkowski | Kit Cho | Ting Liu | George Aaron Broadwell | Laurie Feldman | Sarah Taylor | Boris Yamrom | Ching-Sheng Lin | Ning Sa | Ignacio Cases | Yuliya Peshkova | Kyle Elliot
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)
2013
Robust Extraction of Metaphor from Novel Data
Tomek Strzalkowski | George Aaron Broadwell | Sarah Taylor | Laurie Feldman | Samira Shaikh | Ting Liu | Boris Yamrom | Kit Cho | Umit Boz | Ignacio Cases | Kyle Elliot
Proceedings of the First Workshop on Metaphor in NLP