James Glass

Also published as: James R. Glass


2021

pdf bib
Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models
Tianxing He | Jun Liu | Kyunghyun Cho | Myle Ott | Bing Liu | James Glass | Fuchun Peng
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

In this work, we study how the finetuning stage in the pretrain-finetune framework changes the behavior of a pretrained neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. Our major finding is that after standard finetuning, the model forgets some of the important language generation skills acquired during large-scale pretraining. We demonstrate the forgetting phenomenon through a set of detailed behavior analysis from the perspectives of knowledge transfer, context sensitivity, and function space projection. As a preliminary attempt to alleviate the forgetting problem, we propose an intuitive finetuning strategy named “mix-review”. We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent. Finally, we discuss interesting behavior of the resulting dialogue model and its implications.

pdf bib
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu | David Harwath | Tyler Miller | Christopher Song | James Glass
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper we present the first model for directly synthesizing fluent, natural-sounding spoken audio captions for images that does not require natural language text as an intermediate representation or source of supervision. Instead, we connect the image captioning module and the speech synthesis module with a set of discrete, sub-word speech units that are discovered with a self-supervised visual grounding task. We conduct experiments on the Flickr8k spoken caption dataset in addition to a novel corpus of spoken audio captions collected for the popular MSCOCO dataset, demonstrating that our generated captions also capture diverse visual semantics of the images they describe. We investigate several different intermediate speech representations, and empirically find that the representation must satisfy several important properties to serve as drop-in replacements for text.

pdf bib
Mitigating Biases in Toxic Language Detection through Invariant Rationalization
Yung-Sung Chuang | Mingye Gao | Hongyin Luo | James Glass | Hung-yi Lee | Yun-Nung Chen | Shang-Wen Li
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

Automatic detection of toxic language plays an essential role in protecting social media users, especially minority groups, from verbal abuse. However, biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection. The biases make the learned models unfair and can even exacerbate the marginalization of people. Considering that current debiasing methods for general natural language understanding tasks cannot effectively mitigate the biases in the toxicity detectors, we propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns (e.g., identity mentions, dialect) to toxicity labels. We empirically show that our method yields lower false positive rate in both lexical and dialectal attributes than previous debiasing methods.

2020

pdf bib
Negative Training for Neural Dialogue Response Generation
Tianxing He | James Glass
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Although deep learning models have brought tremendous advancements to the field of open-domain dialogue response generation, recent research results have revealed that the trained models have undesirable generation behaviors, such as malicious responses and generic (boring) responses. In this work, we propose a framework named “Negative Training” to minimize such behaviors. Given a trained model, the framework will first find generated samples that exhibit the undesirable behavior, and then use them to feed negative training signals for fine-tuning the model. Our experiments show that negative training can significantly reduce the hit rate of malicious responses, or discourage frequent responses and improve response diversity.

pdf bib
Improved Speech Representations with Multi-Target Autoregressive Predictive Coding
Yu-An Chung | James Glass
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech. One example is Autoregressive Predictive Coding (Chung et al., 2019), which trains an autoregressive RNN to generate an unseen future frame given a context such as recent past frames. The basic hypothesis of these approaches is that hidden states that can accurately predict future frames are a useful representation for many downstream tasks. In this paper we extend this hypothesis and aim to enrich the information encoded in the hidden states by training the model to make more accurate future predictions. We propose an auxiliary objective that serves as a regularization to improve generalization of the future frame prediction task. Experimental results on phonetic classification, speech recognition, and speech translation not only support the hypothesis, but also demonstrate the effectiveness of our approach in learning representations that contain richer phonetic content.

pdf bib
What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context
Ramy Baly | Georgi Karadzhov | Jisun An | Haewoon Kwak | Yoan Dinkov | Ahmed Ali | James Glass | Preslav Nakov
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Predicting the political bias and the factuality of reporting of entire news outlets are critical elements of media profiling, which is an understudied but an increasingly important research direction. The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim, either manually or automatically. Thus, it has been proposed to profile entire news outlets and to look for those that are likely to publish fake or biased content. This makes it possible to detect likely “fake news” the moment they are published, by simply checking the reliability of their source. From a practical perspective, political bias and factuality of reporting have a linguistic aspect but also a social context. Here, we study the impact of both, namely (i) what was written (i.e., what was published by the target medium, and how it describes itself in Twitter) vs. (ii) who reads it (i.e., analyzing the target medium’s audience on social media). We further study (iii) what was written about the target medium (in Wikipedia). The evaluation results show that what was written matters most, and we further show that putting all information sources together yields huge improvements over the current state-of-the-art.

pdf bib
Similarity Analysis of Contextual Word Representation Models
John Wu | Yonatan Belinkov | Hassan Sajjad | Nadir Durrani | Fahim Dalvi | James Glass
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper investigates contextual word representation models from the lens of similarity analysis. Given a collection of trained models, we measure the similarity of their internal representations and attention. Critically, these models come from vastly different architectures. We use existing and novel similarity measures that aim to gauge the level of localization of information in the deep models, and facilitate the investigation of which design factors affect model similarity, without requiring any external linguistic annotation. The analysis reveals that models within the same family are more similar to one another, as may be expected. Surprisingly, different architectures have rather similar representations, but different individual neurons. We also observed differences in information localization in lower and higher layers and found that higher layers are more affected by fine-tuning on downstream tasks.

pdf bib
On the Linguistic Representational Power of Neural Machine Translation Models
Yonatan Belinkov | Nadir Durrani | Fahim Dalvi | Hassan Sajjad | James Glass
Computational Linguistics, Volume 46, Issue 1 - March 2020

Despite the recent success of deep neural networks in natural language processing and other spheres of artificial intelligence, their interpretability remains a challenge. We analyze the representations learned by neural machine translation (NMT) models at various levels of granularity and evaluate their quality through relevant extrinsic properties. In particular, we seek answers to the following questions: (i) How accurately is word structure captured within the learned representations, which is an important aspect in translating morphologically rich languages? (ii) Do the representations capture long-range dependencies, and effectively handle syntactically divergent languages? (iii) Do the representations capture lexical semantics? We conduct a thorough investigation along several parameters: (i) Which layers in the architecture capture each of these linguistic phenomena; (ii) How does the choice of translation unit (word, character, or subword unit) impact the linguistic properties captured by the underlying representations? (iii) Do the encoder and decoder learn differently and independently? (iv) Do the representations learned by multilingual NMT models capture the same amount of linguistic information as their bilingual counterparts? Our data-driven, quantitative evaluation illuminates important aspects in NMT models and their ability to capture various linguistic phenomena. We show that deep NMT models trained in an end-to-end fashion, without being provided any direct supervision during the training process, learn a non-trivial amount of linguistic information. Notable findings include the following observations: (i) Word morphology and part-of-speech information are captured at the lower layers of the model; (ii) In contrast, lexical semantics or non-local syntactic and semantic dependencies are better represented at the higher layers of the model; (iii) Representations learned using characters are more informed about word-morphology compared to those learned using subword units; and (iv) Representations learned by multilingual models are richer compared to bilingual models.

pdf bib
We Can Detect Your Bias: Predicting the Political Ideology of News Articles
Ramy Baly | Giovanni Da San Martino | James Glass | Preslav Nakov
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We explore the task of predicting the leading political ideology or bias of news articles. First, we collect and release a large dataset of 34,737 articles that were manually annotated for political ideology –left, center, or right–, which is well-balanced across both topics and media. We further use a challenging experimental setup where the test examples come from media that were not seen during training, which prevents the model from learning to detect the source of the target news article instead of predicting its political ideology. From a modeling perspective, we propose an adversarial media adaptation, as well as a specially adapted triplet loss. We further add background information about the source, and we show that it is quite helpful for improving article-level prediction. Our experimental results show very sizable improvements over using state-of-the-art pre-trained Transformers in this challenging setup.

pdf bib
Knowledge Grounded Conversational Symptom Detection with Graph Memory Networks
Hongyin Luo | Shang-Wen Li | James Glass
Proceedings of the 3rd Clinical Natural Language Processing Workshop

In this work, we propose a novel goal-oriented dialog task, automatic symptom detection. We build a system that can interact with patients through dialog to detect and collect clinical symptoms automatically, which can save a doctor’s time interviewing the patient. Given a set of explicit symptoms provided by the patient to initiate a dialog for diagnosing, the system is trained to collect implicit symptoms by asking questions, in order to collect more information for making an accurate diagnosis. After getting the reply from the patient for each question, the system also decides whether current information is enough for a human doctor to make a diagnosis. To achieve this goal, we propose two neural models and a training pipeline for the multi-step reasoning task. We also build a knowledge graph as additional inputs to further improve model performance. Experiments show that our model significantly outperforms the baseline by 4%, discovering 67% of implicit symptoms on average with a limited number of questions.

pdf bib
A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation
Moin Nadeem | Tianxing He | Kyunghyun Cho | James Glass
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

This work studies the widely adopted ancestral sampling algorithms for auto-regressive language models. We use the quality-diversity (Q-D) trade-off to investigate three popular sampling methods (top-k, nucleus and tempered sampling). We focus on the task of open-ended language generation, and first show that the existing sampling algorithms have similar performance. By carefully inspecting the transformations defined by different sampling algorithms, we identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation. To validate the importance of the identified properties, we design two sets of new sampling methods: one set in which each algorithm satisfies all three properties, and one set in which each algorithm violates at least one of the properties. We compare their performance with existing algorithms, and find that violating the identified properties could lead to drastic performance degradation, as measured by the Q-D trade-off. On the other hand, we find that the set of sampling algorithms that satisfy these properties performs on par with the existing sampling algorithms.

2019

pdf bib
Contrastive Language Adaptation for Cross-Lingual Stance Detection
Mitra Mohtarami | James Glass | Preslav Nakov
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We study cross-lingual stance detection, which aims to leverage labeled data in one language to identify the relative perspective (or stance) of a given document with respect to a claim in a different target language. In particular, we introduce a novel contrastive language adaptation approach applied to memory networks, which ensures accurate alignment of stances in the source and target languages, and can effectively deal with the challenge of limited labeled data in the target language. The evaluation results on public benchmark datasets and comparison against current state-of-the-art approaches demonstrate the effectiveness of our approach.

pdf bib
Tanbih: Get To Know What You Are Reading
Yifan Zhang | Giovanni Da San Martino | Alberto Barrón-Cedeño | Salvatore Romeo | Jisun An | Haewoon Kwak | Todor Staykovski | Israa Jaradat | Georgi Karadzhov | Ramy Baly | Kareem Darwish | James Glass | Preslav Nakov
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understanding what’s behind a news story. Our system displays news grouped into events and generates media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, and stance with respect to various claims and topics of a news outlet. In addition, we automatically analyse each article to detect whether it is propagandistic and to determine its stance with respect to a number of controversial topics.

pdf bib
Neural Multi-Task Learning for Stance Prediction
Wei Fang | Moin Nadeem | Mitra Mohtarami | James Glass
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

We present a multi-task learning model that leverages large amount of textual information from existing datasets to improve stance prediction. In particular, we utilize multiple NLP tasks under both unsupervised and supervised settings for the target stance prediction task. Our model obtains state-of-the-art performance on a public benchmark dataset, Fake News Challenge, outperforming current approaches by a wide margin.

pdf bib
Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media
Ramy Baly | Georgi Karadzhov | Abdelrhman Saleh | James Glass | Preslav Nakov
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In the context of fake news, bias, and propaganda, we study two important but relatively under-explored problems: (i) trustworthiness estimation (on a 3-point scale) and (ii) political ideology detection (left/right bias on a 7-point scale) of entire news outlets, as opposed to evaluating individual articles. In particular, we propose a multi-task ordinal regression framework that models the two problems jointly. This is motivated by the observation that hyper-partisanship is often linked to low trustworthiness, e.g., appealing to emotions rather than sticking to the facts, while center media tend to be generally more impartial and trustworthy. We further use several auxiliary tasks, modeling centrality, hyper-partisanship, as well as left-vs.-right bias on a coarse-grained scale. The evaluation results show sizable performance gains by the joint models over models that target the problems in isolation.

pdf bib
FAKTA: An Automatic End-to-End Fact Checking System
Moin Nadeem | Wei Fang | Brian Xu | Mitra Mohtarami | James Glass
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

We present FAKTA which is a unified framework that integrates various components of a fact-checking process: document retrieval from media sources with various types of reliability, stance detection of documents with respect to given claims, evidence extraction, and linguistic analysis. FAKTA predicts the factuality of given claims and provides evidence at the document and sentence level to explain its predictions.

pdf bib
Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection
Abdelrhman Saleh | Ramy Baly | Alberto Barrón-Cedeño | Giovanni Da San Martino | Mitra Mohtarami | Preslav Nakov | James Glass
Proceedings of the 13th International Workshop on Semantic Evaluation

We describe our submission to SemEval-2019 Task 4 on Hyperpartisan News Detection. We rely on a variety of engineered features originally used to detect propaganda. This is based on the assumption that biased messages are propagandistic and promote a particular political cause or viewpoint. In particular, we trained a logistic regression model with features ranging from simple bag of words to vocabulary richness and text readability. Our system achieved 72.9% accuracy on the manually annotated testset, and 60.8% on the test data that was obtained with distant supervision. Additional experiments showed that significant performance gains can be achieved with better feature pre-processing.

pdf bib
Improving Neural Language Models by Segmenting, Attending, and Predicting the Future
Hongyin Luo | Lan Jiang | Yonatan Belinkov | James Glass
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Common language models typically predict the next word given the context. In this work, we propose a method that improves language modeling by learning to align the given context and the following phrase. The model does not require any linguistic annotation of phrase segmentation. Instead, we define syntactic heights and phrase segmentation rules, enabling the model to automatically induce phrases, recognize their task-specific heads, and generate phrase embeddings in an unsupervised learning manner. Our method can easily be applied to language models with different network architectures since an independent module is used for phrase induction and context-phrase alignment, and no change is required in the underlying language modeling network. Experiments have shown that our model outperformed several strong baseline models on different data sets. We achieved a new state-of-the-art performance of 17.4 perplexity on the Wikitext-103 dataset. Additionally, visualizing the outputs of the phrase induction module showed that our model is able to learn approximate phrase-level structural knowledge without any annotation.

pdf bib
Analysis Methods in Neural Language Processing: A Survey
Yonatan Belinkov | James Glass
Transactions of the Association for Computational Linguistics, Volume 7

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.

2018

pdf bib
Automatic Stance Detection Using End-to-End Memory Networks
Mitra Mohtarami | Ramy Baly | James Glass | Preslav Nakov | Lluís Màrquez | Alessandro Moschitti
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present an effective end-to-end memory network model that jointly (i) predicts whether a given document can be considered as relevant evidence for a given claim, and (ii) extracts snippets of evidence that can be used to reason about the factuality of the target claim. Our model combines the advantages of convolutional and recurrent neural networks as part of a memory network. We further introduce a similarity matrix at the inference level of the memory network in order to extract snippets of evidence for input claims more accurately. Our experiments on a public benchmark dataset, FakeNewsChallenge, demonstrate the effectiveness of our approach.

pdf bib
Supervised and Unsupervised Transfer Learning for Question Answering
Yu-An Chung | Hung-Yi Lee | James Glass
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well-studied. In this paper, we conduct extensive experiments to investigate the transferability of knowledge learned from a source QA dataset to a target dataset using two QA models. The performance of both models on a TOEFL listening comprehension test (Tseng et al., 2016) and MCTest (Richardson et al., 2013) is significantly improved via a simple transfer learning technique from MovieQA (Tapaswi et al., 2016). In particular, one of the models achieves the state-of-the-art on all target datasets; for the TOEFL listening comprehension test, it outperforms the previous best model by 7%. Finally, we show that transfer learning is helpful even in unsupervised scenarios when correct answers for target QA dataset examples are not available.

pdf bib
Integrating Stance Detection and Fact Checking in a Unified Corpus
Ramy Baly | Mitra Mohtarami | James Glass | Lluís Màrquez | Alessandro Moschitti | Preslav Nakov
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

A reasonable approach for fact checking a claim involves retrieving potentially relevant documents from different sources (e.g., news websites, social media, etc.), determining the stance of each document with respect to the claim, and finally making a prediction about the claim’s factuality by aggregating the strength of the stances, while taking the reliability of the source into account. Moreover, a fact checking system should be able to explain its decision by providing relevant extracts (rationales) from the documents. Yet, this setup is not directly supported by existing datasets, which treat fact checking, document retrieval, source credibility, stance detection and rationale extraction as independent tasks. In this paper, we support the interdependencies between these tasks as annotations in the same corpus. We implement this setup on an Arabic fact checking corpus, the first of its kind.

pdf bib
On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference
Adam Poliak | Yonatan Belinkov | James Glass | Benjamin Van Durme
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We propose a process for investigating the extent to which sentence representations arising from neural machine translation (NMT) systems encode distinct semantic phenomena. We use these representations as features to train a natural language inference (NLI) classifier based on datasets recast from existing semantic annotations. In applying this process to a representative NMT system, we find its encoder appears most suited to supporting inferences at the syntax-semantics interface, as compared to anaphora resolution requiring world knowledge. We conclude with a discussion on the merits and potential deficiencies of the existing process, and how it may be improved and extended as a broader framework for evaluating semantic coverage

pdf bib
Role-specific Language Models for Processing Recorded Neuropsychological Exams
Tuka Al Hanai | Rhoda Au | James Glass
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Neuropsychological examinations are an important screening tool for the presence of cognitive conditions (e.g. Alzheimer’s, Parkinson’s Disease), and require a trained tester to conduct the exam through spoken interactions with the subject. While audio is relatively easy to record, it remains a challenge to automatically diarize (who spoke when?), decode (what did they say?), and assess a subject’s cognitive health. This paper demonstrates a method to determine the cognitive health (impaired or not) of 92 subjects, from audio that was diarized using an automatic speech recognition system trained on TED talks and on the structured language used by testers and subjects. Using leave-one-out cross validation and logistic regression modeling we show that even with noisily decoded data (81% WER) we can still perform accurate enough diarization (0.02% confusion rate) to determine the cognitive state of a subject (0.76 AUC).

pdf bib
Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign
Marcos Zampieri | Shervin Malmasi | Preslav Nakov | Ahmed Ali | Suwon Shon | James Glass | Yves Scherrer | Tanja Samardžić | Nikola Ljubešić | Jörg Tiedemann | Chris van der Lee | Stefan Grondelaers | Nelleke Oostdijk | Dirk Speelman | Antal van den Bosch | Ritesh Kumar | Bornini Lahiri | Mayank Jain
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects. The campaign was organized as part of the fifth edition of the VarDial workshop, collocated with COLING’2018. This year, the campaign included five shared tasks, including two task re-runs – Arabic Dialect Identification (ADI) and German Dialect Identification (GDI) –, and three new tasks – Morphosyntactic Tagging of Tweets (MTT), Discriminating between Dutch and Flemish in Subtitles (DFS), and Indo-Aryan Language Identification (ILI). A total of 24 teams submitted runs across the five shared tasks, and contributed 22 system description papers, which were included in the VarDial workshop proceedings and are referred to in this report.

pdf bib
Predicting Factuality of Reporting and Bias of News Media Sources
Ramy Baly | Georgi Karadzhov | Dimitar Alexandrov | James Glass | Preslav Nakov
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present a study on predicting the factuality of reporting and bias of news media. While previous work has focused on studying the veracity of claims or documents, here we are interested in characterizing entire news media. This is an under-studied, but arguably important research problem, both in its own right and as a prior for fact-checking systems. We experiment with a large list of news websites and with a rich set of features derived from (i) a sample of articles from the target news media, (ii) its Wikipedia page, (iii) its Twitter account, (iv) the structure of its URL, and (v) information about the Web traffic it attracts. The experimental results show sizable performance gains over the baseline, and reveal the importance of each feature type.

2017

pdf bib
Learning Word-Like Units from Joint Audio-Visual Analysis
David Harwath | James Glass
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Given a collection of images and spoken audio captions, we present a method for discovering word-like acoustic units in the continuous speech signal and grounding them to semantically relevant image regions. For example, our model is able to detect spoken instances of the word ‘lighthouse’ within an utterance and associate them with image regions containing lighthouses. We do not use any form of conventional automatic speech recognition, nor do we use any text transcriptions or conventional linguistic annotations. Our model effectively implements a form of spoken language acquisition, in which the computer learns not only to recognize word categories by sound, but also to enrich the words it learns with semantics by grounding them in images.

pdf bib
What do Neural Machine Translation Models Learn about Morphology?
Yonatan Belinkov | Nadir Durrani | Fahim Dalvi | Hassan Sajjad | James Glass
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural machine translation (MT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture. However, little is known about what these models learn about source and target languages during the training process. In this work, we analyze the representations learned by neural MT models at various levels of granularity and empirically evaluate the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks. We conduct a thorough investigation along several parameters: word-based vs. character-based representations, depth of the encoding layer, the identity of the target language, and encoder vs. decoder representations. Our data-driven, quantitative evaluation sheds light on important aspects in the neural MT system and its ability to capture word structure.

pdf bib
Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks
Yonatan Belinkov | Lluís Màrquez | Hassan Sajjad | Nadir Durrani | Fahim Dalvi | James Glass
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While neural machine translation (NMT) models provide improved translation quality in an elegant framework, it is less clear what they learn about language. Recent work has started evaluating the quality of vector representations learned by NMT models on morphological and syntactic tasks. In this paper, we investigate the representations learned at different layers of NMT encoders. We train NMT systems on parallel data and use the models to extract features for training a classifier on two tasks: part-of-speech and semantic tagging. We then measure the performance of the classifier as a proxy to the quality of the original NMT model for the given task. Our quantitative analysis yields interesting insights regarding representation learning in NMT models. For instance, we find that higher layers are better at learning semantics while lower layers tend to be better for part-of-speech tagging. We also observe little effect of the target language on source-side representations, especially in higher quality models.

2016

pdf bib
Learning Semantic Relatedness in Community Question Answering Using Neural Models
Henry Nassif | Mitra Mohtarami | James Glass
Proceedings of the 1st Workshop on Representation Learning for NLP

pdf bib
A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects
Yonatan Belinkov | James Glass
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

Discriminating between closely-related language varieties is considered a challenging and important task. This paper describes our submission to the DSL 2016 shared-task, which included two sub-tasks: one on discriminating similar languages and one on identifying Arabic dialects. We developed a character-level neural network for this task. Given a sequence of characters, our model embeds each character in vector space, runs the sequence through multiple convolutions with different filter widths, and pools the convolutional representations to obtain a hidden vector representation of the text that is used for predicting the language or dialect. We primarily focused on the Arabic dialect identification task and obtained an F1 score of 0.4834, ranking 6th out of 18 participants. We also analyze errors made by our system on the Arabic data in some detail, and point to challenges such an approach is faced with.

pdf bib
Neural Attention for Learning to Rank Questions in Community Question Answering
Salvatore Romeo | Giovanni Da San Martino | Alberto Barrón-Cedeño | Alessandro Moschitti | Yonatan Belinkov | Wei-Ning Hsu | Yu Zhang | Mitra Mohtarami | James Glass
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In real-world data, e.g., from Web forums, text is often contaminated with redundant or irrelevant content, which leads to introducing noise in machine learning algorithms. In this paper, we apply Long Short-Term Memory networks with an attention mechanism, which can select important parts of text for the task of similar question retrieval from community Question Answering (cQA) forums. In particular, we use the attention weights for both selecting entire sentences and their subparts, i.e., word/chunk, from shallow syntactic trees. More interestingly, we apply tree kernels to the filtered text representations, thus exploiting the implicit features of the subtree space for learning question reranking. Our results show that the attention-based pruning allows for achieving the top position in the cQA challenge of SemEval 2016, with a relatively large gap from the other participants while greatly decreasing running time.

2015

pdf bib
Arabic Diacritization with Recurrent Neural Networks
Yonatan Belinkov | James Glass
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
VectorSLU: A Continuous Word Vector Approach to Answer Selection in Community Question Answering Systems
Yonatan Belinkov | Mitra Mohtarami | Scott Cyphers | James Glass
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
Unsupervised Lexicon Discovery from Acoustic Input
Chia-ying Lee | Timothy J. O’Donnell | James Glass
Transactions of the Association for Computational Linguistics, Volume 3

We present a model of unsupervised phonological lexicon discovery—the problem of simultaneously learning phoneme-like and word-like units from acoustic input. Our model builds on earlier models of unsupervised phone-like unit discovery from acoustic data (Lee and Glass, 2012), and unsupervised symbolic lexicon discovery using the Adaptor Grammar framework (Johnson et al., 2006), integrating these earlier approaches using a probabilistic model of phonological variation. We show that the model is competitive with state-of-the-art spoken term discovery systems, and present analyses exploring the model’s behavior and the kinds of linguistic structures it learns.

2013

pdf bib
Joint Learning of Phonetic Units and Word Pronunciations for ASR
Chia-ying Lee | Yu Zhang | James Glass
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

pdf bib
A Nonparametric Bayesian Approach to Acoustic Model Discovery
Chia-ying Lee | James Glass
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2009

pdf bib
Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation
Ibrahim Badr | Rabih Zbib | James Glass
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

pdf bib
A Multimodal Home Entertainment Interface via a Mobile Device
Alexander Gruenstein | Bo-June Paul Hsu | James Glass | Stephanie Seneff | Lee Hetherington | Scott Cyphers | Ibrahim Badr | Chao Wang | Sean Liu
Proceedings of the ACL-08: HLT Workshop on Mobile Language Processing

pdf bib
N-gram Weighting: Reducing Training Data Mismatch in Cross-Domain Language Model Estimation
Bo-June Paul Hsu | James Glass
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Segmentation for English-to-Arabic Statistical Machine Translation
Ibrahim Badr | Rabih Zbib | James Glass
Proceedings of ACL-08: HLT, Short Papers

2007

pdf bib
Making Sense of Sound: Unsupervised Topic Segmentation over Acoustic Input
Igor Malioutov | Alex Park | Regina Barzilay | James Glass
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf bib
Style & Topic Language Model Adaptation Using HMM-LDA
Bo-June Paul Hsu | James Glass
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

pdf bib
The MIT Spoken Lecture Processing Project
James R. Glass | Timothy J. Hazen | D. Scott Cyphers | Ken Schutte | Alex Park
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

2004

pdf bib
Analysis and Processing of Lecture Audio Data: Preliminary Investigations
James Glass | Timothy J. Hazen | Lee Hetherington | Chao Wang
Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval at HLT-NAACL 2004

pdf bib
Modeling Prosodic Consistency for Automatic Speech Recognition: Preliminary Investigations
Ernest> Pusateri | James Glass
Proceedings of the HLT-NAACL 2004 Workshop on Spoken Language Understanding for Conversational Systems and Higher Level Linguistic Information for Speech Processing

pdf bib
Feature-based Pronunciation Modeling for Speech Recognition
Karen Livescu | James Glass
Proceedings of HLT-NAACL 2004: Short Papers

2003

pdf bib
Flexible and Personalizable Mixed-Initiative Dialogue Systems
James Glass | Stephanie Seneff
Proceedings of the HLT-NAACL 2003 Workshop on Research Directions in Dialogue Processing

1994

pdf bib
PEGASUS: A Spoken Language Interface for On-Line Air Travel Planning
Victor Zue | Stephanie Seneff | Joseph Polifroni | Michael Phillips | Christine Pao | David Goddeau | James Glass | Eric Brill
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

1992

pdf bib
The MIT ATIS System: February 1992 Progress Report
Victor Zue | James Glass | David Goddeau | David Goodine | Lynette Hirschman | Michael Phillips | Joseph Polifroni | Stephanie Seneff
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

pdf bib
Collection and Analyses of WSJ-CSR Data at MIT
Michael Phillips | James Glass | Joseph Polifroni | Victor Zue
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

1991

pdf bib
Modelling Context Dependency in Acoustic-Phonetic and Lexical Representations
Michael Phillips | James Glass | Victor Zue
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

pdf bib
Development and Preliminary Evaluation of the MIT ATIS System
Stephanie Seneff | James Glass | David Goddeau | David Goodine | Lynette Hirschman | Hong Leung | Michael Phillips | Joseph Polifroni | Victor Zue
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

1990

pdf bib
Preliminary ATIS Development at MIT
Victor Zue | James Glass | David Goodine | Hong Leung | Michael Phillips | Joseph Polifroni | Stephanie Seneff
Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27,1990

pdf bib
Recent Progress on the VOYAGER System
Victor Zue | James Glass | David Goodine | Hong Leung | Michael McCandless | Michael Phillips | Joseph Polifroni | Stephanie Seneff
Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27,1990

pdf bib
Recent Progress on the SUMMIT System
Victor Zue | James Glass | David Goodine | Hong Leung | Michael Phillips | Joseph Polifroni | Stephanie Seneff
Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27,1990

1989

pdf bib
The MIT SUMMIT Speech Recognition System: A Progress Report
Victor Zue | James Glass | Michael Phillips | Stephanie Seneff
Speech and Natural Language: Proceedings of a Workshop Held at Philadelphia, Pennsylvania, February 21-23, 1989

pdf bib
The VOYAGER Speech Understanding System: A Progress Report
Victor Zue | James Glass | David Goodine | Hong Leung | Michael Phillips | Joseph Polifroni | Stephanie Seneff
Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, Massachusetts, October 15-18, 1989

pdf bib
The Collection and Preliminary Analysis of a Spontaneous Speech Database
Victor Zue | Nancy Daly | James Glass | David Goodine | Hong Leung | Michael Phillips | Joseph Polifroni | Stephanie Seneff | Michal Soclof
Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, Massachusetts, October 15-18, 1989

pdf bib
Preliminary Evaluation of the VOYAGER Spoken Language System
Victor Zue | James Glass | David Goodine | Hong Leung | Michael Phillips | Joseph Polifroni | Stephanie Seneff
Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, Massachusetts, October 15-18, 1989

Search