Chris Callison-Burch


2021

“Wikily” Supervised Neural Translation Tailored to Cross-Lingual Tasks
Mohammad Sadegh Rasooli | Chris Callison-Burch | Derry Tanti Wijaya
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We present a simple but effective approach for leveraging Wikipedia for neural machine translation as well as cross-lingual tasks of image captioning and dependency parsing without using any direct supervision from external parallel data or supervised models in the target language. We show that first sentences and titles of linked Wikipedia pages, as well as cross-lingual image captions, are strong signals for seed parallel data used to extract bilingual dictionaries and cross-lingual word embeddings for mining parallel text from Wikipedia. Our final model achieves high BLEU scores that are close to or sometimes higher than strong supervised baselines in low-resource languages; e.g., supervised BLEU of 4.0 versus 12.1 from our model in English-to-Kazakh. Moreover, we tailor our wikily translation models to unsupervised image captioning and cross-lingual dependency parser transfer. In image captioning, we train a multi-tasking machine translation and image captioning pipeline for Arabic and English in which the Arabic training data is a wikily translation of the English captioning data. Our captioning results on Arabic are slightly better than those of its supervised model. In dependency parsing, we translate a large amount of monolingual text and use it as artificial training data in an annotation projection framework. We show that our model outperforms recent work on cross-lingual transfer of dependency parsers.

Visual Goal-Step Inference using wikiHow
Yue Yang | Artemis Panagopoulou | Qing Lyu | Li Zhang | Mark Yatskar | Chris Callison-Burch
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Understanding what sequence of steps is needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible step towards that goal. With a new dataset harvested from wikiHow consisting of 772,277 images representing human actions, we show that our task is challenging for state-of-the-art multimodal models. Moreover, the multimodal representation learned from our data can be effectively transferred to other datasets like HowTo100m, increasing the VGSI accuracy by 15-20%. Our task will facilitate multimodal reasoning about procedural events.

BiSECT: Learning to Split and Rephrase Sentences with Bitexts
Joongwon Kim | Mounica Maddela | Reno Kriz | Wei Xu | Chris Callison-Burch
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

An important task in NLP applications such as sentence simplification is the ability to take a long, complex sentence and split it into shorter sentences, rephrasing as necessary. We introduce a novel dataset and a new model for this ‘split and rephrase’ task. Our BiSECT training data consists of 1 million long English sentences paired with shorter, meaning-equivalent English sentences. We obtain these by extracting 1-2 sentence alignments in bilingual parallel corpora and then using machine translation to convert both sides of the corpus into the same language. BiSECT contains higher quality training examples than the previous Split and Rephrase corpora, with sentence splits that require more significant modifications. We categorize examples in our corpus and use these categories in a novel model that allows us to target specific regions of the input sentence to be split and edited. Moreover, we show that models trained on BiSECT can perform a wider variety of split operations and improve upon previous state-of-the-art approaches in automatic and human evaluations.

GooAQ: Open Question Answering with Diverse Answer Types
Daniel Khashabi | Amos Ng | Tushar Khot | Ashish Sabharwal | Hannaneh Hajishirzi | Chris Callison-Burch
Findings of the Association for Computational Linguistics: EMNLP 2021

While day-to-day questions come with a variety of answer types, the current question-answering (QA) literature has failed to adequately address the answer diversity of questions. To this end, we present GooAQ, a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatically from the Google search engine using its autocomplete feature. This results in naturalistic questions of practical interest that are nonetheless short and expressed using simple language. GooAQ answers are mined from Google’s responses to our collected questions, specifically from the answer boxes in the search results. This yields a rich space of answer types, containing both textual answers (short and long) as well as more structured ones such as collections. We benchmark T5 models on GooAQ and observe that: (a) in line with recent work, LMs’ strong performance on GooAQ’s short-answer questions benefits heavily from annotated data; however, (b) their quality in generating coherent and accurate responses for questions requiring long responses (such as ‘how’ and ‘why’ questions) is less reliant on observing annotated data and mainly supported by their pre-training. We release GooAQ to facilitate further research on improving QA with diverse response types.

TopGuNN: Fast NLP Training Data Augmentation using Large Corpora
Rebecca Iglesias-Flores | Megha Mishra | Ajay Patel | Akanksha Malhotra | Reno Kriz | Martha Palmer | Chris Callison-Burch
Proceedings of the Second Workshop on Data Science with Human in the Loop: Language Advances

Acquiring training data for natural language processing systems can be expensive and time-consuming. Given a few training examples crafted by experts, large corpora can be mined for thousands of semantically similar examples that provide useful variability to improve model generalization. We present TopGuNN, a fast contextualized k-NN retrieval system that can efficiently index and search over contextual embeddings generated from large corpora. TopGuNN is demonstrated for a training data augmentation use case over the Gigaword corpus. Using approximate k-NN and an efficient architecture, TopGuNN performs queries over an embedding space of 4.63TB (approximately 1.5B embeddings) in less than a day.
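
To make the retrieval step concrete, the following is a minimal sketch of exact (brute-force) cosine k-NN over a toy index. It is not TopGuNN's implementation: the abstract says the system uses approximate k-NN over contextual embeddings, whereas this sketch uses exact search, and the names `cosine`, `knn`, and `toy_index` and all vectors are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn(query, index, k):
    """Return the k keys of `index` most similar to `query`, best first."""
    return heapq.nlargest(k, index, key=lambda name: cosine(query, index[name]))


# Hypothetical 2-d "contextual embeddings"; real ones have hundreds of dimensions.
toy_index = {
    "bank_river": [1.0, 0.0],
    "bank_money": [0.0, 1.0],
    "shore": [0.9, 0.1],
}

neighbors = knn([1.0, 0.1], toy_index, k=2)
print(neighbors)
```

Exact search scans every vector per query, which is why a corpus-scale system would plausibly rely on an approximate index instead.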

Goal-Oriented Script Construction
Qing Lyu | Li Zhang | Chris Callison-Burch
Proceedings of the 14th International Conference on Natural Language Generation

The knowledge of scripts, common chains of events in stereotypical scenarios, is a valuable asset for task-oriented natural language understanding systems. We propose the Goal-Oriented Script Construction task, where a model produces a sequence of steps to accomplish a given goal. We pilot our task on the first multilingual script learning dataset supporting 18 languages collected from wikiHow, a website containing half a million how-to articles. For baselines, we consider both a generation-based approach using a language model and a retrieval-based approach by first retrieving the relevant steps from a large candidate pool and then ordering them. We show that our task is practical, feasible but challenging for state-of-the-art Transformer models, and that our methods can be readily deployed for various other datasets and domains with decent zero-shot performance.

Cultural and Geographical Influences on Image Translatability of Words across Languages
Nikzad Khani | Isidora Tourni | Mohammad Sadegh Rasooli | Chris Callison-Burch | Derry Tanti Wijaya
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Neural Machine Translation (NMT) models have been observed to produce poor translations when there are few or no parallel sentences on which to train the models. In the absence of parallel data, several approaches have turned to the use of images to learn translations. Since images of words (e.g., horse) may be unchanged across languages, translations can be identified via images associated with words in different languages that have a high degree of visual similarity. However, translating via images has been shown to improve upon text-only models only marginally. To better understand when images are useful for translation, we study image translatability of words, which we define as the translatability of words via images, by measuring intra- and inter-cluster similarities of image representations of words that are translations of each other. We find that images of words are not always invariant across languages, and that language pairs with shared culture, meaning having either a common language family, ethnicity or religion, have improved image translatability (i.e., have more similar images for similar words) compared to pairs without shared culture, regardless of their geographic proximity. In addition, in line with previous works that show images help more in translating concrete words, we found that concrete words have improved image translatability compared to abstract ones.

RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System
Haoyang Wen | Ying Lin | Tuan Lai | Xiaoman Pan | Sha Li | Xudong Lin | Ben Zhou | Manling Li | Haoyu Wang | Hongming Zhang | Xiaodong Yu | Alexander Dong | Zhenhailong Wang | Yi Fung | Piyush Mishra | Qing Lyu | Dídac Surís | Brian Chen | Susan Windisch Brown | Martha Palmer | Chris Callison-Burch | Carl Vondrick | Jiawei Han | Dan Roth | Shih-Fu Chang | Heng Ji
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video). The system advances the state of the art in two respects: (1) extending from sentence-level event extraction to cross-document cross-lingual cross-media event extraction, coreference resolution and temporal event tracking; (2) using a human-curated event schema library to match and enhance the extraction output. We have made the dockerized system publicly available for research purposes on GitHub, with a demo video.

2020

Automatic Detection of Generated Text is Easiest when Humans are Fooled
Daphne Ippolito | Daniel Duckworth | Chris Callison-Burch | Douglas Eck
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent advancements in neural language modelling make it possible to rapidly generate vast amounts of human-sounding text. The capabilities of humans and automatic discriminators to detect machine-generated text have been a large source of research interest, but humans and machines rely on different cues to make their decisions. Here, we perform careful benchmarking and analysis of three popular sampling-based decoding strategies (top-k, nucleus sampling, and untruncated random sampling) and show that improvements in decoding methods have primarily optimized for fooling humans. This comes at the expense of introducing statistical abnormalities that make detection easy for automatic systems. We also show that though both human and automatic detector performance improve with longer excerpt length, even multi-sentence excerpts can fool expert human raters over 30% of the time. Our findings reveal the importance of using both human and automatic detectors to assess the humanness of text generation systems.
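
For readers unfamiliar with the three decoding strategies named in this abstract, here is a minimal sketch of how each one truncates (or does not truncate) a next-token distribution before sampling. The toy distribution and the function names `top_k_filter`, `nucleus_filter`, and `sample` are illustrative assumptions, not code from the paper.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most probable token ids and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def nucleus_filter(probs, p):
    """Keep the smallest most-probable set whose mass reaches p; renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


def sample(dist, rng):
    """Draw one token id from a {token_id: probability} mapping."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Hypothetical next-token distribution over a 4-token vocabulary.
toy = [0.5, 0.25, 0.125, 0.125]
rng = random.Random(0)

top2 = top_k_filter(toy, 2)               # top-k with k=2
nucleus = nucleus_filter(toy, 0.8)        # nucleus sampling with p=0.8
untruncated = dict(enumerate(toy))        # untruncated random sampling

print(sample(top2, rng), sample(nucleus, rng), sample(untruncated, rng))
```

Truncation removes the low-probability tail entirely, which is one plausible source of the statistical abnormalities the abstract says automatic detectors can exploit.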

Toward Better Storylines with Sentence-Level Language Models
Daphne Ippolito | David Grangier | Douglas Eck | Chris Callison-Burch
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose a sentence-level language model which selects the next sentence in a story from a finite set of fluent alternatives. Since it does not need to model fluency, the sentence-level language model can focus on longer range dependencies, which are crucial for multi-sentence coherence. Rather than dealing with individual words, our method treats the story so far as a list of pre-trained sentence embeddings and predicts an embedding for the next sentence, which is more efficient than predicting word embeddings. Notably, this allows us to consider a large number of candidates for the next sentence during training. We demonstrate the effectiveness of our approach with state-of-the-art accuracy on the unsupervised Story Cloze task and with promising results on larger-scale next sentence prediction tasks.
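
The selection step described here can be sketched as scoring each candidate sentence embedding against a predicted next-sentence embedding. This is only an illustration: the dot-product scoring, the 3-d vectors, and the `predicted` embedding (standing in for the output of a context model) are assumptions, not details taken from the paper.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def choose_next_sentence(predicted, candidates):
    """Return the candidate whose embedding best matches the predicted one."""
    return max(candidates, key=lambda s: dot(predicted, candidates[s]))


# Hypothetical pre-trained sentence embeddings for three fluent candidates.
candidates = {
    "She opened the door.": [0.9, 0.1, 0.0],
    "The stock market fell.": [0.0, 0.2, 0.9],
    "He waved from the porch.": [0.6, 0.6, 0.1],
}

# Hypothetical embedding predicted from the story so far.
predicted = [0.8, 0.3, 0.0]

best = choose_next_sentence(predicted, candidates)
print(best)
```

Because every candidate is already a fluent sentence, the scorer only has to judge coherence with the context, which is the division of labor the abstract describes.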

Intent Detection with WikiHow
Li Zhang | Qing Lyu | Chris Callison-Burch
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Modern task-oriented dialog systems need to reliably understand users’ intents. Intent detection is even more challenging when moving to new domains or new languages, since there is little annotated data. To address this challenge, we present a suite of pretrained intent detection models which can predict a broad range of intended goals from many actions because they are trained on wikiHow, a comprehensive instructional website. Our models achieve state-of-the-art results on the Snips dataset, the Schema-Guided Dialogue dataset, and all 3 languages of the Facebook multilingual dialog datasets. Our models also demonstrate strong zero- and few-shot performance, reaching over 75% accuracy using only 100 training examples in all datasets.

Reasoning about Goals, Steps, and Temporal Ordering with WikiHow
Li Zhang | Qing Lyu | Chris Callison-Burch
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose a suite of reasoning tasks on two types of relations between procedural events: goal-step relations (“learn poses” is a step in the larger goal of “doing yoga”) and step-step temporal relations (“buy a yoga mat” typically precedes “learn poses”). We introduce a dataset targeting these two relations based on wikiHow, a website of instructional how-to articles. Our human-validated test set serves as a reliable benchmark for common-sense inference, with a gap of about 10% to 20% between the performance of state-of-the-art transformer models and human performance. Our automatically-generated training set allows models to effectively transfer to out-of-domain tasks requiring knowledge of procedural events, with greatly improved performances on SWAG, Snips, and Story Cloze Test in zero- and few-shot settings.

RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text
Liam Dugan | Daphne Ippolito | Arun Kirubarajan | Chris Callison-Burch
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

In recent years, large neural networks for natural language generation (NLG) have made leaps and bounds in their ability to generate fluent text. However, the tasks of evaluating quality differences between NLG systems and understanding how humans perceive the generated text remain both crucial and difficult. In this system demonstration, we present Real or Fake Text (RoFT), a website that tackles both of these challenges by inviting users to try their hand at detecting machine-generated text in a variety of domains. We introduce a novel evaluation task based on detecting the boundary at which a text passage that starts off human-written transitions to being machine-generated. We show preliminary results of using RoFT to evaluate detection of machine-generated news articles.

Resolving Pronouns in Twitter Streams: Context can Help!
Anietie Andy | Chris Callison-Burch | Derry Tanti Wijaya
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

Many people live-tweet televised events like Presidential debates and popular TV shows and discuss people or characters in the event. Naturally, many tweets make pronominal reference to these people or characters. We propose an algorithm for resolving personal pronouns that make reference to people involved in an event, in tweet streams collected during the event.

Proceedings of the LREC 2020 Workshop on "Citizen Linguistics in Language Resource Development"
James Fiumara | Christopher Cieri | Mark Liberman | Chris Callison-Burch
Proceedings of the LREC 2020 Workshop on "Citizen Linguistics in Language Resource Development"

2019

Paraphrase-Sense-Tagged Sentences
Anne Cocos | Chris Callison-Burch
Transactions of the Association for Computational Linguistics, Volume 7

Many natural language processing tasks require discriminating the particular meaning of a word in context, but building corpora for developing sense-aware models can be a challenge. We present a large resource of example usages for words having a particular meaning, called Paraphrase-Sense-Tagged Sentences (PSTS). Built on the premise that a word’s paraphrases instantiate its fine-grained meanings (i.e., bug has different meanings corresponding to its paraphrases fly and microbe) the resource contains up to 10,000 sentences for each of 3 million target-paraphrase pairs where the target word takes on the meaning of the paraphrase. We describe an automatic method based on bilingual pivoting used to enumerate sentences for PSTS, and present two models for ranking PSTS sentences based on their quality. Finally, we demonstrate the utility of PSTS by using it to build a dataset for the task of hypernym prediction in context. Training a model on this automatically generated dataset produces accuracy that is competitive with a model trained on smaller datasets crafted with some manual effort.

Comparison of Diverse Decoding Methods from Conditional Language Models
Daphne Ippolito | Reno Kriz | João Sedoc | Maria Kustikova | Chris Callison-Burch
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

While conditional language models have greatly improved in their ability to output high quality natural language, many NLP applications benefit from being able to generate a diverse set of candidate sequences. Diverse decoding strategies aim to, within a given-sized candidate list, cover as much of the space of high-quality outputs as possible, leading to improvements for tasks that rerank and combine candidate outputs. Standard decoding methods, such as beam search, optimize for generating high likelihood sequences rather than diverse ones, though recent work has focused on increasing diversity in these methods. In this work, we perform an extensive survey of decoding-time strategies for generating diverse outputs from a conditional language model. In addition, we present a novel method where we over-sample candidates, then use clustering to remove similar sequences, thus achieving high diversity without sacrificing quality.
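
The over-sample-then-cluster idea can be illustrated with a small sketch. The abstract does not say which clustering method or similarity measure the paper uses, so the greedy grouping by word-level Jaccard similarity below, along with the names `jaccard` and `diverse_subset` and the example scores, is an assumption made purely for illustration.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two non-empty strings."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def diverse_subset(candidates, threshold=0.5):
    """Greedy clustering over (text, score) pairs.

    Candidates are visited from highest to lowest score; one is kept only
    if it is not too similar to anything already kept, so each cluster of
    near-duplicates is represented by its best-scoring member.
    """
    kept = []
    for text, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(jaccard(text, other) < threshold for other, _ in kept):
            kept.append((text, score))
    return kept


# Hypothetical over-sampled candidates with log-probability-like scores.
oversampled = [
    ("the cat sat on the mat", -1.0),
    ("the cat sat on a mat", -1.2),
    ("a dog slept by the door", -1.5),
    ("a dog slept near the door", -1.7),
]

selected = diverse_subset(oversampled)
print([text for text, _ in selected])
```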

PerspectroScope: A Window to the World of Diverse Perspectives
Sihao Chen | Daniel Khashabi | Chris Callison-Burch | Dan Roth
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

This work presents PerspectroScope, a web-based system which lets users query a discussion-worthy natural language claim, and extract and visualize various perspectives in support of or against the claim, along with evidence supporting each perspective. The system thus lets users explore various perspectives that could touch upon aspects of the issue at hand. The system is built as a combination of retrieval engines and learned textual-entailment-like classifiers built using a few recent developments in natural language understanding. To make the system more adaptive, expand its coverage, and improve its decisions over time, our platform employs various mechanisms to get corrections from the users. PerspectroScope is available at github.com/CogComp/perspectroscope (web demo: http://orwell.seas.upenn.edu:4002/, demo video: https://www.youtube.com/watch?v=MXBTR1Sp3Bs).

A Comparison of Context-sensitive Models for Lexical Substitution
Aina Garí Soler | Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

Word embedding representations provide good estimates of word meaning and give state-of-the-art performance in semantic tasks. Embedding approaches differ as to whether and how they account for the context surrounding a word. We present a comparison of different word and context representations on the task of proposing substitutes for a target word in context (lexical substitution). We also experiment with tuning contextualized word embeddings on a dataset of sense-specific instances for each target word. We show that powerful contextualized word representations, which give high performance in several semantics-related tasks, deal less well with the subtle in-context similarity relationships needed for substitution. This is better handled by models trained with this objective in mind, where the inter-dependence between word and context representations is explicitly modeled during training.

Unsupervised Hierarchical Story Infilling
Daphne Ippolito | David Grangier | Chris Callison-Burch | Douglas Eck
Proceedings of the First Workshop on Narrative Understanding

Story infilling involves predicting words to go into a missing span from a story. This challenging task has the potential to transform interactive tools for creative writing. However, state-of-the-art conditional language models have trouble balancing fluency and coherence with novelty and diversity. We address this limitation with a hierarchical model which first selects a set of rare words and then generates text conditioned on that set. By relegating the high entropy task of picking rare words to a word-sampling model, the second-stage model conditioned on those words can achieve high fluency and coherence by searching for likely sentences, without sacrificing diversity.

Winter is here: Summarizing Twitter Streams related to Pre-Scheduled Events
Anietie Andy | Derry Tanti Wijaya | Chris Callison-Burch
Proceedings of the Second Workshop on Storytelling

Pre-scheduled events, such as TV shows and sports games, usually garner considerable attention from the public. Twitter captures large volumes of discussions and messages related to these events, in real-time. Twitter streams related to pre-scheduled events are characterized by the following: (1) spikes in the volume of published tweets reflect the highlights of the event and (2) some of the published tweets make reference to the characters involved in the event, in the context in which they are currently portrayed in a subevent. In this paper, we take advantage of these characteristics to identify the highlights of pre-scheduled events from tweet streams and we demonstrate a method to summarize these highlights. We evaluate our algorithm on tweets collected around 2 episodes of a popular TV show, Game of Thrones, Season 7.
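
The first characteristic above (volume spikes mark highlights) can be sketched with a simple threshold on per-minute tweet counts. The abstract does not describe the paper's actual detection method, so the z-score rule, the threshold value, and the counts below are assumptions for illustration only.

```python
from statistics import mean, pstdev


def find_spikes(counts, z=2.0):
    """Return indices whose count exceeds the mean by more than z std devs."""
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []  # a perfectly flat stream has no spikes
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > z]


# Hypothetical tweets-per-minute during a broadcast; minute 4 is a highlight.
tweets_per_minute = [10, 12, 11, 9, 80, 10, 11, 12, 10, 9]

spikes = find_spikes(tweets_per_minute)
print(spikes)
```

Tweets falling inside the flagged minutes would then be the input to whatever summarization step follows.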

Seeing Things from a Different Angle: Discovering Diverse Perspectives about Claims
Sihao Chen | Daniel Khashabi | Wenpeng Yin | Chris Callison-Burch | Dan Roth
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

One key consequence of the information revolution is a significant increase and a contamination of our information supply. The practice of fact checking won’t suffice to eliminate the biases in text data we observe, as the degree of factuality alone does not determine whether biases exist in the spectrum of opinions visible to us. To better understand controversial issues, one needs to view them from a diverse yet comprehensive set of perspectives. For example, there are many ways to respond to a claim such as “animals should have lawful rights”, and these responses form a spectrum of perspectives, each with a stance relative to this claim and, ideally, with evidence supporting it. Inherently, this is a natural language understanding task, and we propose to address it as such. Specifically, we propose the task of substantiated perspective discovery where, given a claim, a system is expected to discover a diverse set of well-corroborated perspectives that take a stance with respect to the claim. Each perspective should be substantiated by evidence paragraphs which summarize pertinent results and facts. We construct PERSPECTRUM, a dataset of claims, perspectives and evidence, making use of online debate websites to create the initial data collection, and augmenting it using search engines in order to expand and diversify our dataset. We use crowd-sourcing to filter out noise and ensure high-quality data. Our dataset contains 1k claims, accompanied by pools of 10k and 8k perspective sentences and evidence paragraphs, respectively. We provide a thorough analysis of the dataset to highlight key underlying language understanding challenges, and show that human baselines across multiple subtasks far outperform machine baselines built upon state-of-the-art NLP techniques. This poses a challenge and opportunity for the NLP community to address.

Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification
Reno Kriz | João Sedoc | Marianna Apidianaki | Carolina Zheng | Gaurav Kumar | Eleni Miltsakaki | Chris Callison-Burch
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Sentence simplification is the task of rewriting texts so they are easier to understand. Recent research has applied sequence-to-sequence (Seq2Seq) models to this task, focusing largely on training-time improvements via reinforcement learning and memory augmentation. One of the main problems with applying generic Seq2Seq models for simplification is that these models tend to copy directly from the original sentence, resulting in outputs that are relatively long and complex. We aim to alleviate this issue through the use of two main techniques. First, we incorporate content word complexities, as predicted with a leveled word complexity model, into our loss function during training. Second, we generate a large set of diverse candidate simplifications at test time, and rerank these to promote fluency, adequacy, and simplicity. Here, we measure simplicity through a novel sentence complexity model. These extensions allow our models to perform competitively with state-of-the-art systems while generating simpler sentences. We report standard automatic and human evaluation metrics.
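
The test-time reranking step can be sketched as a weighted combination of per-candidate fluency, adequacy, and simplicity scores. The linear combination, the weights, and the scores below are hypothetical; the abstract does not specify how the paper combines these criteria or how each score is computed.

```python
def rerank(candidates, weights=(1.0, 1.0, 1.0)):
    """Sort candidates by a weighted sum of fluency, adequacy, simplicity."""
    w_flu, w_adq, w_sim = weights

    def score(c):
        return (w_flu * c["fluency"]
                + w_adq * c["adequacy"]
                + w_sim * c["simplicity"])

    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidate simplifications with scores in [0, 1].
pool = [
    {"text": "A", "fluency": 0.9, "adequacy": 0.9, "simplicity": 0.2},
    {"text": "B", "fluency": 0.8, "adequacy": 0.8, "simplicity": 0.8},
    {"text": "C", "fluency": 0.5, "adequacy": 0.4, "simplicity": 0.9},
]

balanced = [c["text"] for c in rerank(pool)]
simplicity_heavy = [c["text"] for c in rerank(pool, weights=(1.0, 1.0, 3.0))]
print(balanced, simplicity_heavy)
```

Raising the simplicity weight moves the near-copy candidate "A" to the bottom, which mirrors the abstract's concern that generic Seq2Seq output tends to copy the long, complex input.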

ChatEval: A Tool for Chatbot Evaluation
João Sedoc | Daphne Ippolito | Arun Kirubarajan | Jai Thirani | Lyle Ungar | Chris Callison-Burch
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

Open-domain dialog systems (i.e., chatbots) are difficult to evaluate. The current best practice for analyzing and comparing these dialog systems is the use of human judgments. However, the lack of standardization in evaluation procedures and the fact that model parameters and code are rarely published hinder systematic human evaluation experiments. We introduce a unified framework for human evaluation of chatbots that augments existing tools and provides a web-based hub for researchers to share and compare their dialog systems. Researchers can submit their trained models to the ChatEval web interface and obtain comparisons with baselines and prior work. The evaluation code is open-source to ensure standardization and transparency. In addition, we introduce open-source baseline models and evaluation datasets. ChatEval can be found at https://chateval.org.

2018

Learning Scalar Adjective Intensity from Paraphrases
Anne Cocos | Skyler Wharton | Ellie Pavlick | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Adjectives like “warm”, “hot”, and “scalding” all describe temperature but differ in intensity. Understanding these differences between adjectives is a necessary part of reasoning about natural language. We propose a new paraphrase-based method to automatically learn the relative intensity relation that holds between a pair of scalar adjectives. Our approach analyzes over 36k adjectival pairs from the Paraphrase Database under the assumption that, for example, paraphrase pair “really hot” <–> “scalding” suggests that “hot” < “scalding”. We show that combining this paraphrase evidence with existing, complementary pattern- and lexicon-based approaches improves the quality of systems for automatically ordering sets of scalar adjectives and inferring the polarity of indirect answers to “yes/no” questions.
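
The paraphrase assumption stated in this abstract can be sketched as follows: an intensifier-plus-adjective phrase paired with a single adjective is read as evidence that the single adjective is the stronger one. The intensifier list, the example pairs, and the simple win-minus-loss ranking are illustrative assumptions; the paper's actual scoring is not given in the abstract.

```python
# Illustrative intensifier list (an assumption, not taken from the paper).
INTENSIFIERS = {"really", "very", "extremely"}


def intensity_edges(paraphrase_pairs):
    """Turn pairs like ("really hot", "scalding") into (weaker, stronger)."""
    edges = set()
    for left, right in paraphrase_pairs:
        for phrase, single in ((left, right), (right, left)):
            words = phrase.split()
            is_intensified = len(words) == 2 and words[0] in INTENSIFIERS
            if is_intensified and " " not in single and words[1] != single:
                edges.add((words[1], single))
    return edges


def rank_by_intensity(adjectives, edges):
    """Order adjectives from weakest to strongest by wins minus losses."""
    def score(adj):
        stronger_than = sum(1 for _, strong in edges if strong == adj)
        weaker_than = sum(1 for weak, _ in edges if weak == adj)
        return stronger_than - weaker_than

    return sorted(adjectives, key=score)


# Hypothetical paraphrase pairs in the style of the abstract's example.
pairs = [
    ("really hot", "scalding"),
    ("very warm", "hot"),
    ("extremely warm", "scalding"),
]

edges = intensity_edges(pairs)
ordering = rank_by_intensity(["scalding", "warm", "hot"], edges)
print(ordering)
```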

Magnitude: A Fast, Efficient Universal Vector Embedding Utility Package
Ajay Patel | Alexander Sands | Chris Callison-Burch | Marianna Apidianaki
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Vector space embedding models like word2vec, GloVe, and fastText are extremely popular representations in natural language processing (NLP) applications. We present Magnitude, a fast, lightweight tool for utilizing and processing embeddings. Magnitude is an open source Python package with a compact vector storage file format that allows for efficient manipulation of huge numbers of embeddings. Magnitude performs common operations up to 60 to 6,000 times faster than Gensim. Magnitude introduces several novel features for improved robustness like out-of-vocabulary lookups.

Learning Translations via Images with a Massively Multilingual Image Dataset
John Hewitt | Daphne Ippolito | Brendan Callahan | Reno Kriz | Derry Tanti Wijaya | Chris Callison-Burch
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We conduct the most comprehensive study to date into translating words via images. To facilitate research on the task, we introduce a large-scale multilingual corpus of images, each labeled with the word it represents. Past datasets have been limited to only a few high-resource languages and unrealistically easy translation settings. In contrast, we have collected by far the largest available dataset for this task, with images for approximately 10,000 words in each of 100 languages. We run experiments on a dozen high-resource languages and 20 low-resource languages, demonstrating the effect of word concreteness and part-of-speech on translation quality. We find that while image features work best for concrete nouns, they are sometimes effective on other parts of speech. To improve image-based translation, we introduce a novel method of predicting word concreteness from images, which improves on a previous state-of-the-art unsupervised technique. This allows us to predict when image-based translation may be effective, enabling consistent improvements to a state-of-the-art text-based word translation system. Our code and the Massively Multilingual Image Dataset (MMID) are available at http://multilingual-images.org/.

Introducing NIEUW: Novel Incentives and Workflows for Eliciting Linguistic Data
Christopher Cieri | James Fiumara | Mark Liberman | Chris Callison-Burch | Jonathan Wright
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
ChatEval: A Tool for the Systematic Evaluation of Chatbots
João Sedoc | Daphne Ippolito | Arun Kirubarajan | Jai Thirani | Lyle Ungar | Chris Callison-Burch
Proceedings of the Workshop on Intelligent Interactive Systems and Language Generation (2IS&NLG)

pdf bib
Simplification Using Paraphrases and Context-Based Lexical Substitution
Reno Kriz | Eleni Miltsakaki | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Lexical simplification involves identifying complex words or phrases that need to be simplified, and recommending simpler meaning-preserving substitutes that can be more easily understood. We propose a complex word identification (CWI) model that exploits both lexical and contextual features, and a simplification mechanism which relies on a word-embedding lexical substitution model to replace the detected complex words with simpler paraphrases. We compare our CWI and lexical simplification models to several baselines, and evaluate the performance of our simplification system against human judgments. The results show that our models are able to detect complex words with higher accuracy than other commonly used methods, and propose good simplification substitutes in context. They also highlight the limited contribution of context features for CWI, which nonetheless improve simplification compared to context-unaware models.

pdf bib
Comparing Constraints for Taxonomic Organization
Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Building a taxonomy from the ground up involves several sub-tasks: selecting terms to include, predicting semantic relations between terms, and selecting a subset of relational instances to keep, given constraints on the taxonomy graph. Methods for this final step – taxonomic organization – vary both in terms of the constraints they impose, and whether they enable discovery of synonymous terms. It is hard to isolate the impact of these factors on the quality of the resulting taxonomy because organization methods are rarely compared directly. In this paper, we present a head-to-head comparison of six taxonomic organization algorithms that vary with respect to their structural and transitivity constraints, and treatment of synonymy. We find that while transitive algorithms out-perform their non-transitive counterparts, the top-performing transitive algorithm is prohibitively slow for taxonomies with as few as 50 entities. We propose a simple modification to a non-transitive optimum branching algorithm to explicitly incorporate synonymy, resulting in a method that is substantially faster than the best transitive algorithm while giving complementary performance.

pdf bib
Automated Paraphrase Lattice Creation for HyTER Machine Translation Evaluation
Marianna Apidianaki | Guillaume Wisniewski | Anne Cocos | Chris Callison-Burch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We propose a variant of a well-known machine translation (MT) evaluation metric, HyTER (Dreyer and Marcu, 2012), which exploits reference translations enriched with meaning equivalent expressions. The original HyTER metric relied on hand-crafted paraphrase networks which restricted its applicability to new data. We test, for the first time, HyTER with automatically built paraphrase lattices. We show that although the metric obtains good results on small and carefully curated data with both manually and automatically selected substitutes, it achieves medium performance on much larger and noisier datasets, demonstrating the limits of the metric for tuning and evaluation of current MT systems.

2017

pdf bib
Learning Translations via Matrix Completion
Derry Tanti Wijaya | Brendan Callahan | John Hewitt | Jie Gao | Xiao Ling | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Bilingual Lexicon Induction is the task of learning word translations without bilingual parallel corpora. We model this task as a matrix completion problem, and present an effective and extendable framework for completing the matrix. This method harnesses diverse bilingual and monolingual signals, each of which may be incomplete or noisy. Our model achieves state-of-the-art performance for both high and low resource languages.

pdf bib
KnowYourNyms? A Game of Semantic Relationships
Ross Mechanic | Dean Fulgoni | Hannah Cutler | Sneha Rajana | Zheyuan Liu | Bradley Jackson | Anne Cocos | Chris Callison-Burch | Marianna Apidianaki
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Semantic relation knowledge is crucial for natural language understanding. We introduce “KnowYourNyms?”, a web-based game for learning semantic relations. While providing users with an engaging experience, the application collects large amounts of data that can be used to improve semantic relation classifiers. The data also broadly informs us of how people perceive the relationships between words, providing useful insights for research in psychology and linguistics.

pdf bib
Word Sense Filtering Improves Embedding-Based Lexical Substitution
Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

The role of word sense disambiguation in lexical substitution has been questioned due to the high performance of vector space models which propose good substitutes without explicitly accounting for sense. We show that a filtering mechanism based on a sense inventory optimized for substitutability can improve the results of these models. Our sense inventory is constructed using a clustering method which generates paraphrase clusters that are congruent with lexical substitution annotations in a development set. The results show that lexical substitution can still benefit from senses which can improve the output of vector space paraphrase ranking models.

pdf bib
Constructing an Alias List for Named Entities during an Event
Anietie Andy | Mark Dredze | Mugizi Rwebangira | Chris Callison-Burch
Proceedings of the 3rd Workshop on Noisy User-generated Text

In certain fields, real-time knowledge from events can help in making informed decisions. In order to extract pertinent real-time knowledge related to an event, it is important to identify the named entities and their corresponding aliases related to the event. The problem of identifying aliases of named entities that spike has remained unexplored. In this paper, we introduce an algorithm, EntitySpike, that identifies entities that spike in popularity in tweets from a given time period, and constructs an alias list for these spiked entities. EntitySpike uses a temporal heuristic to identify named entities with similar context that occur in the same time period (within minutes) during an event. Each entity is encoded as a vector using this temporal heuristic. We show how these entity-vectors can be used to create a named entity alias list. We evaluated our algorithm on a dataset of temporally ordered tweets from a single event, the 2013 Grammy Awards show. We carried out various experiments on tweets that were published in the same time period and show that our algorithm identifies most entity name aliases and outperforms a competitive baseline.

pdf bib
Systematically Adapting Machine Translation for Grammatical Error Correction
Courtney Napoles | Chris Callison-Burch
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

In this work we adapt machine translation (MT) to grammatical error correction, identifying how components of the statistical MT pipeline can be modified for this task and analyzing how each modification impacts system performance. We evaluate the contribution of each of these components with standard evaluation metrics and automatically characterize the morphological and lexical transformations made in system output. Our model rivals the current state of the art using a fraction of the training data.

pdf bib
The Language of Place: Semantic Value from Geospatial Context
Anne Cocos | Chris Callison-Burch
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

There is a relationship between what we say and where we say it. Word embeddings are usually trained assuming that semantically-similar words occur within the same textual contexts. We investigate the extent to which semantically-similar words occur within the same geospatial contexts. We enrich a corpus of geolocated Twitter posts with physical data derived from Google Places and OpenStreetMap, and train word embeddings using the resulting geospatial contexts. Intrinsic evaluation of the resulting vectors shows that geographic context alone does provide useful information about semantic relatedness.

pdf bib
Learning Antonyms with Paraphrases and a Morphology-Aware Neural Network
Sneha Rajana | Chris Callison-Burch | Marianna Apidianaki | Vered Shwartz
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Recognizing and distinguishing antonyms from other types of semantic relations is an essential part of language understanding systems. In this paper, we present a novel method for deriving antonym pairs using paraphrase pairs containing negation markers. We further propose a neural network model, AntNET, that integrates morphological features indicative of antonymy into a path-based relation detection algorithm. We demonstrate that our model outperforms state-of-the-art models in distinguishing antonyms from other semantic relations and is capable of efficiently handling multi-word expressions.

pdf bib
Mapping the Paraphrase Database to WordNet
Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

WordNet has facilitated important research in natural language processing but its usefulness is somewhat limited by its relatively small lexical coverage. The Paraphrase Database (PPDB) covers 650 times more words, but lacks the semantic structure of WordNet that would make it more directly useful for downstream tasks. We present a method for mapping words from PPDB to WordNet synsets with 89% accuracy. The mapping also lays important groundwork for incorporating WordNet’s relations into PPDB so as to increase its utility for semantic reasoning in applications.

pdf bib
A Comprehensive Analysis of Bilingual Lexicon Induction
Ann Irvine | Chris Callison-Burch
Computational Linguistics, Volume 43, Issue 2 - June 2017

Bilingual lexicon induction is the task of inducing word translations from monolingual corpora in two languages. In this article we present the most comprehensive analysis of bilingual lexicon induction to date. We present experiments on a wide range of languages and data sizes. We examine translation into English from 25 foreign languages: Albanian, Azeri, Bengali, Bosnian, Bulgarian, Cebuano, Gujarati, Hindi, Hungarian, Indonesian, Latvian, Nepali, Romanian, Serbian, Slovak, Somali, Spanish, Swedish, Tamil, Telugu, Turkish, Ukrainian, Uzbek, Vietnamese, and Welsh. We analyze the behavior of bilingual lexicon induction on low-frequency words, rather than testing solely on high-frequency words, as previous research has done. Low-frequency words are more relevant to statistical machine translation, where systems typically lack translations of rare words that fall outside of their training data. We systematically explore a wide range of features and phenomena that affect the quality of the translations discovered by bilingual lexicon induction. We provide illustrative examples of the highest ranking translations for orthogonal signals of translation equivalence like contextual similarity and temporal similarity. We analyze the effects of frequency and burstiness, and the sizes of the seed bilingual dictionaries and the monolingual training corpora. Additionally, we introduce a novel discriminative approach to bilingual lexicon induction. Our discriminative model is capable of combining a wide variety of features that individually provide only weak indications of translation equivalence. When feature weights are discriminatively set, these signals produce dramatically higher translation quality than previous approaches that combined signals in an unsupervised fashion (e.g., using minimum reciprocal rank). 
We also directly compare our model’s performance against a sophisticated generative approach, the matching canonical correlation analysis (MCCA) algorithm used by Haghighi et al. (2008). Our algorithm achieves an accuracy of 42% versus MCCA’s 15%.

2016

pdf bib
The Gun Violence Database: A new task and data set for NLP
Ellie Pavlick | Heng Ji | Xiaoman Pan | Chris Callison-Burch
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Tense Manages to Predict Implicative Behavior in Verbs
Ellie Pavlick | Chris Callison-Burch
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
So-Called Non-Subsective Adjectives
Ellie Pavlick | Chris Callison-Burch
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

pdf bib
Clustering Paraphrases by Word Sense
Anne Cocos | Chris Callison-Burch
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Sentential Paraphrasing as Black-Box Machine Translation
Courtney Napoles | Chris Callison-Burch | Matt Post
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf bib
Optimizing Statistical Machine Translation for Text Simplification
Wei Xu | Courtney Napoles | Ellie Pavlick | Quanze Chen | Chris Callison-Burch
Transactions of the Association for Computational Linguistics, Volume 4

Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification, taking advantage of large-scale paraphrases learned from bilingual texts and a small amount of manual simplifications with multiple references. Our work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.

pdf bib
Most “babies” are “little” and most “problems” are “huge”: Compositional Entailment in Adjective-Nouns
Ellie Pavlick | Chris Callison-Burch
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Simple PPDB: A Paraphrase Database for Simplification
Ellie Pavlick | Chris Callison-Burch
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

pdf bib
Ideological Perspective Detection Using Semantic Features
Heba Elfardy | Mona Diab | Chris Callison-Burch
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

pdf bib
SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)
Wei Xu | Chris Callison-Burch | Bill Dolan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
Lluís Màrquez | Chris Callison-Burch | Jian Su
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Problems in Current Text Simplification Research: New Data Can Help
Wei Xu | Chris Callison-Burch | Courtney Napoles
Transactions of the Association for Computational Linguistics, Volume 3

Simple Wikipedia has dominated simplification research in the past 5 years. In this opinion paper, we argue that focusing on Wikipedia limits simplification research. We back up our arguments with corpus analysis and by highlighting statements that other researchers have made in the simplification literature. We introduce a new simplification dataset that is a significant improvement over Simple Wikipedia, and present a novel quantitative-comparative approach to study the quality of simplification data resources.

pdf bib
Cost Optimization in Crowdsourcing Translation: Low cost translations made even cheaper
Mingkun Gao | Wei Xu | Chris Callison-Burch
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Crowdsourcing for NLP
Chris Callison-Burch | Lyle Ungar | Ellie Pavlick
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

pdf bib
Adding Semantics to Data-Driven Paraphrasing
Ellie Pavlick | Johan Bos | Malvina Nissim | Charley Beller | Benjamin Van Durme | Chris Callison-Burch
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Domain-Specific Paraphrase Extraction
Ellie Pavlick | Juri Ganitkevitch | Tsz Ping Chan | Xuchen Yao | Benjamin Van Durme | Chris Callison-Burch
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
FrameNet+: Fast Paraphrastic Tripling of FrameNet
Ellie Pavlick | Travis Wolfe | Pushpendre Rastogi | Chris Callison-Burch | Mark Dredze | Benjamin Van Durme
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification
Ellie Pavlick | Pushpendre Rastogi | Juri Ganitkevitch | Benjamin Van Durme | Chris Callison-Burch
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Automatically Scoring Freshman Writing: A Preliminary Investigation
Courtney Napoles | Chris Callison-Burch
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
Effectively Crowdsourcing Radiology Report Annotations
Anne Cocos | Aaron Masino | Ting Qian | Ellie Pavlick | Chris Callison-Burch
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

2014

pdf bib
Translations of the Callhome Egyptian Arabic corpus for conversational speech translation
Gaurav Kumar | Yuan Cao | Ryan Cotterell | Chris Callison-Burch | Daniel Povey | Sanjeev Khudanpur
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

Translation of the output of automatic speech recognition (ASR) systems, also known as speech translation, has received a lot of research interest recently. This is especially true for programs such as DARPA BOLT which focus on improving spontaneous human-human conversation across languages. However, this research is hindered by the dearth of datasets developed for this explicit purpose. For Egyptian Arabic-English, in particular, no parallel speech-transcription-translation dataset exists in the same domain. In order to support research in speech translation, we introduce the Callhome Egyptian Arabic-English Speech Translation Corpus. This supplements the existing LDC corpus with four reference translations for each utterance in the transcripts. The result is a three-way parallel dataset of Egyptian Arabic Speech, transcriptions and English translations.

pdf bib
Hallucinating Phrase Translations for Low Resource MT
Ann Irvine | Chris Callison-Burch
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

pdf bib
Using Comparable Corpora to Adapt MT Models to New Domains
Ann Irvine | Chris Callison-Burch
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
The Language Demographics of Amazon Mechanical Turk
Ellie Pavlick | Matt Post | Ann Irvine | Dmitry Kachaev | Chris Callison-Burch
Transactions of the Association for Computational Linguistics, Volume 2

We present a large-scale study of the languages spoken by bilingual workers on Mechanical Turk (MTurk). We establish a methodology for determining the language skills of anonymous crowd workers that is more robust than simple surveying. We validate workers’ self-reported language skill claims by measuring their ability to correctly translate words, and by geolocating workers to see if they reside in countries where the languages are likely to be spoken. Rather than posting a one-off survey, we posted paid tasks consisting of 1,000 assignments to translate a total of 10,000 words in each of 100 languages. Our study ran for several months, and was highly visible on the MTurk crowdsourcing platform, increasing the chances that bilingual workers would complete it. Our study was useful both to create bilingual dictionaries and to act as a census of the bilingual speakers on MTurk. We use this data to recommend languages with the largest speaker populations as good candidates for other researchers who want to develop crowdsourced, multilingual technologies. To further demonstrate the value of creating data via crowdsourcing, we hire workers to create bilingual parallel corpora in six Indian languages, and use them to train statistical machine translation systems.

pdf bib
Extracting Lexically Divergent Paraphrases from Twitter
Wei Xu | Alan Ritter | Chris Callison-Burch | William B. Dolan | Yangfeng Ji
Transactions of the Association for Computational Linguistics, Volume 2

We present MultiP (Multi-instance Learning Paraphrase Model), a new model suited to identify paraphrases within the short messages on Twitter. We jointly model paraphrase relations between word and sentence pairs and assume only sentence-level annotations during learning. Using this principled latent variable model alone, we achieve performance competitive with a state-of-the-art method which combines a latent space model with a feature-based supervised classifier. Our model also captures lexically divergent paraphrases that differ from yet complement previous methods; combining our model with previous work significantly outperforms the state-of-the-art. In addition, we present a novel annotation methodology that has allowed us to crowdsource a paraphrase corpus from Twitter. We make this new dataset available to the research community.

pdf bib
Arabic Dialect Identification
Omar F. Zaidan | Chris Callison-Burch
Computational Linguistics, Volume 40, Issue 1 - March 2014

pdf bib
Are Two Heads Better than One? Crowdsourced Translation via a Two-Step Collaboration of Non-Professional Translators and Editors
Rui Yan | Mingkun Gao | Ellie Pavlick | Chris Callison-Burch
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Multi-Dialect, Multi-Genre Corpus of Informal Written Arabic
Ryan Cotterell | Chris Callison-Burch
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a multi-dialect, multi-genre, human annotated corpus of dialectal Arabic. We collected utterances in five Arabic dialects: Levantine, Gulf, Egyptian, Iraqi and Maghrebi. We scraped newspaper websites for user commentary and Twitter for two distinct types of dialectal content. To the best of the authors’ knowledge, this work is the most diverse corpus of dialectal Arabic in both the source of the content and the number of dialects. Every utterance in the corpus was human annotated on Amazon’s Mechanical Turk; this stands in contrast to Al-Sabbagh and Girju (2012) where only a small subset was human annotated in order to train a classifier to automatically annotate the remainder of the corpus. We provide a discussion of the methodology used for the annotation in addition to the performance of the individual workers. We extend the Arabic dialect identification task to the Iraqi and Maghrebi dialects and improve the results of Zaidan and Callison-Burch (2011a) on Levantine, Gulf and Egyptian.

pdf bib
The Multilingual Paraphrase Database
Juri Ganitkevitch | Chris Callison-Burch
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We release a massive expansion of the paraphrase database (PPDB) that now includes a collection of paraphrases in 23 different languages. The resource is derived from large volumes of bilingual parallel data. Our collection is extracted and ranked using state-of-the-art methods. The multilingual PPDB has over a billion paraphrase pairs in total, covering the following languages: Arabic, Bulgarian, Chinese, Czech, Dutch, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, and Swedish.

pdf bib
The American Local News Corpus
Ann Irvine | Joshua Langfus | Chris Callison-Burch
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present the American Local News Corpus (ALNC), containing over 4 billion words of text from 2,652 online newspapers in the United States. Each article in the corpus is associated with a timestamp, state, and city. All 50 U.S. states and 1,924 cities are represented. We detail our method for taking daily snapshots of thousands of local and national newspapers and present two example corpus analyses. The first explores how different sports are talked about over time and geography. The second compares per capita murder rates with news coverage of murders across the 50 states. The ALNC is about the same size as the Gigaword corpus and is growing continuously. Version 1.0 is available for research use.

pdf bib
PARADIGM: Paraphrase Diagnostics through Grammar Matching
Jonathan Weese | Juri Ganitkevitch | Chris Callison-Burch
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

pdf bib
Semi-Markov Phrase-Based Monolingual Alignment
Xuchen Yao | Benjamin Van Durme | Chris Callison-Burch | Peter Clark
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Dirt Cheap Web-Scale Parallel Text from the Common Crawl
Jason R. Smith | Herve Saint-Amand | Magdalena Plamada | Philipp Koehn | Chris Callison-Burch | Adam Lopez
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
PARMA: A Predicate Argument Aligner
Travis Wolfe | Benjamin Van Durme | Mark Dredze | Nicholas Andrews | Charley Beller | Chris Callison-Burch | Jay DeYoung | Justin Snyder | Jonathan Weese | Tan Xu | Xuchen Yao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
A Lightweight and High Performance Monolingual Word Aligner
Xuchen Yao | Benjamin Van Durme | Chris Callison-Burch | Peter Clark
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Learning to translate with products of novices: a suite of open-ended challenge problems for teaching MT
Adam Lopez | Matt Post | Chris Callison-Burch | Jonathan Weese | Juri Ganitkevitch | Narges Ahmidi | Olivia Buzek | Leah Hanson | Beenish Jamil | Matthias Lee | Ya-Ting Lin | Henry Pao | Fatima Rivera | Leili Shahriyari | Debu Sinha | Adam Teichert | Stephen Wampler | Michael Weinberger | Daguang Xu | Lin Yang | Shang Zhao
Transactions of the Association for Computational Linguistics, Volume 1

Machine translation (MT) draws from several different disciplines, making it a complex subject to teach. There are excellent pedagogical texts, but problems in MT and current algorithms for solving them are best learned by doing. As a centerpiece of our MT course, we devised a series of open-ended challenges for students in which the goal was to improve performance on carefully constrained instances of four key MT tasks: alignment, decoding, evaluation, and reranking. Students brought a diverse set of techniques to the problems, including some novel solutions which performed remarkably well. A surprising and exciting outcome was that student solutions or their combinations fared competitively on some tasks, demonstrating that even newcomers to the field can help improve the state-of-the-art on hard NLP problems while simultaneously learning a great deal. The problems, baseline code, and results are freely available.

pdf bib
Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals
Ann Irvine | Chris Callison-Burch
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
PPDB: The Paraphrase Database
Juri Ganitkevitch | Benjamin Van Durme | Chris Callison-Burch
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Answer Extraction as Sequence Tagging with Tree Edit Distance
Xuchen Yao | Benjamin Van Durme | Chris Callison-Burch | Peter Clark
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Improved speech-to-text translation with the Fisher and Callhome Spanish-English speech translation corpus
Matt Post | Gaurav Kumar | Adam Lopez | Damianos Karakos | Chris Callison-Burch | Sanjeev Khudanpur
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

Research into the translation of the output of automatic speech recognition (ASR) systems is hindered by the dearth of datasets developed for that explicit purpose. For Spanish-English translation, in particular, most parallel data available exists only in vastly different domains and registers. In order to support research on cross-lingual speech applications, we introduce the Fisher and Callhome Spanish-English Speech Translation Corpus, supplementing existing LDC audio and transcripts with (a) ASR 1-best, lattice, and oracle output produced by the Kaldi recognition system and (b) English translations obtained on Amazon’s Mechanical Turk. The result is a four-way parallel dataset of Spanish audio, transcriptions, ASR lattices, and English translations of approximately 38 hours of speech, with defined training, development, and held-out test sets. We conduct baseline machine translation experiments using models trained on the provided training data, and validate the dataset by corroborating a number of known results in the field, including the utility of in-domain (informal, conversational) training data, increased performance translating lattices (instead of recognizer 1-best output), and the relationship between word error rate and BLEU score.

pdf bib
Proceedings of the Eighth Workshop on Statistical Machine Translation
Ondrej Bojar | Christian Buck | Chris Callison-Burch | Barry Haddow | Philipp Koehn | Christof Monz | Matt Post | Herve Saint-Amand | Radu Soricut | Lucia Specia
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
Findings of the 2013 Workshop on Statistical Machine Translation
Ondřej Bojar | Christian Buck | Chris Callison-Burch | Christian Federmann | Barry Haddow | Philipp Koehn | Christof Monz | Matt Post | Radu Soricut | Lucia Specia
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
Joshua 5.0: Sparser, Better, Faster, Server
Matt Post | Juri Ganitkevitch | Luke Orland | Jonathan Weese | Yuan Cao | Chris Callison-Burch
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
Combining Bilingual and Comparable Corpora for Low Resource Machine Translation
Ann Irvine | Chris Callison-Burch
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

pdf bib
Monolingual Distributional Similarity for Text-to-Text Generation
Juri Ganitkevitch | Benjamin Van Durme | Chris Callison-Burch
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
Toward Statistical Machine Translation without Parallel Corpora
Alexandre Klementiev | Ann Irvine | Chris Callison-Burch | David Yarowsky
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Modality and Negation in SIMT Use of Modality and Negation in Semantically-Informed Syntactic MT
Kathryn Baker | Michael Bloodgood | Bonnie J. Dorr | Chris Callison-Burch | Nathaniel W. Filardo | Christine Piatko | Lori Levin | Scott Miller
Computational Linguistics, Volume 38, Issue 2 - June 2012

pdf bib
Processing Informal, Romanized Pakistani Text Messages
Ann Irvine | Jonathan Weese | Chris Callison-Burch
Proceedings of the Second Workshop on Language in Social Media

pdf bib
Proceedings of the Seventh Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Matt Post | Radu Soricut | Lucia Specia
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Findings of the 2012 Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Matt Post | Radu Soricut | Lucia Specia
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Using Categorial Grammar to Label Translation Rules
Jonathan Weese | Chris Callison-Burch | Adam Lopez
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Joshua 4.0: Packing, PRO, and Paraphrases
Juri Ganitkevitch | Yuan Cao | Jonathan Weese | Matt Post | Chris Callison-Burch
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing
Matt Post | Chris Callison-Burch | Miles Osborne
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Machine Translation of Arabic Dialects
Rabih Zbib | Erika Malchiodi | Jacob Devlin | David Stallard | Spyros Matsoukas | Richard Schwartz | John Makhoul | Omar F. Zaidan | Chris Callison-Burch
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Expectations of Word Sense in Parallel Corpora
Xuchen Yao | Benjamin Van Durme | Chris Callison-Burch
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

pdf bib
Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
Juri Ganitkevitch | Chris Callison-Burch | Courtney Napoles | Benjamin Van Durme
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Incremental Syntactic Language Models for Phrase-based Translation
Lane Schwartz | Chris Callison-Burch | William Schuler | Stephen Wu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Crowdsourcing Translation: Professional Quality from Non-Professionals
Omar F. Zaidan | Chris Callison-Burch
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
The Arabic Online Commentary Dataset: an Annotated Dataset of Informal Arabic with High Dialectal Content
Omar F. Zaidan | Chris Callison-Burch
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
WikiTopics: What is Popular on Wikipedia and Why
Byung Gyu Ahn | Benjamin Van Durme | Chris Callison-Burch
Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages

pdf bib
Paraphrase Fragment Extraction from Monolingual Comparable Corpora
Rui Wang | Chris Callison-Burch
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web

pdf bib
Paraphrastic Sentence Compression with a Character-based Metric: Tightening without Deletion
Courtney Napoles | Chris Callison-Burch | Juri Ganitkevitch | Benjamin Van Durme
Proceedings of the Workshop on Monolingual Text-To-Text Generation

pdf bib
Evaluating Sentence Compression: Pitfalls and Suggested Remedies
Courtney Napoles | Benjamin Van Durme | Chris Callison-Burch
Proceedings of the Workshop on Monolingual Text-To-Text Generation

pdf bib
Proceedings of the Sixth Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Omar F. Zaidan
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Findings of the 2011 Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Omar Zaidan
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Joshua 3.0: Syntax-based Machine Translation with the Thrax Grammar Extractor
Jonathan Weese | Juri Ganitkevitch | Chris Callison-Burch | Matt Post | Adam Lopez
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity
Tsz Ping Chan | Chris Callison-Burch | Benjamin Van Durme
Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics

2010

pdf bib
Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
Michael Bloodgood | Chris Callison-Burch
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Cheap, Fast and Good Enough: Automatic Speech Recognition with Non-Expert Transcription
Scott Novotney | Chris Callison-Burch
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Predicting Human-Targeted Translation Edit Rate via Untrained Human Annotators
Omar F. Zaidan | Chris Callison-Burch
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Stream-based Translation Models for Statistical Machine Translation
Abby Levenberg | Chris Callison-Burch | Miles Osborne
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk
Chris Callison-Burch | Mark Dredze
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

pdf bib
Creating Speech and Language Data With Amazon’s Mechanical Turk
Chris Callison-Burch | Mark Dredze
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

pdf bib
Crowdsourced Accessibility: Elicitation of Wikipedia Articles
Scott Novotney | Chris Callison-Burch
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

pdf bib
Cheap Facts and Counter-Facts
Rui Wang | Chris Callison-Burch
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

pdf bib
Using Mechanical Turk to Build Machine Translation Evaluation Sets
Michael Bloodgood | Chris Callison-Burch
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

pdf bib
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Chris Callison-Burch | Philipp Koehn | Christof Monz | Kay Peterson | Omar Zaidan
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Kay Peterson | Mark Przybocki | Omar Zaidan
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies
Zhifei Li | Chris Callison-Burch | Chris Dyer | Juri Ganitkevitch | Ann Irvine | Sanjeev Khudanpur | Lane Schwartz | Wren Thornton | Ziyuan Wang | Jonathan Weese | Omar Zaidan
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach
Kathryn Baker | Michael Bloodgood | Chris Callison-Burch | Bonnie Dorr | Nathaniel Filardo | Lori Levin | Scott Miller | Christine Piatko
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

We describe a unified and coherent syntactic framework for supporting a semantically-informed syntactic approach to statistical machine translation. Semantically enriched syntactic tags assigned to the target-language training texts improved translation quality. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet reported on the NIST 2009 Urdu-English translation task. This finding supports the hypothesis (posed by many researchers in the MT community, e.g., in DARPA GALE) that both syntactic and semantic information are critical for improving translation quality—and further demonstrates that large gains can be achieved for low-resource languages with different word order than English.

pdf bib
Transliterating From All Languages
Ann Irvine | Chris Callison-Burch | Alexandre Klementiev
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

Much of the previous work on transliteration has depended on resources and attributes specific to particular language pairs. In this work, rather than focus on a single language pair, we create robust models for transliterating from all languages in a large, diverse set to English. We create training data for 150 languages by mining name pairs from Wikipedia. We train 13 systems and analyze the effects of the amount of training data on transliteration performance. We also present an analysis of the types of errors that the systems make. Our analyses are particularly valuable for building machine translation systems for low resource languages, where creating and integrating a transliteration module for a language with few NLP resources may provide substantial gains in translation performance.

2009

pdf bib
Proceedings of the Fourth Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Josh Schroeder
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Findings of the 2009 Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Josh Schroeder
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Joshua: An Open Source Toolkit for Parsing-Based Machine Translation
Zhifei Li | Chris Callison-Burch | Chris Dyer | Sanjeev Khudanpur | Lane Schwartz | Wren Thornton | Jonathan Weese | Omar Zaidan
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Improving Translation Lexicon Induction from Monolingual Corpora via Dependency Contexts and Part-of-Speech Equivalences
Nikesh Garera | Chris Callison-Burch | David Yarowsky
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

pdf bib
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)
Chris Callison-Burch | Ido Dagan | Christopher Manning | Marco Pennacchiotti | Fabio Massimo Zanzotto
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)

pdf bib
Feasibility of Human-in-the-loop Minimum Error Rate Training
Omar F. Zaidan | Chris Callison-Burch
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk
Chris Callison-Burch
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases
Yuval Marton | Chris Callison-Burch | Philip Resnik
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Demonstration of Joshua: An Open Source Toolkit for Parsing-based Machine Translation
Zhifei Li | Chris Callison-Burch | Chris Dyer | Juri Ganitkevitch | Sanjeev Khudanpur | Lane Schwartz | Wren N. G. Thornton | Jonathan Weese | Omar F. Zaidan
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

2008

pdf bib
Constructing Corpora for the Development and Evaluation of Paraphrase Systems
Trevor Cohn | Chris Callison-Burch | Mirella Lapata
Computational Linguistics, Volume 34, Number 4, December 2008

pdf bib
Proceedings of the Third Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Christof Monz | Josh Schroeder | Cameron Shaw Fordyce
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Further Meta-Evaluation of Machine Translation
Chris Callison-Burch | Cameron Fordyce | Philipp Koehn | Christof Monz | Josh Schroeder
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Affinity Measures Based on the Graph Laplacian
Delip Rao | David Yarowsky | Chris Callison-Burch
Coling 2008: Proceedings of the 3rd Textgraphs workshop on Graph-based Algorithms for Natural Language Processing

pdf bib
Syntactic Constraints on Paraphrases Extracted from Parallel Corpora
Chris Callison-Burch
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
ParaMetric: An Automatic Evaluation Metric for Paraphrasing
Chris Callison-Burch | Trevor Cohn | Mirella Lapata
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf bib
Proceedings of the Second Workshop on Statistical Machine Translation
Chris Callison-Burch | Philipp Koehn | Cameron Shaw Fordyce | Christof Monz
Proceedings of the Second Workshop on Statistical Machine Translation

pdf bib
(Meta-) Evaluation of Machine Translation
Chris Callison-Burch | Cameron Fordyce | Philipp Koehn | Christof Monz | Josh Schroeder
Proceedings of the Second Workshop on Statistical Machine Translation

pdf bib
Moses: Open Source Toolkit for Statistical Machine Translation
Philipp Koehn | Hieu Hoang | Alexandra Birch | Chris Callison-Burch | Marcello Federico | Nicola Bertoldi | Brooke Cowan | Wade Shen | Christine Moran | Richard Zens | Chris Dyer | Ondřej Bojar | Alexandra Constantin | Evan Herbst
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

bib
Evaluating evaluation – lessons from the WMT 2007 shared task
Philipp Koehn | Chris Callison-Burch
Proceedings of the Workshop on Automatic procedures in MT evaluation

2006

pdf bib
Constraining the Phrase-Based, Joint Probability Statistical Translation Model
Alexandra Birch | Chris Callison-Burch | Miles Osborne | Philipp Koehn
Proceedings on the Workshop on Statistical Machine Translation

pdf bib
Improved Statistical Machine Translation Using Paraphrases
Chris Callison-Burch | Philipp Koehn | Miles Osborne
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf bib
Constraining the Phrase-Based, Joint Probability Statistical Translation Model
Alexandra Birch | Chris Callison-Burch | Miles Osborne
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

The Joint Probability Model proposed by Marcu and Wong (2002) provides a probabilistic framework for modeling phrase-based statistical machine translation (SMT). The model’s usefulness is, however, limited by the computational complexity of estimating parameters at the phrase level. We present a method of constraining the search space of the Joint Probability Model based on statistically and linguistically motivated word alignments. This method reduces the complexity and size of the Joint Model and allows it to display performance superior to the standard phrase-based models for small amounts of training material.

pdf bib
Re-evaluating the Role of Bleu in Machine Translation Research
Chris Callison-Burch | Miles Osborne | Philipp Koehn
11th Conference of the European Chapter of the Association for Computational Linguistics

2005

pdf bib
Edinburgh System Description for the 2005 IWSLT Speech Translation Evaluation
Philipp Koehn | Amittai Axelrod | Alexandra Birch Mayne | Chris Callison-Burch | Miles Osborne | David Talbot
Proceedings of the Second International Workshop on Spoken Language Translation

pdf bib
Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases
Chris Callison-Burch | Colin Bannard | Josh Schroeder
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

pdf bib
Paraphrasing with Bilingual Parallel Corpora
Colin Bannard | Chris Callison-Burch
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

pdf bib
Proceedings of the ACL Student Research Workshop
Chris Callison-Burch | Stephen Wan
Proceedings of the ACL Student Research Workshop

pdf bib
A compact data structure for searchable translation memories
Chris Callison-Burch | Colin Bannard | Josh Schroeder
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

pdf bib
Improving statistical translation through editing
Chris Callison-Burch | Colin Bannard | Josh Schroeder
Proceedings of the 9th EAMT Workshop: Broadening horizons of machine translation and its applications

pdf bib
Searchable Translation Memories
Chris Callison-Burch
Proceedings of Translating and the Computer 26

pdf bib
Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora
Chris Callison-Burch | David Talbot | Miles Osborne
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

pdf bib
Bootstrapping Parallel Corpora
Chris Callison-Burch | Miles Osborne
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

2001

pdf bib
A program for automatically selecting the best output from multiple machine translation engines
Chris Callison-Burch | Raymond S. Flournoy
Proceedings of Machine Translation Summit VIII

This paper describes a program that automatically selects the best translation from a set of translations produced by multiple commercial machine translation engines. The program is simplified by assuming that the most fluent item in the set is the best translation. Fluency is determined using a trigram language model. Results are provided illustrating how well the program performs for human-ranked data as compared to each of its constituent engines.

pdf bib
Secondary benefits of feedback and user interaction in machine translation tools
Raymond S. Flournoy | Chris Callison-Burch
Workshop on MT2010: Towards a Road Map for MT

User feedback has often been proposed as a method for improving the accuracy of machine translation systems, but useful feedback can also provide a number of secondary benefits, including increasing user confidence in the MT technology and expanding the potential audience of users. Amikai, Inc. has produced a number of communication tools which embed translation technology and which attempt to improve the user experience by maximizing useful user interaction and feedback. As MT continues to develop, further attention needs to be paid to developing the overall user experience, which can improve the utility of translation tools even when translation quality itself plateaus.

pdf bib
Upping the Ante for ‘Best of Breed’ Machine Translation Providers
Chris Callison-Burch
Proceedings of Translating and the Computer 23
