Michael Bloodgood


2017

Acquisition of Translation Lexicons for Historically Unwritten Languages via Bridging Loanwords
Michael Bloodgood | Benjamin Strauss
Proceedings of the 10th Workshop on Building and Using Comparable Corpora

With the advent of informal electronic communications such as social media, colloquial languages that were historically unwritten are being written for the first time in heavily code-switched environments. We present a method for inducing portions of translation lexicons through the use of expert knowledge in these settings where there are approximately zero resources available other than a language informant, potentially not even large amounts of monolingual data. We investigate inducing a Moroccan Darija-English translation lexicon via French loanwords bridging into English and find that a useful lexicon is induced for human-assisted translation and statistical machine translation.

Using Global Constraints and Reranking to Improve Cognates Detection
Michael Bloodgood | Benjamin Strauss
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Global constraints and reranking have not been used in cognates detection research to date. We propose methods for using global constraints by rescoring the score matrices produced by state-of-the-art cognates detection systems. Rescoring with global constraints is complementary to state-of-the-art cognates detection methods and yields significant performance improvements over the current state of the art on publicly available datasets. The improvements hold across different language pairs and under various conditions, such as different levels of baseline performance and different data sizes, including larger, more realistic data sizes than have been evaluated in the past.

2014

Translation memory retrieval methods
Michael Bloodgood | Benjamin Strauss
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

Analysis of Stopping Active Learning based on Stabilizing Predictions
Michael Bloodgood | John Grothendieck
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

2012

Modality and Negation in SIMT: Use of Modality and Negation in Semantically-Informed Syntactic MT
Kathryn Baker | Michael Bloodgood | Bonnie J. Dorr | Chris Callison-Burch | Nathaniel W. Filardo | Christine Piatko | Lori Levin | Scott Miller
Computational Linguistics, Volume 38, Issue 2 - June 2012

A Random Forest System Combination Approach for Error Detection in Digital Dictionaries
Michael Bloodgood | Peng Ye | Paul Rodrigues | David Zajic | David Doermann
Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data

Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing
Vinodkumar Prabhakaran | Michael Bloodgood | Mona Diab | Bonnie Dorr | Lori Levin | Christine D. Piatko | Owen Rambow | Benjamin Van Durme
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

2010

Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
Michael Bloodgood | Chris Callison-Burch
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach
Kathryn Baker | Michael Bloodgood | Chris Callison-Burch | Bonnie Dorr | Nathaniel Filardo | Lori Levin | Scott Miller | Christine Piatko
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

We describe a unified and coherent syntactic framework for supporting a semantically-informed syntactic approach to statistical machine translation. Semantically enriched syntactic tags assigned to the target-language training texts improved translation quality. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet reported on the NIST 2009 Urdu-English translation task. This finding supports the hypothesis (posed by many researchers in the MT community, e.g., in DARPA GALE) that both syntactic and semantic information are critical for improving translation quality, and further demonstrates that large gains can be achieved for low-resource languages with different word order than English.

Using Mechanical Turk to Build Machine Translation Evaluation Sets
Michael Bloodgood | Chris Callison-Burch
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

A Modality Lexicon and its use in Automatic Tagging
Kathryn Baker | Michael Bloodgood | Bonnie Dorr | Nathaniel W. Filardo | Lori Levin | Christine Piatko
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes our resource-building results for an eight-week JHU Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation. Specifically, we describe the construction of a modality annotation scheme, a modality lexicon, and two automated modality taggers that were built using the lexicon and annotation scheme. Our annotation scheme is based on identifying three components of modality: a trigger, a target and a holder. We describe how our modality lexicon was produced semi-automatically, expanding from an initial hand-selected list of modality trigger words and phrases. The resulting expanded modality lexicon is being made publicly available. We demonstrate that one tagger, a structure-based tagger, achieves precision around 86% (depending on genre) for tagging of a standard LDC data set. In a machine translation application, using the structure-based tagger to annotate English modalities on an English-Urdu training corpus improved the translation quality score for Urdu by 0.3 BLEU points in the face of sparse training data.

2009

Taking into Account the Differences between Actively and Passively Acquired Data: The Case of Active Learning with Support Vector Machines for Imbalanced Datasets
Michael Bloodgood | K. Vijay-Shanker
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

A Method for Stopping Active Learning Based on Stabilizing Predictions and the Need for User-Adjustable Stopping
Michael Bloodgood | K. Vijay-Shanker
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

2008

An Approach to Reducing Annotation Costs for BioNLP
Michael Bloodgood | K. Vijay-Shanker
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

2006

Rapid Adaptation of POS Tagging for Domain Specific Uses
John E. Miller | Michael Bloodgood | Manabu Torii | K. Vijay-Shanker
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology