Ulf Hermjakob


2018

Out-of-the-box Universal Romanization Tool uroman
Ulf Hermjakob | Jonathan May | Kevin Knight
Proceedings of ACL 2018, System Demonstrations

We present uroman, a tool for converting text in myriads of languages and scripts such as Chinese, Arabic and Cyrillic into a common Latin-script representation. The tool relies on Unicode data and other tables, and handles nearly all character sets, including some that are quite obscure such as Tibetan and Tifinagh. uroman converts digital numbers in various scripts to Western Arabic numerals. Romanization enables the application of string-similarity metrics to texts from different scripts without the need and complexity of an intermediate phonetic representation. The tool is freely and publicly available as a Perl script suitable for inclusion in data processing pipelines and as an interactive demo web page.

Translating a Language You Don’t Know In the Chinese Room
Ulf Hermjakob | Jonathan May | Michael Pust | Kevin Knight
Proceedings of ACL 2018, System Demonstrations

In a corruption of John Searle’s famous AI thought experiment, the Chinese Room (Searle, 1980), we twist its original intent by enabling humans to translate text, e.g. from Uyghur to English, even if they don’t have any prior knowledge of the source language. Our enabling tool, which we call the Chinese Room, is equipped with the same resources made available to a machine translation engine. We find that our superior language model and world knowledge allow us to create perfectly fluent and nearly adequate translations, with human expertise required only for the target language. The Chinese Room tool can be used to rapidly create small corpora of parallel data when bilingual translators are not readily available, in particular for low-resource languages.

Abstract Meaning Representation of Constructions: The More We Include, the Better the Representation
Claire Bonial | Bianca Badarau | Kira Griffitt | Ulf Hermjakob | Kevin Knight | Tim O’Gorman | Martha Palmer | Nathan Schneider
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

AMR Beyond the Sentence: the Multi-sentence AMR corpus
Tim O’Gorman | Michael Regan | Kira Griffitt | Ulf Hermjakob | Kevin Knight | Martha Palmer
Proceedings of the 27th International Conference on Computational Linguistics

There are few corpora that endeavor to represent the semantic content of entire documents. We present a corpus that accomplishes one way of capturing document level semantics, by annotating coreference and similar phenomena (bridging and implicit roles) on top of gold Abstract Meaning Representations of sentence-level semantics. We present a new corpus of this annotation, with analysis of its quality, alongside a plausible baseline for comparison. It is hoped that this Multi-Sentence AMR corpus (MS-AMR) may become a feasible method for developing rich representations of document meaning, useful for tasks such as information extraction and question answering.

2016

Generating English from Abstract Meaning Representations
Nima Pourdamghani | Kevin Knight | Ulf Hermjakob
Proceedings of the 9th International Natural Language Generation conference

2015

Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation
Michael Pust | Ulf Hermjakob | Kevin Knight | Daniel Marcu | Jonathan May
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Unsupervised Entity Linking with Abstract Meaning Representation
Xiaoman Pan | Taylor Cassidy | Ulf Hermjakob | Heng Ji | Kevin Knight
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

Aligning English Strings with Abstract Meaning Representation Graphs
Nima Pourdamghani | Yang Gao | Ulf Hermjakob | Kevin Knight
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Abstract Meaning Representation for Sembanking
Laura Banarescu | Claire Bonial | Shu Cai | Madalina Georgescu | Kira Griffitt | Ulf Hermjakob | Kevin Knight | Philipp Koehn | Martha Palmer | Nathan Schneider
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

2009

Improved Word Alignment with Statistics and Linguistic Heuristics
Ulf Hermjakob
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Overcoming Vocabulary Sparsity in MT Using Lattices
Steve DeNeefe | Ulf Hermjakob | Kevin Knight
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

Source languages with complex word-formation rules present a challenge for statistical machine translation (SMT). In this paper, we take on three facets of this challenge: (1) common stems are fragmented into many different forms in training data, (2) rare and unknown words are frequent in test data, and (3) spelling variation creates additional sparseness problems. We present a novel, lightweight technique for dealing with this fragmentation, based on bilingual data, and we also present a combination of linguistic and statistical techniques for dealing with rare and unknown words. Taking these techniques together, we demonstrate +1.3 and +1.6 BLEU increases on top of strong baselines for Arabic-English machine translation.

Name Translation in Statistical Machine Translation - Learning When to Transliterate
Ulf Hermjakob | Kevin Knight | Hal Daumé III
Proceedings of ACL-08: HLT

2002

Using Knowledge to Facilitate Factoid Answer Pinpointing
Eduard Hovy | Ulf Hermjakob | Chin-Yew Lin | Deepak Ravichandran
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Toward Semantics-Based Answer Pinpointing
Eduard Hovy | Laurie Gerber | Ulf Hermjakob | Chin-Yew Lin | Deepak Ravichandran
Proceedings of the First International Conference on Human Language Technology Research

Parsing and Question Classification for Question Answering
Ulf Hermjakob
Proceedings of the ACL 2001 Workshop on Open-Domain Question Answering

2000

Rapid Parser Development: A Machine Learning Approach for Korean
Ulf Hermjakob
1st Meeting of the North American Chapter of the Association for Computational Linguistics

1997

Learning Parse and Translation Decisions from Examples with Rich Context
Ulf Hermjakob | Raymond J. Mooney
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics