Mary Harper

Also published as: M. P. Harper, Mary P. Harper


2014

Learning from 26 Languages: Program Management and Science in the Babel Program
Mary Harper
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2011

Generalized Interpolation in Decision Tree LM
Denis Filimonov | Mary Harper
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Feature-Rich Log-Linear Lexical Model for Latent Variable PCFG Grammars
Zhongqiang Huang | Mary Harper
Proceedings of 5th International Joint Conference on Natural Language Processing

Syntactic Decision Tree LMs: Random Selection or Intelligent Design?
Denis Filimonov | Mary Harper
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Self-Training with Products of Latent Variable Grammars
Zhongqiang Huang | Mary Harper | Slav Petrov
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Lessons Learned in Part-of-Speech Tagging of Conversational Speech
Vladimir Eidelman | Zhongqiang Huang | Mary Harper
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Ron Kaplan | Jill Burstein | Mary Harper | Gerald Penn
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Appropriately Handled Prosodic Breaks Help PCFG Parsing
Zhongqiang Huang | Mary Harper
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Non-Expert Correction of Automatically Generated Relation Annotations
Matthew R. Gormley | Adam Gerber | Mary Harper | Mark Dredze
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

2009

Improving A Simple Bigram HMM Part-of-Speech Tagger by Latent Annotation and Self-Training
Zhongqiang Huang | Vladimir Eidelman | Mary Harper
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Anchored Speech Recognition for Question Answering
Sibel Yaman | Gokhan Tur | Dimitra Vergyri | Dilek Hakkani-Tur | Mary Harper | Wen Wang
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Self-Training PCFG Grammars with Latent Annotations Across Languages
Zhongqiang Huang | Mary Harper
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

A Joint Language Model With Fine-grain Syntactic Tags
Denis Filimonov | Mary Harper
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Transducing Logical Relations from Automatic and Manual GLARF
Adam Meyers | Michiko Kosaka | Heng Ji | Nianwen Xue | Mary Harper | Ang Sun | Wei Xu | Shasha Liao
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

Who, What, When, Where, Why? Comparing Multiple Approaches to the Cross-Lingual 5W Task
Kristen Parton | Kathleen R. McKeown | Bob Coyne | Mona T. Diab | Ralph Grishman | Dilek Hakkani-Tür | Mary Harper | Heng Ji | Wei Yun Ma | Adam Meyers | Sara Stolbach | Ang Sun | Gokhan Tur | Wei Xu | Sibel Yaman
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2007

Report on the NSF-sponsored Human Language Technology Workshop on Industrial Centers
Mary Harper | Alex Acero | Srinivas Bangalore | Jaime Carbonell | Jordan Cohen | Barbara Cuthill | Carol Espy-Wilson | Christiane Fellbaum | John Garofolo | Chin-Hui Lee | Jim Lester | Andrew McCallum | Nelson Morgan | Michael Picheney | Joe Picone | Lance Ramshaw | Jeff Reynar | Hadar Shemtov | Clare Voss
Proceedings of Machine Translation Summit XI: Papers

Recovery of Empty Nodes in Parse Structures
Denis Filimonov | Mary Harper
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Mandarin Part-of-Speech Tagging and Discriminative Reranking
Zhongqiang Huang | Mary Harper | Wen Wang
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

SParseval: Evaluation Metrics for Parsing Speech
Brian Roark | Mary Harper | Eugene Charniak | Bonnie Dorr | Mark Johnson | Jeremy Kahn | Yang Liu | Mari Ostendorf | John Hale | Anna Krasnyanskaya | Matthew Lease | Izhak Shafran | Matthew Snover | Robin Stewart | Lisa Yung
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

While both spoken and written language processing stand to benefit from parsing, the standard Parseval metrics (Black et al., 1991) and their canonical implementation (Sekine and Collins, 1997) are only useful for text. The Parseval metrics are undefined when the words input to the parser do not match the words in the gold standard parse tree exactly, and word errors are unavoidable with automatic speech recognition (ASR) systems. To fill this gap, we have developed a publicly available tool for scoring parses that implements a variety of metrics which can handle mismatches in words and segmentations, including: alignment-based bracket evaluation, alignment-based dependency evaluation, and a dependency evaluation that does not require alignment. We describe the different metrics, how to use the tool, and the outcome of an extensive set of experiments on the sensitivity.

An Open Source Prosodic Feature Extraction Tool
Zhongqiang Huang | Lei Chen | Mary Harper
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

There has been increasing interest in utilizing a wide variety of knowledge sources to perform automatic tagging of speech events, such as sentence boundaries and dialogue acts. In addition to the words spoken, the prosodic content of the speech has proved quite valuable in a variety of spoken language processing tasks such as sentence segmentation and tagging, disfluency detection, dialog act segmentation and tagging, and speaker recognition. In this paper, we report on an open source prosodic feature extraction tool based on Praat, with a description of the prosodic features and the implementation details, as well as a discussion of its extension capability. We also evaluate our tool on a sentence boundary detection task and report the system performance on the NIST RT04 CTS data.

Linguistic Resources for Speech Parsing
Ann Bies | Stephanie Strassel | Haejoong Lee | Kazuaki Maeda | Seth Kulick | Yang Liu | Mary Harper | Matthew Lease
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We report on the success of a two-pass approach to annotating metadata, speech effects and syntactic structure in English conversational speech: separately annotating transcribed speech for structural metadata, or structural events (fillers, speech repairs (or edit disfluencies), and SUs, or syntactic/semantic units), and for syntactic structure (treebanking constituent structure and shallow argument structure). The two annotations were then combined into a single representation. Certain alignment issues between the two types of annotation led to the discovery and correction of annotation errors in each, resulting in a more accurate and useful resource. The development of this corpus was motivated by the need to have both metadata and syntactic structure annotated in order to support synergistic work on speech parsing and structural event detection. Automatic detection of these speech phenomena would simultaneously improve parsing accuracy and provide a mechanism for cleaning up transcriptions for downstream text processing. Similarly, constraints imposed by text processing systems such as parsers can be used to help improve identification of disfluencies and sentence boundaries. This paper reports on our efforts to develop a linguistic resource providing both spoken metadata and syntactic structure information, and describes the resulting corpus of English conversational speech.

PCFGs with Syntactic and Prosodic Indicators of Speech Repairs
John Hale | Izhak Shafran | Lisa Yung | Bonnie J. Dorr | Mary Harper | Anna Krasnyanskaya | Matthew Lease | Yang Liu | Brian Roark | Matthew Snover | Robin Stewart
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Introducing Speech and Language Processing, by John Coleman
Mary Harper
Computational Linguistics, Volume 32, Number 1, March 2006

2005

Using Conditional Random Fields for Sentence Boundary Detection in Speech
Yang Liu | Andreas Stolcke | Elizabeth Shriberg | Mary Harper
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

A Statistical Constraint Dependency Grammar (CDG) Parser
Wen Wang | Mary P. Harper
Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together

Comparing and Combining Generative and Posterior Probability Models: Some Advances in Sentence Boundary Detection in Speech
Yang Liu | Andreas Stolcke | Elizabeth Shriberg | Mary Harper
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

Evaluating Factors Impacting the Accuracy of Forced Alignments in a Multimodal Corpus
Lei Chen | Yang Liu | Mary Harper | Eduardo Maia | Susan McRoy
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

People, when processing human-to-human communication, utilize everything they can in order to understand that communication, including speech and information such as the time and location of an interlocutor's gesture and gaze. Speech and gesture are known to exhibit a synchronous relationship in human communication; however, the precise nature of that relationship requires further investigation. The construction of computer models of multimodal human communication would be enabled by the availability of multimodal communication corpora annotated with synchronized gesture and speech features. To investigate the temporal relationships of these knowledge sources, we have collected and are annotating several multimodal corpora with time-aligned features. Forced alignment between a speech file and its transcription is a crucial part of multimodal corpus production. This paper investigates a number of factors that may contribute to highly accurate forced alignments to support the rapid production of these multimodal corpora, including the acoustic model, the match between the speech used for training the system and that to be force aligned, the amount of data used to train the ASR system, the availability of speaker adaptation, and the duration of alignment segments.

2002

The SuperARV Language Model: Investigating the Effectiveness of Tightly Integrating Multiple Knowledge Sources
Wen Wang | Mary P. Harper
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

2000

A Question Answering System Developed as a Project in a Natural Language Processing Course
W. Wang | J. Auer | R. Parasuraman | I. Zubarev | D. Brandyberry | M. P. Harper
ANLP-NAACL 2000 Workshop: Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems

The Effectiveness of Corpus-Induced Dependency Grammars for Post-processing Speech
M. P. Harper | C. M. White | W. Wang | M. T. Johnson | R. A. Helzerman
1st Meeting of the North American Chapter of the Association for Computational Linguistics

1999

A Second-Order Hidden Markov Model for Part-of-Speech Tagging
Scott M. Thede | Mary P. Harper
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1997

Analysis of Unknown Lexical Items using Morphological and Syntactic Information with the TIMIT Corpus
Scott M. Thede | Mary Harper
Fifth Workshop on Very Large Corpora

1994

Squibs and Discussions: Storing Logical Form in a Shared-Packed Forest
Mary P. Harper
Computational Linguistics, Volume 20, Number 4, December 1994

1992

Ambiguous Noun Phrases in Logical Form
Mary P. Harper
Computational Linguistics, Volume 18, Number 4, December 1992

1990

Designer Definites in Logical Form
Mary P. Harper
28th Annual Meeting of the Association for Computational Linguistics

1986

Time and Tense in English
Mary P. Harper | Eugene Charniak
24th Annual Meeting of the Association for Computational Linguistics