Rebecca Hwa
2023
Decoding Symbolism in Language Models
Meiqi Guo | Rebecca Hwa | Adriana Kovashka
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This work explores the feasibility of eliciting knowledge from language models (LMs) to decode symbolism, recognizing something (e.g., roses) as a stand-in for another (e.g., love). We present our evaluative framework, Symbolism Analysis (SymbA), which compares LMs (e.g., RoBERTa, GPT-J) on different types of symbolism and analyzes the outcomes along multiple metrics. Our findings suggest that conventional symbols are more reliably elicited from LMs while situated symbols are more challenging. Results also reveal the negative impact of the bias in pre-trained corpora. We further demonstrate that a simple re-ranking strategy can mitigate the bias and significantly improve model performances to be on par with human performances in some cases.
2021
Contrapositive Local Class Inference
Omid Kashefi | Rebecca Hwa
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Certain types of classification problems may be performed at multiple levels of granularity; for example, we might want to know the sentiment polarity of a document, a sentence, or a phrase. Often, the prediction at a greater context (e.g., sentences or paragraphs) may be informative for a more localized prediction at a smaller semantic unit (e.g., words or phrases). However, directly inferring the most salient local features from the global prediction may overlook the semantics of this relationship. This work argues that inference along the contraposition relationship of the local prediction and the corresponding global prediction makes an inference framework that is more accurate and robust to noise. We show how this contraposition framework can be implemented as a transfer function that rewrites a greater context from one class to another and demonstrate how an appropriate transfer function can be trained from a noisy user-generated corpus. The experimental results validate our insight that the proposed contrapositive framework outperforms the alternative approaches on resource-constrained problem domains.
2020
Inflating Topic Relevance with Ideology: A Case Study of Political Ideology Bias in Social Topic Detection Models
Meiqi Guo | Rebecca Hwa | Yu-Ru Lin | Wen-Ting Chung
Proceedings of the 28th International Conference on Computational Linguistics
We investigate the impact of political ideology biases in training data. Through a set of comparison studies, we examine the propagation of biases in several widely-used NLP models and its effect on the overall retrieval accuracy. Our work highlights the susceptibility of large, complex models to propagating the biases from human-selected input, which may lead to a deterioration of retrieval accuracy, and the importance of controlling for these biases. Finally, as a way to mitigate the bias, we propose to learn a text representation that is invariant to political ideology while still judging topic relevance.
Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation
Omid Kashefi | Rebecca Hwa
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Data augmentation has been shown to be effective in providing more training data for machine learning and resulting in more robust classifiers. However, for some problems, there may be multiple augmentation heuristics, and the choices of which one to use may significantly impact the success of the training. In this work, we propose a metric for evaluating augmentation heuristics; specifically, we quantify the extent to which an example is “hard to distinguish” by considering the difference between the distribution of the augmented samples of different classes. Experimenting with multiple heuristics in two prediction tasks (positive/negative sentiment and verbosity/conciseness) validates our claims by revealing the connection between the distribution difference of different classes and the classification accuracy.
2018
Heuristically Informed Unsupervised Idiom Usage Recognition
Changsheng Liu | Rebecca Hwa
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Many idiomatic expressions can be interpreted figuratively or literally depending on their contexts. This paper proposes an unsupervised learning method for recognizing the intended usages of idioms. We treat the usages as a latent variable in probabilistic models and train them in a linguistically motivated feature space. Crucially, we show that distributional semantics is a helpful heuristic for distinguishing the literal usage of idioms, giving us a way to formulate a literal usage metric to estimate the likelihood that the idiom is intended literally. This information then serves as a form of distant supervision to guide the unsupervised training process for the probabilistic models. Experiments show that our overall model performs competitively against supervised methods.
Semantic Pleonasm Detection
Omid Kashefi | Andrew T. Lucas | Rebecca Hwa
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Pleonasms are words that are redundant. To aid the development of systems that detect pleonasms in text, we introduce an annotated corpus of semantic pleonasms. We validate the integrity of the corpus with interannotator agreement analyses. We also compare it against alternative resources in terms of their effects on several automatic redundancy detection methods.
2017
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Martha Palmer | Rebecca Hwa | Sebastian Riedel
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
A Corpus of Annotated Revisions for Studying Argumentative Writing
Fan Zhang | Homa B. Hashemi | Rebecca Hwa | Diane Litman
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper presents ArgRewrite, a corpus of between-draft revisions of argumentative essays. Drafts are manually aligned at the sentence level, and the writer’s purpose for each revision is annotated with categories analogous to those used in argument mining and discourse analysis. The corpus should enable advanced research in writing comparison and revision analysis, as demonstrated via our own studies of student revision behavior and of automatic revision purpose prediction.
2016
An Evaluation of Parser Robustness for Ungrammatical Sentences
Homa B. Hashemi | Rebecca Hwa
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
Phrasal Substitution of Idiomatic Expressions
Changsheng Liu | Rebecca Hwa
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
ArgRewrite: A Web-based Revision Assistant for Argumentative Writings
Fan Zhang | Rebecca Hwa | Diane Litman | Homa B. Hashemi
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
2014
Redundancy Detection in ESL Writings
Huichao Xue | Rebecca Hwa
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics
A Comparison of MT Errors and ESL Errors
Homa B. Hashemi | Rebecca Hwa
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Generating fluent and grammatical sentences is a major goal for both Machine Translation (MT) and second-language Grammar Error Correction (GEC), but there has not been much cross-fertilization between the two research communities. Arguably, an automatic translate-to-English system might be seen as an English as a Second Language (ESL) writer whose native language is the source language. This paper investigates whether research findings from the GEC community may help with characterizing MT error analysis. We describe a method for the automatic classification of MT errors according to English as a Second Language (ESL) error categories and conduct a large comparison experiment that includes both high-performing and low-performing translate-to-English MT systems for several source languages. Comparing the distribution of MT error types for all the systems suggests that MT systems have fairly similar distributions regardless of their source languages, and the high-performing MT systems have error distributions that are more similar to those of the low-performing MT systems than to those of ESL learners with the same L1.
Improved Correction Detection in Revised ESL Sentences
Huichao Xue | Rebecca Hwa
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
2012
Modeling ESL Word Choice Similarities By Representing Word Intensions and Extensions
Huichao Xue | Rebecca Hwa
Proceedings of COLING 2012
Recognizing Arguing Subjectivity and Argument Tags
Alexander Conrad | Janyce Wiebe | Rebecca Hwa
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics
2010
Using Variable Decoding Weight for Language Model in Statistical Machine Translation
Behrang Mohit | Rebecca Hwa | Alon Lavie
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers
This paper investigates varying the decoder weight of the language model (LM) when translating different parts of a sentence. We determine the condition under which the LM weight should be adapted. We find that a better translation can be achieved by varying the LM weight when decoding the most problematic spot in a sentence, which we refer to as a difficult segment. Two adaptation strategies are proposed and compared through experiments. We find that adapting a different LM weight for every difficult segment resulted in the largest improvement in translation quality.
Syntax-Driven Machine Translation as a Model of ESL Revision
Huichao Xue | Rebecca Hwa
Coling 2010: Posters
Improving Phrase-Based Translation with Prototypes of Short Phrases
Frank Liberato | Behrang Mohit | Rebecca Hwa
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
2009
Language Model Adaptation for Difficult to Translate Phrases
Behrang Mohit | Frank Liberato | Rebecca Hwa
Proceedings of the 13th Annual Conference of the European Association for Machine Translation
Correcting Automatic Translations through Collaborations between MT and Monolingual Target-Language Users
Joshua Albrecht | Rebecca Hwa | G. Elisabeta Marai
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)
2008
The Role of Pseudo References in MT Evaluation
Joshua Albrecht | Rebecca Hwa
Proceedings of the Third Workshop on Statistical Machine Translation
2007
A Re-examination of Machine Learning Approaches for Sentence-Level MT Evaluation
Joshua Albrecht | Rebecca Hwa
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
Regression for Sentence-Level MT Evaluation with Pseudo References
Joshua Albrecht | Rebecca Hwa
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
Localization of Difficult-to-Translate Phrases
Behrang Mohit | Rebecca Hwa
Proceedings of the Second Workshop on Statistical Machine Translation
2006
Corpus Variations for Translation Lexicon Induction
Rebecca Hwa | Carol Nichols | Khalil Sima’an
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers
Lexical mappings (word translations) between languages are an invaluable resource for multilingual processing. While the problem of extracting lexical mappings from parallel corpora is well-studied, the task is more challenging when the language samples are from non-parallel corpora. The goal of this work is to investigate one such scenario: finding lexical mappings between dialects of a diglossic language, in which people conduct their written communications in a prestigious formal dialect, but they communicate verbally in a colloquial dialect. Because the two dialects serve different socio-linguistic functions, parallel corpora do not naturally exist between them. An example of a diglossic dialect pair is Modern Standard Arabic (MSA) and Levantine Arabic. In this paper, we evaluate the applicability of a standard algorithm for inducing lexical mappings between comparable corpora (Rapp, 1999) to such diglossic corpora pairs. The focus of the paper is an in-depth error analysis, exploring the notion of relatedness in diglossic corpora and scrutinizing the effects of various dimensions of relatedness (such as mode, topic, style, and statistics) on the quality of the resulting translation lexicon.
Proceedings of the COLING/ACL 2006 Student Research Workshop
Marine Carpuat | Kevin Duh | Rebecca Hwa
Proceedings of the COLING/ACL 2006 Student Research Workshop
2005
A Backoff Model for Bootstrapping Resources for Non-English Languages
Chenhai Xi | Rebecca Hwa
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing
Syntax-based Semi-Supervised Named Entity Tagging
Behrang Mohit | Rebecca Hwa
Proceedings of the ACL Interactive Poster and Demonstration Sessions
Word Alignment and Cross-Lingual Resource Acquisition
Carol Nichols | Rebecca Hwa
Proceedings of the ACL Interactive Poster and Demonstration Sessions
2004
Sample Selection for Statistical Parsing
Rebecca Hwa
Computational Linguistics, Volume 30, Number 3, September 2004
Co-training for Predicting Emotions with Spoken Dialogue Data
Beatriz Maeireizo | Diane Litman | Rebecca Hwa
Proceedings of the ACL Interactive Poster and Demonstration Sessions
2003
Bootstrapping statistical parsers from small datasets
Mark Steedman | Miles Osborne | Anoop Sarkar | Stephen Clark | Rebecca Hwa | Julia Hockenmaier | Paul Ruhlen | Steven Baker | Jeremiah Crim
10th Conference of the European Chapter of the Association for Computational Linguistics
Example Selection for Bootstrapping Statistical Parsers
Mark Steedman | Rebecca Hwa | Stephen Clark | Miles Osborne | Anoop Sarkar | Julia Hockenmaier | Paul Ruhlen | Steven Baker | Jeremiah Crim
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics
2002
DUSTer: a method for unraveling cross-language divergences for statistical word-level alignment
Bonnie Dorr | Lisa Pearl | Rebecca Hwa | Nizar Habash
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers
The frequent occurrence of divergences—structural differences between languages—presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.
Evaluating Translational Correspondence using Annotation Projection
Rebecca Hwa | Philip Resnik | Amy Weinberg | Okan Kolak
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
2001
On minimizing training corpus for parser acquisition
Rebecca Hwa
Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning (ConLL)
2000
Sample Selection for Statistical Grammar Induction
Rebecca Hwa
2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
1999
Supervised Grammar Induction using Training Data with Limited Constituent Information
Rebecca Hwa
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics
Co-authors
- Behrang Mohit 5
- Joshua Albrecht 4
- Homa B. Hashemi 4
- Huichao Xue 4
- Omid Kashefi 3
- Diane Litman 3
- Steven Baker 2
- Stephen Clark 2
- Jeremiah Crim 2
- Meiqi Guo 2
- Julia Hockenmaier 2
- Frank Liberato 2
- Changsheng Liu 2
- Carol Nichols 2
- Miles Osborne 2
- Paul Ruhlen 2
- Anoop Sarkar 2
- Mark Steedman 2
- Fan Zhang 2
- Marine Carpuat 1
- Wen-Ting Chung 1
- Alexander Conrad 1
- Bonnie Dorr 1
- Kevin Duh 1
- Nizar Habash 1
- Okan Kolak 1
- Adriana Kovashka 1
- Alon Lavie 1
- Yu-Ru Lin 1
- Andrew T. Lucas 1
- Beatriz Maeireizo 1
- G. Elisabeta Marai 1
- Martha Palmer 1
- Lisa Pearl 1
- Philip Resnik 1
- Sebastian Riedel 1
- Khalil Sima’an 1
- Amy Weinberg 1
- Janyce Wiebe 1
- Chenhai Xi 1