Contrapositive Local Class Inference
Omid Kashefi | Rebecca Hwa
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Certain types of classification problems may be performed at multiple levels of granularity; for example, we might want to know the sentiment polarity of a document or a sentence, or a phrase. Often, the prediction at a greater-context (e.g., sentences or paragraphs) may be informative for a more localized prediction at a smaller semantic unit (e.g., words or phrases). However, directly inferring the most salient local features from the global prediction may overlook the semantics of this relationship. This work argues that inference along the contraposition relationship of the local prediction and the corresponding global prediction makes an inference framework that is more accurate and robust to noise. We show how this contraposition framework can be implemented as a transfer function that rewrites a greater-context from one class to another and demonstrate how an appropriate transfer function can be trained from a noisy user-generated corpus. The experimental results validate our insight that the proposed contrapositive framework outperforms the alternative approaches on resource-constrained problem domains.
Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation
Omid Kashefi | Rebecca Hwa
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Data augmentation has been shown to be effective in providing more training data for machine learning and resulting in more robust classifiers. However, for some problems, there may be multiple augmentation heuristics, and the choices of which one to use may significantly impact the success of the training. In this work, we propose a metric for evaluating augmentation heuristics; specifically, we quantify the extent to which an example is “hard to distinguish” by considering the difference between the distribution of the augmented samples of different classes. Experimenting with multiple heuristics in two prediction tasks (positive/negative sentiment and verbosity/conciseness) validates our claims by revealing the connection between the distribution difference of different classes and the classification accuracy.
Semantic Pleonasm Detection
Omid Kashefi | Andrew T. Lucas | Rebecca Hwa
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Pleonasms are words that are redundant. To aid the development of systems that detect pleonasms in text, we introduce an annotated corpus of semantic pleonasms. We validate the integrity of the corpus with interannotator agreement analyses. We also compare it against alternative resources in terms of their effects on several automatic redundancy detection methods.