Randy Goebel


2022

DeepBlues@LT-EDI-ACL2022: Depression level detection modelling through domain specific BERT and short text Depression classifiers
Nawshad Farruque | Osmar Zaiane | Randy Goebel | Sudhakar Sivapalan
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

We discuss a variety of approaches to building a robust depression level detection model from longer social media posts (i.e., Reddit depression forum posts) using a BERT model pre-trained on mental health text. Further, we report experimental results for a strategy that selects excerpts from long text and then fine-tunes the BERT model, in order to work around the memory constraints of processing such texts. We show that, with domain-specific BERT, we can achieve reasonable accuracy with a fixed text size (in this case 200 tokens) for this task. In addition, we can use short-text classifiers to extract relevant text from the long text and achieve slightly better accuracy, at the cost of the processing time needed to extract such excerpts.

2021

DISK-CSV: Distilling Interpretable Semantic Knowledge with a Class Semantic Vector
Housam Khalifa Bashier | Mi-Young Kim | Randy Goebel
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Neural networks (NN) applied to natural language processing (NLP) are becoming deeper and more complex, making them increasingly difficult to understand and interpret. Even in applications of limited scope on fixed data, the creation of these complex “black-boxes” creates substantial challenges for debugging, understanding, and generalization. But rapid development in this field has now led to building more straightforward and interpretable models. We propose a new technique (DISK-CSV) to distill knowledge concurrently from any neural network architecture for text classification, captured as a lightweight interpretable/explainable classifier. Across multiple datasets, our approach achieves better performance than the target black-box. In addition, our approach provides better explanations than existing techniques.

2020

RANCC: Rationalizing Neural Networks via Concept Clustering
Housam Khalifa Bashier | Mi-Young Kim | Randy Goebel
Proceedings of the 28th International Conference on Computational Linguistics

We propose a new self-explainable model for Natural Language Processing (NLP) text classification tasks. Our approach constructs explanations concurrently with the formulation of classification predictions. To do so, we extract a rationale from the text, then use it to predict a concept of interest as the final prediction. We provide three types of explanations: 1) rationale extraction, 2) a measure of feature importance, and 3) clustering of concepts. In addition, we show how our model can be compressed without applying complicated compression techniques. We experimentally demonstrate our explainability approach on a number of well-known text classification datasets.

2016

Paraphrase for Open Question Answering: New Dataset and Methods
Ying Xu | Pascual Martínez-Gómez | Yusuke Miyao | Randy Goebel
Proceedings of the Workshop on Human-Computer Question Answering

2015

A Lexicalized Tree Kernel for Open Information Extraction
Ying Xu | Christoph Ringlstetter | Mi-Young Kim | Grzegorz Kondrak | Randy Goebel | Yusuke Miyao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2013

Open Information Extraction with Tree Kernels
Ying Xu | Mi-Young Kim | Kevin Quinn | Randy Goebel | Denilson Barbosa
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Using Visual Information to Predict Lexical Preference
Shane Bergsma | Randy Goebel
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

Application of the Tightness Continuum Measure to Chinese Information Retrieval
Ying Xu | Randy Goebel | Christoph Ringlstetter | Grzegorz Kondrak
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

2009

Glen, Glenda or Glendale: Unsupervised and Semi-supervised Learning of English Noun Gender
Shane Bergsma | Dekang Lin | Randy Goebel
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

A Continuum-Based Approach for Tightness Analysis of Chinese Semantic Units
Ying Xu | Christoph Ringlstetter | Randy Goebel
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

2008

Targeting Chinese Nominal Compounds in Corpora
Weiruo Qu | Christoph Ringlstetter | Randy Goebel
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

For compounding languages, a great part of the topical semantics is carried by nominal compounds. Various applications of natural language processing can profit from explicit access to these compounds, provided by a lexicon. The best way to acquire such a resource is to harvest corpora that represent the domain in question. For Chinese, a significant difficulty lies in the fact that the text comes as a string of characters, segmented only by sentence boundaries. Extraction algorithms that rely solely on context variety are not precise enough. We propose a pipeline of filters that starts from a candidate set established by accessor variety and then employs several methods to improve precision. For the experiments, the Xinhua part of the Chinese Gigaword Corpus was used. We extracted a random sample of 200 story texts with 119,509 Hanzi characters. All compound words of this evaluation corpus were tagged, segmented into their morphemes, and augmented with the POS information of their segments. A cascade of filters applied to a preliminary set of compound candidates led to a very high precision of over 90%, measured for the types. The result also holds for a small corpus, where a solely contextual method introduces too much noise, even for the longer compounds. Introducing MI into the basic candidacy algorithm led to a much higher recall with still reasonable precision for subsequent manual processing. Especially for the four-character compounds, which in our sample represent over 40% of the target data, the method has sufficient efficacy to support the rapid construction of compound dictionaries from domain corpora.

Distributional Identification of Non-Referential Pronouns
Shane Bergsma | Dekang Lin | Randy Goebel
Proceedings of ACL-08: HLT

Discriminative Learning of Selectional Preference from Unlabeled Text
Shane Bergsma | Dekang Lin | Randy Goebel
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing