Adithya Samavedhi
2023
Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis
Akshita Jha | Adithya Samavedhi | Vineeth Rakesh | Jaideep Chandrashekar | Chandan Reddy
Findings of the Association for Computational Linguistics: EACL 2023
Recent advances in long document matching have primarily focused on using transformer-based models for long document encoding and matching. There are two primary challenges associated with these models. First, the performance gain provided by transformer-based models comes at a steep cost, both in the required training time and in resource (memory and energy) consumption. Second, these models cannot handle more than a predefined number of input tokens at a time. In this work, we empirically demonstrate the effectiveness of simple neural models (such as feed-forward networks and CNNs) and simple embeddings (such as GloVe and Paragraph Vector) over transformer-based models on the task of document matching. We show that simple models outperform the more complex BERT-based models while taking significantly less training time, energy, and memory. The simple models are also more robust to variations in document length and to text perturbations.
SELFOOD: Self-Supervised Out-Of-Distribution Detection via Learning to Rank
Dheeraj Mekala | Adithya Samavedhi | Chengyu Dong | Jingbo Shang
Findings of the Association for Computational Linguistics: EMNLP 2023
Deep neural classifiers trained with cross-entropy loss (CE loss) often suffer from poor calibration, necessitating the task of out-of-distribution (OOD) detection. Traditional supervised OOD detection methods require expensive manual annotation of in-distribution and OOD samples. To address the annotation bottleneck, we introduce SELFOOD, a self-supervised OOD detection method that requires only in-distribution samples as supervision. We cast OOD detection as an inter-document intra-label (IDIL) ranking problem and train the classifier with our pairwise ranking loss, referred to as IDIL loss. Specifically, given a set of in-distribution documents and their labels, for each label, we train the classifier to rank the softmax scores of documents belonging to that label higher than the scores of documents that belong to other labels. Unlike CE loss, our IDIL loss function reaches zero when the desired confidence ranking is achieved, and gradients are backpropagated to decrease the probabilities associated with incorrect labels rather than to continuously increase the probability of the correct label. Extensive experiments with several classifiers on multiple classification datasets demonstrate the effectiveness of our method in both coarse- and fine-grained settings.
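The ranking idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's exact IDIL formulation: the hinge form, the `margin` parameter, the averaging over pairs, and the function names `softmax` and `pairwise_ranking_loss` are all assumptions made for illustration. The sketch only shows the stated property that, for each label, the loss is zero once documents with that label have higher softmax scores for it than documents with other labels.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_ranking_loss(batch_logits, labels, margin=0.0):
    """Hinge-style pairwise ranking loss (illustrative assumption, not the paper's exact loss).

    For each label l, every document labeled l should have a higher softmax
    score for l than every document with a different label. Each violating
    pair contributes max(0, margin + score_other - score_own).
    """
    probs = [softmax(row) for row in batch_logits]
    num_labels = len(batch_logits[0])
    total, pairs = 0.0, 0
    for label in range(num_labels):
        # Scores for this label from documents that carry it vs. those that do not.
        own = [p[label] for p, y in zip(probs, labels) if y == label]
        other = [p[label] for p, y in zip(probs, labels) if y != label]
        for s_own in own:
            for s_other in other:
                total += max(0.0, margin + s_other - s_own)
                pairs += 1
    return total / pairs if pairs else 0.0


# Two documents, two labels: correctly ranked vs. misranked batches.
logits = [[2.0, 0.0], [0.0, 2.0]]
well_ranked = pairwise_ranking_loss(logits, labels=[0, 1])  # 0.0: ranking already holds
mis_ranked = pairwise_ranking_loss(logits, labels=[1, 0])   # positive: ranking violated
```

With `margin=0.0`, the loss stops producing gradients as soon as the ranking holds, in contrast to cross-entropy, which keeps pushing the correct-label probability toward 1.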
Co-authors
- Jaideep Chandrashekar 1
- Chengyu Dong 1
- Akshita Jha 1
- Dheeraj Mekala 1
- Vineeth Rakesh 1