2024
CeeBERT: Cross-Domain Inference in Early Exit BERT
Divya Jyoti Bajpai, Manjesh Hanawal
Findings of the Association for Computational Linguistics: ACL 2024
Pre-trained Language Models (PLMs), like BERT, with self-supervision objectives exhibit remarkable performance and generalization across various tasks. However, they suffer from high inference latency due to their large size. To address this issue, side branches are attached at intermediate layers, enabling early inference of samples without requiring them to pass through all layers. The challenge is to decide at which layer each sample should be inferred and exited so that accuracy and latency are balanced. Moreover, the distribution of the samples to be inferred may differ from that used for training, necessitating cross-domain adaptation. We propose an online learning algorithm named Cross-Domain Inference in Early Exit BERT (CeeBERT) that dynamically determines early exits of samples based on the level of confidence at each exit point. CeeBERT learns optimal thresholds on the fly from domain-specific confidence observed at intermediate layers, eliminating the need for labeled data. Experimental results on five distinct datasets with BERT and ALBERT models demonstrate CeeBERT’s ability to improve latency by reducing unnecessary computations with minimal drop in performance. By adapting the threshold values, CeeBERT can speed up the BERT/ALBERT models by 2× to 3.1× with minimal drop in accuracy. The anonymized source code is available at https://github.com/Div290/CeeBERT.
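The confidence-based exit rule described in this abstract can be illustrated with a minimal sketch. It is not taken from the CeeBERT repository: the function name `early_exit`, the use of the maximum class probability as the confidence score, and the example values are all assumptions made for illustration, and the threshold here is fixed rather than learned online.

```python
from typing import Sequence, Tuple


def early_exit(layer_probs: Sequence[Sequence[float]], threshold: float) -> Tuple[int, int]:
    """Return (exit_layer, predicted_class) for one sample.

    layer_probs[i] holds the class probabilities produced by the
    (hypothetical) exit classifier attached to layer i. Confidence is
    assumed to be the maximum class probability.
    """
    if not layer_probs:
        raise ValueError("layer_probs must be non-empty")
    last = len(layer_probs) - 1
    for layer, probs in enumerate(layer_probs):
        confidence = max(probs)
        # Exit as soon as the confidence clears the threshold;
        # the final layer always produces a prediction.
        if confidence >= threshold or layer == last:
            predicted = max(range(len(probs)), key=probs.__getitem__)
            return layer, predicted
    raise AssertionError("unreachable: the last layer always exits")


# Illustrative probabilities for a 4-exit, 2-class model (made-up values).
example = [[0.55, 0.45], [0.70, 0.30], [0.95, 0.05], [0.99, 0.01]]
print(early_exit(example, 0.9))  # (2, 0)
print(early_exit(example, 0.6))  # (1, 0)
```

A lower threshold lets samples exit earlier (less computation, more risk of error), which is the accuracy-latency trade-off the abstract refers to.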
DAdEE: Unsupervised Domain Adaptation in Early Exit PLMs
Divya Jyoti Bajpai, Manjesh Kumar Hanawal
Findings of the Association for Computational Linguistics: EMNLP 2024
Pre-trained Language Models (PLMs) exhibit good accuracy and generalization ability across various tasks using self-supervision, but their large size results in high inference latency. Early Exit (EE) strategies handle the issue by allowing the samples to exit from classifiers attached to the intermediary layers, but they do not generalize well, as exit classifiers can be sensitive to domain changes. To address this, we propose Unsupervised Domain Adaptation in EE framework (DAdEE) that employs multi-level adaptation using knowledge distillation. DAdEE utilizes GAN-based adversarial adaptation at each layer to achieve domain-invariant representations, reducing the domain gap between the source and target domain across all layers. The attached exits not only speed up inference but also enhance domain adaptation by reducing catastrophic forgetting and mode collapse, making it more suitable for real-world scenarios. Experiments on tasks such as sentiment analysis, entailment classification, and natural language inference demonstrate that DAdEE consistently outperforms not only early exit methods but also various domain adaptation methods under domain shift scenarios. The anonymized source code is available at https://github.com/Div290/DAdEE.
CapEEN: Image Captioning with Early Exits and Knowledge Distillation
Divya Jyoti Bajpai, Manjesh Kumar Hanawal
Findings of the Association for Computational Linguistics: EMNLP 2024
Deep neural networks (DNNs) have made significant progress in recognizing visual elements and generating descriptive text in image-captioning tasks. However, their improved performance comes at the cost of increased computational burden and inference latency. Early Exit (EE) strategies can be used to enhance their efficiency, but their adaptation presents challenges in image captioning, as it requires varying levels of semantic information for accurate predictions. To overcome this, we introduce CapEEN to improve the performance of EE strategies using knowledge distillation. Inference in CapEEN is completed at intermediary layers if prediction confidence exceeds a predefined value learned from the training data. To account for real-world deployments, where target distributions could drift from that of the training samples, we introduce a variant, A-CapEEN, that adapts the thresholds on the fly using a multi-armed bandit framework. Experiments on the MS COCO and Flickr30k datasets show that CapEEN gains a speedup of 1.77× while maintaining competitive performance compared to the final layer, and A-CapEEN additionally offers robustness against distortions. The source code is available at https://github.com/Div290/CapEEN.
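To make the idea of adapting thresholds with a multi-armed bandit concrete, here is a hedged sketch that treats each candidate threshold as an arm and uses the standard UCB1 rule. The abstract does not specify the bandit algorithm or the reward, so everything here is an assumption: the class `ThresholdBandit`, the helper `run_sample`, and the reward (exit confidence minus a latency cost proportional to depth) are hypothetical and are not A-CapEEN's actual formulation.

```python
import math
from typing import List, Sequence


class ThresholdBandit:
    """UCB1 over a finite set of candidate exit thresholds (illustrative)."""

    def __init__(self, thresholds: Sequence[float], cost: float = 0.5) -> None:
        self.thresholds: List[float] = list(thresholds)
        self.cost = cost  # assumed weight of the latency penalty
        self.counts = [0] * len(self.thresholds)
        self.means = [0.0] * len(self.thresholds)
        self.t = 0  # total number of updates so far

    def select(self) -> int:
        # Play every arm once before using the UCB index.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        return max(
            range(len(self.thresholds)),
            key=lambda a: self.means[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm: int, reward: float) -> None:
        self.t += 1
        self.counts[arm] += 1
        # Incremental running mean of the observed rewards.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run_sample(bandit: ThresholdBandit, layer_confidences: Sequence[float]) -> int:
    """Pick a threshold, exit at the first layer that clears it, update the arm."""
    arm = bandit.select()
    threshold = bandit.thresholds[arm]
    n_layers = len(layer_confidences)
    exit_layer = next(
        (i for i, c in enumerate(layer_confidences) if c >= threshold),
        n_layers - 1,  # fall back to the final layer
    )
    # Assumed label-free reward: confidence at exit minus depth-based cost.
    reward = layer_confidences[exit_layer] - bandit.cost * (exit_layer + 1) / n_layers
    bandit.update(arm, reward)
    return exit_layer
```

Because the reward uses only confidences and depth, no labels are needed at deployment time, which is the property that makes an online scheme of this kind usable under distribution drift.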
FAIR: Filtering of Automatically Induced Rules
Divya Jyoti Bajpai, Ayush Maheshwari, Manjesh Hanawal, Ganesh Ramakrishnan
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
The availability of large annotated data can be a critical bottleneck in training machine learning algorithms successfully, especially when applied to diverse domains. Weak supervision offers a promising alternative by accelerating the creation of labeled training data using domain-specific rules. However, it requires users to write a diverse set of high-quality rules to assign labels to the unlabeled data. Automatic Rule Induction (ARI) approaches circumvent this problem by automatically creating rules from features on a small labeled set and filtering a final set of rules from them. In the ARI approach, the crucial step is to filter a high-quality, useful subset of rules from the large set of automatically created rules. In this paper, we propose an algorithm, FAIR (Filtering of Automatically Induced Rules), to filter rules from a large number of automatically induced rules using submodular objective functions that account for the collective precision, coverage, and conflicts of the rule set. We experiment with three ARI approaches and five text classification datasets to validate the superior performance of our algorithm with respect to several semi-supervised label aggregation approaches. Further, we show that FAIR achieves statistically significant results in comparison to existing rule-filtering approaches. The source code is available at https://github.com/ayushbits/FAIR-LF-Induction.
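Selecting rules with a submodular objective is commonly done with a greedy procedure, which the following sketch illustrates. It is not FAIR's objective: the stand-in objective below combines set coverage (a monotone submodular function) with a per-rule precision bonus and omits the conflict term mentioned in the abstract. The function `greedy_select`, the weight `alpha`, and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(
    rule_cover: Dict[str, Set[int]],
    rule_precision: Dict[str, float],
    k: int,
    alpha: float = 1.0,
) -> List[str]:
    """Greedily pick up to k rules by marginal gain.

    Stand-in objective: number of examples covered by the selected
    rules plus alpha times the sum of their precisions.
    """
    selected: List[str] = []
    covered: Set[int] = set()
    remaining = set(rule_cover)

    def gain(rule: str) -> float:
        # Marginal gain: newly covered examples plus the precision bonus.
        return len(rule_cover[rule] - covered) + alpha * rule_precision[rule]

    while remaining and len(selected) < k:
        # Sort for deterministic tie-breaking.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        covered |= rule_cover[best]
        remaining.remove(best)
    return selected


# Toy data: which labeled examples each rule fires on, and its precision.
cover = {"r1": {0, 1, 2}, "r2": {2, 3}, "r3": {0, 1}}
precision = {"r1": 0.9, "r2": 0.8, "r3": 0.95}
print(greedy_select(cover, precision, k=2))  # ['r1', 'r2']
```

In the toy run, `r3` has the highest precision but is skipped in the second round because it adds no coverage beyond `r1`; this diminishing-returns behavior is what a submodular objective captures.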