Less Is Better: Recovering Intended-Feature Subspace to Robustify NLU Models

Ting Wu, Tao Gui


Abstract
Datasets with significant proportions of bias threaten the training of trustworthy models on NLU tasks. Despite great progress, current debiasing methods rely excessively on knowledge of bias attributes. The definition of these attributes, however, is elusive and varies across datasets. In addition, leveraging these attributes at the input level for bias mitigation may leave a gap between intrinsic properties and the underlying decision rule. To narrow this gap and remove the need for supervision on bias, we suggest extending bias mitigation into feature space. To this end, we develop a novel model, Recovering Intended-Feature Subspace with Knowledge-Free (RISK). Assuming that shortcut features caused by various biases are unintended for prediction, RISK views them as redundant features. When delving into a lower-dimensional manifold to remove redundancies, RISK reveals that an extremely low-dimensional subspace with intended features can robustly represent the highly biased dataset. Empirical results demonstrate that our model consistently improves generalization to out-of-distribution sets and achieves new state-of-the-art performance.
Anthology ID:
2022.coling-1.143
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
1666–1676
URL:
https://aclanthology.org/2022.coling-1.143
Cite (ACL):
Ting Wu and Tao Gui. 2022. Less Is Better: Recovering Intended-Feature Subspace to Robustify NLU Models. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1666–1676, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Less Is Better: Recovering Intended-Feature Subspace to Robustify NLU Models (Wu & Gui, COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.143.pdf
Code:
cuteythyme/risk
Data:
FEVER, MultiNLI, PAWS