UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation

Injy Sarhan, Pablo Mosteiro, Marco Spruit


Abstract
This paper presents our strategy for SemEval-2022 Task 3, PreTENS: Presupposed Taxonomies Evaluating Neural Network Semantics. The goal of the task is to decide whether a sentence is acceptable, depending on the taxonomic relationship that holds between a noun pair contained in the sentence. For sub-task 1 (binary classification), we propose an effective way to enhance the robustness and generalizability of language models for better classification on this downstream task. We design a two-stage fine-tuning procedure on the ELECTRA language model using data augmentation techniques. Rigorous experiments are carried out using multi-task learning and data-enriched fine-tuning. Experimental results demonstrate that our proposed model, UU-Tax, generalizes well on this downstream task. For sub-task 2 (regression), we propose a simple classifier that trains on features obtained from Universal Sentence Encoder (USE). In addition to describing the submitted systems, we discuss other experiments that employ pre-trained language models and data augmentation techniques. For both sub-tasks, we perform error analysis to further understand the behaviour of the proposed models. We achieved a global F1-Binary score of 91.25% in sub-task 1 and a rho score of 0.221 in sub-task 2.
Anthology ID:
2022.semeval-1.35
Volume:
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
271–281
URL:
https://aclanthology.org/2022.semeval-1.35
DOI:
10.18653/v1/2022.semeval-1.35
Cite (ACL):
Injy Sarhan, Pablo Mosteiro, and Marco Spruit. 2022. UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 271–281, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation (Sarhan et al., SemEval 2022)
PDF:
https://aclanthology.org/2022.semeval-1.35.pdf
Video:
https://aclanthology.org/2022.semeval-1.35.mp4
Code:
is5882/uu-tax