Cross Encoding as Augmentation: Towards Effective Educational Text Classification

Hyun Seung Lee, Seungtaek Choi, Yunsung Lee, Hyeongdon Moon, Shinhyeok Oh, Myeongho Jeong, Hyojun Go, Christian Wallraven


Abstract
Text classification in education, usually called auto-tagging, is the automated process of assigning relevant tags to educational content, such as questions and textbooks. However, auto-tagging suffers from a data scarcity problem, which stems from two major challenges: 1) it possesses a large tag space and 2) it is multi-label. Though a retrieval approach is reportedly good at low-resource scenarios, there have been fewer efforts to directly address the data scarcity problem. To mitigate these issues, here we propose a novel retrieval approach CEAA that provides effective learning in educational text classification. Our main contributions are as follows: 1) we leverage transfer learning from question-answering datasets, and 2) we propose a simple but effective data augmentation method introducing cross-encoder style texts to a bi-encoder architecture for more efficient inference. An extensive set of experiments shows that our proposed method is effective in multi-label scenarios and low-resource tags compared to state-of-the-art models.
Anthology ID:
2023.findings-acl.137
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2184–2195
Language:
URL:
https://aclanthology.org/2023.findings-acl.137
DOI:
10.18653/v1/2023.findings-acl.137
Bibkey:
Cite (ACL):
Hyun Seung Lee, Seungtaek Choi, Yunsung Lee, Hyeongdon Moon, Shinhyeok Oh, Myeongho Jeong, Hyojun Go, and Christian Wallraven. 2023. Cross Encoding as Augmentation: Towards Effective Educational Text Classification. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2184–2195, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Cross Encoding as Augmentation: Towards Effective Educational Text Classification (Lee et al., Findings 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.findings-acl.137.pdf