Tackling Data Drift with Adversarial Validation: An Application for German Text Complexity Estimation

Alejandro Mosquera


Abstract
This paper describes the winning approach in the first shared task on automated German text complexity assessment, held as part of KONVENS 2022. The evaluated system relies on an ensemble of regression models that combines traditional feature engineering with pre-trained resources. Adversarial validation is also proposed as a way to counter the data drift identified during the development phase, helping to select relevant models and features and to avoid leaderboard overfitting. The best submission reached a mapped RMSE of 0.43 on the test set during the final phase of the competition.
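To make the idea of adversarial validation concrete, the following is a minimal sketch and not the paper's implementation. It assumes the usual formulation: label training rows 0 and test rows 1, fit a classifier to tell them apart, and read a held-out AUC near 0.5 as little evidence of drift and an AUC near 1.0 as strong drift. The abstract does not say which classifier or features were used, so the plain logistic regression, the synthetic two-feature data, and the function names (`roc_auc`, `fit_logistic`, `adversarial_validation`) are all illustrative assumptions.

```python
import math
import random


def roc_auc(labels, scores):
    """ROC AUC from the rank-sum statistic; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # 1-based average rank
        start = end + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Full-batch gradient descent for logistic regression (illustrative only)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def adversarial_validation(train_rows, test_rows, holdout=0.3, seed=0):
    """Return (held-out AUC, weights) of a train-vs-test classifier."""
    rows = [(x, 0) for x in train_rows] + [(x, 1) for x in test_rows]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - holdout))
    fit_part, eval_part = rows[:cut], rows[cut:]
    w, b = fit_logistic([x for x, _ in fit_part], [y for _, y in fit_part])
    scores = [b + sum(wj * xj for wj, xj in zip(w, x)) for x, _ in eval_part]
    return roc_auc([y for _, y in eval_part], scores), w


if __name__ == "__main__":
    # Synthetic stand-in for document features: feature 0 drifts, feature 1 does not.
    rng = random.Random(42)
    train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
    test = [[rng.gauss(3, 1), rng.gauss(0, 1)] for _ in range(300)]
    auc, weights = adversarial_validation(train, test)
    print(f"adversarial AUC: {auc:.2f}, weights: {weights}")
```

In a sketch like this, features that receive large weights in the train-vs-test classifier are candidates for removal or reweighting, and training rows that the classifier scores as most test-like are a plausible choice for a validation split that tracks the test distribution more closely than a random one.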
Anthology ID: 2022.germeval-1.7
Volume: Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text
Month: September
Year: 2022
Address: Potsdam, Germany
Venue: GermEval
Publisher: Association for Computational Linguistics
Pages: 39–44
URL: https://aclanthology.org/2022.germeval-1.7
Cite (ACL): Alejandro Mosquera. 2022. Tackling Data Drift with Adversarial Validation: An Application for German Text Complexity Estimation. In Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text, pages 39–44, Potsdam, Germany. Association for Computational Linguistics.
Cite (Informal): Tackling Data Drift with Adversarial Validation: An Application for German Text Complexity Estimation (Mosquera, GermEval 2022)
PDF: https://aclanthology.org/2022.germeval-1.7.pdf
Data: TextComplexityDE