Precog-LTRC-IIITH at GermEval 2021: Ensembling Pre-Trained Language Models with Feature Engineering

T. H. Arjun, Arvindh A., Kumaraguru Ponnurangam


Abstract
We describe our participation in all the subtasks of the GermEval 2021 shared task on the identification of Toxic, Engaging, and Fact-Claiming Comments. Our system is an ensemble of state-of-the-art pre-trained models fine-tuned with carefully engineered features. We show that feature engineering and data augmentation can be helpful when the training data is sparse. We achieve F1 scores of 66.87, 68.93, and 73.91 in the Toxic, Engaging, and Fact-Claiming comment identification subtasks, respectively.
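The abstract describes ensembling several fine-tuned pre-trained models for binary comment classification. The paper's exact ensembling scheme is not reproduced on this page, so the following is only a minimal soft-voting sketch with hypothetical per-model probabilities: each model's positive-class probabilities are averaged and thresholded to produce binary labels.

```python
import numpy as np

def ensemble_predict(model_probs, threshold=0.5):
    """Soft-voting ensemble: average each model's positive-class
    probabilities per comment, then threshold to binary labels.

    model_probs: array of shape (n_models, n_comments).
    """
    avg = np.mean(model_probs, axis=0)
    return (avg >= threshold).astype(int)

# Hypothetical probabilities from three fine-tuned models
# for four comments (e.g., the toxic-comment subtask).
probs = np.array([
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.4, 0.5],
    [0.7, 0.3, 0.5, 0.3],
])
print(ensemble_predict(probs))  # → [1 0 1 0]
```

In practice each row of probabilities would come from a different fine-tuned language model (possibly combined with a feature-based classifier), but the averaging-and-thresholding step is the same.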
Anthology ID:
2021.germeval-1.6
Volume:
Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments
Month:
September
Year:
2021
Address:
Duesseldorf, Germany
Editors:
Julian Risch, Anke Stoll, Lena Wilms, Michael Wiegand
Venue:
GermEval
Publisher:
Association for Computational Linguistics
Pages:
39–46
URL:
https://aclanthology.org/2021.germeval-1.6
Cite (ACL):
T. H. Arjun, Arvindh A., and Kumaraguru Ponnurangam. 2021. Precog-LTRC-IIITH at GermEval 2021: Ensembling Pre-Trained Language Models with Feature Engineering. In Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments, pages 39–46, Duesseldorf, Germany. Association for Computational Linguistics.
Cite (Informal):
Precog-LTRC-IIITH at GermEval 2021: Ensembling Pre-Trained Language Models with Feature Engineering (Arjun et al., GermEval 2021)
PDF:
https://aclanthology.org/2021.germeval-1.6.pdf
Code:
arjunth2001/GermEval-2021