UR@NLP_A_Team @ GermEval 2021: Ensemble-based Classification of Toxic, Engaging and Fact-Claiming Comments

Kwabena Odame Akomeah; Udo Kruschwitz; Bernd Ludwig

Abstract

In this paper, we report on our approach to the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments for the German language. We submitted three runs for each subtask, each based on an ensemble of three models that combine contextual embeddings from pre-trained language models with SVM and neural-network-based classifiers. We include language-specific as well as language-agnostic language models, both with and without fine-tuning. For the runs we submitted, we observe that the SVM models overfitted the training data, and this affected the aggregation method (simple majority voting) of the ensembles. The models record a lower performance on the test set than on the training set. Exploring the issue of overfitting, we uncovered that, due to a bug in the pipeline, the runs we submitted had not been trained on the full training set but only on a small training set. Therefore, in this paper we also include the results we obtain when training on the full training set, which demonstrate the power of ensembles.
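
The aggregation step named in the abstract, simple majority voting over an ensemble of three models, can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `majority_vote`, the binary label encoding, and the example predictions are assumptions made purely for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per comment.

    `predictions` is a list with one entry per model; each entry is a
    list of labels, one per comment. With an odd number of models and
    binary labels, a tie cannot occur.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_comments = len(predictions[0])
    if any(len(p) != n_comments for p in predictions):
        raise ValueError("all models must label the same comments")
    # zip(*predictions) groups the votes that each comment received.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical outputs of three models on four comments (1 = toxic).
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]
ensemble = majority_vote([model_a, model_b, model_c])
```

With a scheme like this, a single overfitted member is outvoted only when the other two members agree with each other, which is one way an overfitted model can affect the ensemble output.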

Anthology ID: 2021.germeval-1.14
Volume: Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments
Month: September
Year: 2021
Address: Duesseldorf, Germany
Editors: Julian Risch, Anke Stoll, Lena Wilms, Michael Wiegand
Venue: GermEval
Publisher: Association for Computational Linguistics
Pages: 95–99
URL: https://aclanthology.org/2021.germeval-1.14
Cite (ACL): Kwabena Odame Akomeah, Udo Kruschwitz, and Bernd Ludwig. 2021. UR@NLP_A_Team @ GermEval 2021: Ensemble-based Classification of Toxic, Engaging and Fact-Claiming Comments. In Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments, pages 95–99, Duesseldorf, Germany. Association for Computational Linguistics.
Cite (Informal): UR@NLP_A_Team @ GermEval 2021: Ensemble-based Classification of Toxic, Engaging and Fact-Claiming Comments (Akomeah et al., GermEval 2021)
PDF: https://aclanthology.org/2021.germeval-1.14.pdf
