Simon Hegelich
2022
From BERT’s Point of View: Revealing the Prevailing Contextual Differences
Carolin M. Schuster | Simon Hegelich
Findings of the Association for Computational Linguistics: ACL 2022
Though successfully applied in research and industry, large pretrained language models of the BERT family are not yet fully understood. While much research in the field of BERTology has tested whether specific knowledge can be extracted from layer activations, we invert the popular probing design to analyze the prevailing differences and clusters in BERT’s high-dimensional space. By extracting coarse features from masked token representations and predicting them with probing models that have access to only partial information, we can apprehend the variation from ‘BERT’s point of view’. Applying our new methodology to different datasets, we show how much of the differences can be described by syntax, and further how they are to a great extent shaped by the simplest positional information.
2020
NLP-based Feature Extraction for the Detection of COVID-19 Misinformation Videos on YouTube
Juan Carlos Medina Serrano | Orestis Papakyriakopoulos | Simon Hegelich
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020
We present a simple NLP methodology for detecting COVID-19 misinformation videos on YouTube by leveraging user comments. We use transfer learning with pre-trained models to build a multi-label classifier that can categorize conspiratorial content. We then use the percentage of misinformation comments on each video as a new feature for video classification.