Georgios Tziafas
2023
Improving BERT Pretraining with Syntactic Supervision
Georgios Tziafas | Konstantinos Kogkalidis | Gijs Wijnholds | Michael Moortgat
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
Bidirectional masked Transformers have become central to the current NLP landscape. Despite their impressive benchmark results, a recurring theme in recent research has been to question such models’ capacity for syntactic generalization. In this work, we address this question by adding a supervised, token-level supertagging objective to standard unsupervised pretraining, enabling the explicit incorporation of syntactic biases into the network’s training dynamics. Our approach is straightforward to implement, incurs only marginal computational overhead, and is general enough to adapt to a variety of settings. We apply our methodology to Lassy Large, an automatically annotated corpus of written Dutch. Our experiments suggest that our syntax-aware model performs on par with established baselines, despite Lassy Large being an order of magnitude smaller than commonly used corpora.
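As a rough illustration of the idea, the sketch below adds a token-level supertagging cross-entropy term to a masked-language-modeling term. The abstract does not say how the two objectives are combined, so the weighted sum, the `tag_weight` parameter, the `IGNORE` label for positions without a target, and the function names are all hypothetical; plain Python stands in for a deep learning framework.

```python
import math
from typing import Sequence

IGNORE = -100  # hypothetical label for positions that carry no target


def token_cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean cross-entropy over positions whose label is not IGNORE."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == IGNORE:
            continue
        # Numerically stable log-sum-exp over the class scores.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    return total / count if count else 0.0


def joint_loss(mlm_logits, mlm_labels, tag_logits, tag_labels, tag_weight=1.0):
    """Hypothetical combined objective: MLM loss plus a weighted supertagging loss.

    MLM labels are set only at masked positions (IGNORE elsewhere), while
    supertag labels are assumed to be available for every word token.
    """
    mlm = token_cross_entropy(mlm_logits, mlm_labels)
    tag = token_cross_entropy(tag_logits, tag_labels)
    return mlm + tag_weight * tag
```

Because both terms are ordinary token-level classification losses over the same encoder outputs, the extra cost is essentially one additional output layer, which is consistent with the abstract's claim of marginal overhead.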
2021
A Multilingual Approach to Identify and Classify Exceptional Measures against COVID-19
Georgios Tziafas | Eugenie de Saint-Phalle | Wietse de Vries | Clara Egger | Tommaso Caselli
Proceedings of the Natural Legal Language Processing Workshop 2021
The COVID-19 pandemic has seen governments across the world implement exceptional measures to counteract its impact. This work presents the initial results of an ongoing project, EXCEPTIUS, which aims to automatically identify, classify, and compare exceptional measures against COVID-19 across 32 European countries. To this end, we created a corpus of legal documents with manual sentence-level annotations for eight classes of exceptional measures implemented across these countries, and evaluated multiple multi-label classifiers on it. The XLM-RoBERTa model achieves the highest performance on this multilingual multi-label classification task, with a macro-average F1 score of 59.8%.
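Macro-averaged F1 for a multi-label task computes F1 separately for each class and then takes the unweighted mean, so rare classes count as much as frequent ones. A minimal sketch over 0/1 indicator matrices (one row per sentence, one column per class) follows; the function name and input format are illustrative, and scoring a class with no gold or predicted positives as 0 is an assumption, since conventions differ.

```python
from typing import List


def macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 over binary indicator matrices."""
    n_classes = len(y_true[0])
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] and p[c])
        fp = sum(1 for t, p in zip(y_true, y_pred) if not t[c] and p[c])
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] and not p[c])
        # F1 = 2TP / (2TP + FP + FN); assumed 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes
```

For example, with gold rows `[1, 0, 1]`, `[0, 1, 0]` and predicted rows `[1, 0, 0]`, `[0, 1, 1]`, the first two classes score 1.0 and the third scores 0.0, giving a macro F1 of 2/3 even though most individual labels are correct.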
Fighting the COVID-19 Infodemic with a Holistic BERT Ensemble
Georgios Tziafas | Konstantinos Kogkalidis | Tommaso Caselli
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
This paper describes TOKOFOU, an ensemble model for misinformation detection built on six different transformer-based pre-trained encoders, developed for the English track of the COVID-19 Infodemic Shared Task. We fine-tune each model on each of the task’s questions and aggregate their prediction scores by majority voting. TOKOFOU obtains an overall F1 score of 89.7%, ranking first.
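A minimal sketch of majority voting over per-model predictions for a single question. The abstract does not specify the input format or how ties are resolved (possible with six voters), so the label-to-score dictionaries, the function name, and the tie-break on summed scores are assumptions, not the TOKOFOU implementation.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(predictions: List[Dict[str, float]]) -> str:
    """Each dict maps label -> score from one fine-tuned model (hypothetical format)."""
    # Each model casts one vote for its highest-scoring label.
    votes = Counter(max(p, key=p.get) for p in predictions)
    top = max(votes.values())
    tied = [label for label, n in votes.items() if n == top]
    # Tie-break (assumption): prefer the tied label with the largest summed score.
    return max(tied, key=lambda label: sum(p.get(label, 0.0) for p in predictions))
```

With six models, a 4-to-2 split is decided by the vote count alone; only an exact 3-to-3 split falls back to the summed scores.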
Co-authors
- Tommaso Caselli (2 papers)
- Konstantinos Kogkalidis (2 papers)
- Clara Egger (1 paper)
- Michael Moortgat (1 paper)
- Gijs Wijnholds (1 paper)