Gerold Schneider


2023

pdf bib
Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data
Janis Goldzycher | Moritz Preisig | Chantal Amrhein | Gerold Schneider
The 7th Workshop on Online Abuse and Harms (WOAH)

Most research on hate speech detection has focused on English where a sizeable amount of labeled training data is available. However, to expand hate speech detection into more languages, approaches that require minimal training data are needed. In this paper, we test whether natural language inference (NLI) models which perform well in zero- and few-shot settings can benefit hate speech detection performance in scenarios where only a limited amount of labeled data is available in the target language. Our evaluation on five languages demonstrates large performance improvements of NLI fine-tuning over direct fine-tuning in the target language. However, the effectiveness of previous work that proposed intermediate fine-tuning on English data is hard to match. Only in settings where the English training data does not match the test domain, can our customised NLI-formulation outperform intermediate fine-tuning on English. Based on our extensive experiments, we propose a set of recommendations for hate speech detection in languages where minimal labeled training data is available.

2022

pdf bib
Hypothesis Engineering for Zero-Shot Hate Speech Detection
Janis Goldzycher | Gerold Schneider
Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022)

Standard approaches to hate speech detection rely on sufficient available hate speech annotations. Extending previous work that repurposes natural language inference (NLI) models for zero-shot text classification, we propose a simple approach that combines multiple hypotheses to improve English NLI-based zero-shot hate speech detection. We first conduct an error analysis for vanilla NLI-based zero-shot hate speech detection and then develop four strategies based on this analysis. The strategies use multiple hypotheses to predict various aspects of an input text and combine these predictions into a final verdict. We find that the zero-shot baseline used for the initial error analysis already outperforms commercial systems and fine-tuned BERT-based hate speech detection models on HateCheck. The combination of the proposed strategies further increases the zero-shot accuracy of 79.4% on HateCheck by 7.9 percentage points (pp), and the accuracy of 69.6% on ETHOS by 10.0pp.

pdf bib
Scaling Native Language Identification with Transformer Adapters
Ahmet Yavuz Uluslu | Gerold Schneider
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)

2020

pdf bib
Using Multilingual Resources to Evaluate CEFRLex for Learner Applications
Johannes Graën | David Alfter | Gerold Schneider
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Common European Framework of Reference for Languages (CEFR) defines six levels of learner proficiency, and links them to particular communicative abilities. The CEFRLex project aims at compiling lexical resources that link single words and multi-word expressions to particular CEFR levels. The resources are thought to reflect second language learner needs as they are compiled from CEFR-graded textbooks and other learner-directed texts. In this work, we investigate the applicability of CEFRLex resources for building language learning applications. Our main concerns were that vocabulary in language learning materials might be sparse, i.e. that not all vocabulary items that belong to a particular level would also occur in materials for that level, and, on the other hand, that vocabulary items might be used on lower-level materials if required by the topic (e.g. with a simpler paraphrasing or translation). Our results indicate that the English CEFRLex resource is in accordance with external resources that we jointly employ as gold standard. Together with other values obtained from monolingual and parallel corpora, we can indicate which entries need to be adjusted to obtain values that are even more in line with this gold standard. We expect that this finding also holds for the other languages

2018

pdf bib
NLP Corpus Observatory – Looking for Constellations in Parallel Corpora to Improve Learners’ Collocational Skills
Gerold Schneider | Johannes Graën
Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning

2017

pdf bib
Crossing the border twice: Reimporting prepositions to alleviate L1-specific transfer errors
Johannes Graën | Gerold Schneider
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition

pdf bib
Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts
Gerold Schneider | Eva Pettersson | Michael Percillier
Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language

2014

pdf bib
Measuring the Public Accountability of New Modes of Governance
Bruno Wueest | Gerold Schneider | Michael Amsler
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

2013

pdf bib
Exploiting Synergies Between Open Resources for German Dependency Parsing, POS-tagging, and Morphological Analysis
Rico Sennrich | Martin Volk | Gerold Schneider
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
UZH in BioNLP 2013
Gerold Schneider | Simon Clematide | Tilia Ellendorff | Don Tuggener | Fabio Rinaldi | Gintarė Grigonytė
Proceedings of the BioNLP Shared Task 2013 Workshop

2012

pdf bib
Dependency parsing for interaction detection in pharmacogenomics
Gerold Schneider | Fabio Rinaldi | Simon Clematide
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We give an overview of our approach to the extraction of interactions between pharmacogenomic entities like drugs, genes and diseases and suggest classes of interaction types driven by data from PharmGKB and partly following the top level ontology WordNet and biomedical types from BioNLP. Our text mining approach to the extraction of interactions is based on syntactic analysis. We use syntactic analyses to explore domain events and to suggest a set of interaction labels for the pharmacogenomics domain.

2011

pdf bib
An Incremental Model for the Coreference Resolution Task of BioNLP 2011
Don Tuggener | Manfred Klenner | Gerold Schneider | Simon Clematide | Fabio Rinaldi
Proceedings of BioNLP Shared Task 2011 Workshop

2009

pdf bib
UZurich in the BioNLP 2009 Shared Task
Kaarel Kaljurand | Gerold Schneider | Fabio Rinaldi
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

2008

pdf bib
Dependency-Based Relation Mining for Biomedical Literature
Fabio Rinaldi | Gerold Schneider | Kaarel Kaljurand | Michael Hess
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe techniques for the automatic detection of relationships among domain entities (e.g. genes, proteins, diseases) mentioned in the biomedical literature. Our approach is based on the adaptive selection of candidate interactions sentences, which are then parsed using our own dependency parser. Specific syntax-based filters are used to limit the number of possible candidate interacting pairs. The approach has been implemented as a demonstrator over a corpus of 2000 richly annotated MedLine abstracts, and later tested by participation to a text mining competition. In both cases, the results obtained have proved the adequacy of the proposed approach to the task of interaction detection.

2007

pdf bib
Pro3Gres Parser in the CoNLL Domain Adaptation Shared Task
Gerold Schneider | Kaarel Kaljurand | Fabio Rinaldi | Tobias Kuhn
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2004

pdf bib
Answering Questions in the Genomics Domain
Fabio Rinaldi | James Dowdall | Gerold Schneider | Andreas Persidis
Proceedings of the Conference on Question Answering in Restricted Domains

pdf bib
Fast, Deep-Linguistic Statistical Dependency Parsing
Gerold Schneider | Fabio Rinaldi | James Dowdall
Proceedings of the Workshop on Recent Advances in Dependency Grammar

pdf bib
A robust and hybrid deep-linguistic theory applied to large-scale parsing
Gerold Schneider | James Dowdall | Fabio Rinaldi
Proceedings of the 3rd workshop on RObust Methods in Analysis of Natural Language Data (ROMAND 2004)

2003

pdf bib
A low-complexity, broad-coverage probabilistic Dependency Parser for English
Gerold Schneider
Proceedings of the HLT-NAACL 2003 Student Research Workshop

2002

pdf bib
Using Syntactic Analysis to Increase Efficiency in Visualizing Text Collections
James Henderson | Paola Merlo | Ivan Petroff | Gerold Schneider
COLING 2002: The 19th International Conference on Computational Linguistics