Smatgrisene at SemEval-2020 Task 12: Offense Detection by AI - with a Pinch of Real I

This paper discusses how ML-based classifiers can be enhanced disproportionately by adding small amounts of qualitative linguistic knowledge. As an example, we present the Danish classifier Smatgrisene, our contribution to the recent OffensEval Challenge 2020. The classifier was trained on some 3000 social media posts annotated for offensiveness, supplemented by rules extracted from the reference work on Danish offensive language (Rathje 2014b). Smatgrisene did surprisingly well in the competition in spite of its extremely simple design, showing an interesting trade-off between technological muscle and linguistic intelligence. Finally, we comment on the perspectives of combining qualitative and quantitative methods for NLP.


Introduction
Offense in social media (SoMe) is a nuisance, not only for the targeted victim, but for editors, consumers, and stakeholders - even damaging the reputation of the internet as a whole. If we could detect and eliminate offensive SoMe posts with the same certainty as we can detect syntax errors in Python programs, or even in human texts, much would be gained. The need for automatic offense detection motivated the recent OffensEval Challenge 2020 (Sigurbergsson et al. 2020, https://arxiv.org/pdf/2006.07235.pdf). In this paper we present our Danish classifier (called Smatgrisene) and report on our participation in OffensEval.
After introducing OffensEval we discuss the annotation principles behind the shared training materials and comment on a possible consistency problem. We suggest, as a remedy, a classifier design based on a mixture of simple surface rules and synthetic linguistic knowledge. We then present our OffensEval results, and conclude with a discussion of knowledge-enhanced NLP for offense detection and beyond.

OffensEval 2020 - Danish chapter
The Danish training material consisted of 3290 SoMe posts, each annotated with the label OFF ('offensive') or NOT ('not offensive'). Part of the annotated material (the test corpus) was kept secret from the participants, while the larger part (2961 posts) was available as training data. Each participating team trained an offense classifier, applied it to the test corpus, and submitted the output as their contribution. The Danish chapter of OffensEval 2020 had 39 contributing teams, including Smatgrisene, the team of the authors of this paper.

The meaning of OFF
As far as we can gather, no instruction manual in Danish was made available to the OffensEval annotators. This may be one of the reasons for the many inconsistencies in the annotations. To get an impression of the annotators' dilemmas, we translated the central term 'offend' into Danish. At least four different translations are offered in the major Danish dictionaries (Akselsen, 2000; Schwarz, 2009; Pedersen, 2007; Pedersen, 2008): 'fornærme', 'støde', 'krænke', and 'forarge'. No other translation appears in all of them, so in a practical sense these four lexemes form a closed semantic field. We maintain that they are clearly semantically distinct. To substantiate this claim we had a group of native speakers of Danish (4 linguists and 4 professional writers) evaluate the semantics of the four translations of offend. The subjects were asked to sort several bundles of carrier sentences by likelihood ("Which utterance is more likely to appear in an ordinary conversation?"). The details are published by Dansk Sprognævn (https://dsn.dk/smatgrisene). The survey left no doubt: the four lexemes are clearly semantically distinct, separated by intentionality (was the triggering action intended as an offense or not?) and more.

In short, Danish translations of offend are highly ambiguous, as are related terms like abuse, insult, violate, and so on. It is not surprising, then, that the Danish annotation data are prone to inconsistency, the annotators having no way of deciding which translation of the English key terms to prefer. Of course, most annotation projects have to cope with fuzzy criteria, and even if this does not necessarily lead to useless training data, the general versatility is at stake: even a classifier that has learned an irrelevant concoction of annotations to perfection will fail when facing the real world. In the data at hand, 'fuck' provides a telling example.
The word is quite frequent in the current Danish vernacular and appears in some form in 74 OffensEval posts, all labeled OFF. As shown in Rathje (2014b), the actual impact of offensive words varies markedly with the recipient's age, social background, role in the interchange, relation to the utterer (2nd or 3rd person), and more. Some readers take offense at the mere appearance of 'fuck', no matter the intended meaning, while others hardly notice it (#2504 in fig. 1 is an example). Similar effects are seen for words like 'lort' (#507), 'bøsse', 'perker', 'kælling', 'mullah', 'smatso', and so on (Rathje 2014b). An offense classifier ignorant of such things will never surpass the IQ of a Pavlovian dog.

Regaining robustness
Faced with inconsistent training data and no context information, we fight back by including some carefully crafted pieces of linguistic knowledge in the training procedure. This can be expected to enhance the classifier's performance, both in the narrow sense (more competitive in the OffensEval setting) and in a broader perspective (better prepared to face the world). To substantiate this claim we need to present a classifier that is either (i) superior to its competitors when trained on the same data, or (ii) fully competitive even when using ML principles much simpler than those of its competitors, compensating with linguistic knowledge. For practical reasons we had to opt for the second option (alas), and so we adopted the simplest possible ML scheme, based on unweighted regular expressions with a single-word scope (unigrams). Training was thus restricted to frequency analyses of the training data, leaving the lion's share to the linguist.
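To make the scheme concrete, such a frequency-based training step can be sketched as follows. This is a hypothetical minimal sketch, not the actual Smatgrisene code: the function names and the thresholds `min_count` and `min_off_ratio` are our own illustrative assumptions; the paper itself only specifies unweighted single-word regular expressions selected by frequency analysis.

```python
import re
from collections import Counter

def train_unigram_rules(posts, labels, min_count=5, min_off_ratio=0.8):
    """Keep unigrams that occur predominantly in OFF-labeled posts.

    Illustrative sketch: thresholds are assumptions, not the paper's values.
    """
    off_counts, total_counts = Counter(), Counter()
    for post, label in zip(posts, labels):
        # Count each token once per post (document frequency).
        for tok in set(re.findall(r"\w+", post.lower())):
            total_counts[tok] += 1
            if label == "OFF":
                off_counts[tok] += 1
    return [tok for tok, n in total_counts.items()
            if n >= min_count and off_counts[tok] / n >= min_off_ratio]

def classify(post, rules):
    """Unweighted rule match: any single hit means OFF."""
    if not rules:
        return "NOT"
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, rules)) + r")\b")
    return "OFF" if pattern.search(post.lower()) else "NOT"
```

The point of the sketch is the division of labour: the statistics only select words, while anything context-sensitive must come from the linguist's rules.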

Danish groundwork
The data that form the basis for the classification of offensive language in this study were collected in a sociolinguistic study of swear words and abusive terms (Rathje 2014a, 2014b).

The survey
The list of swear words used in this study was generated from a research project on language differences across three generations (Rathje 2014b). The data were harvested from conversations between informants: 24 Danish women in three different generations, namely 8 young girls (16-18 years old), 8 middle-aged women (37-46 years old), and 8 elderly women (68-78 years old). None of the informants knew each other beforehand. The informants' task was to talk in pairs for 30 minutes in a café. The dialogues were recorded and subsequently transcribed. Half of the dialogues were intra-generational, i.e. between two participants from the same generation, and the other half were inter-generational, i.e. between two participants from different generations. In this way it was possible to investigate how the informants communicated with someone from another generation as compared to someone from their own generation, and whether the generations spoke differently: the purpose was to identify generational language.
One of the investigated language characteristics was swear words as defined in Rathje 2014b and Rathje 2017, which also draw on earlier definitions of swear words by Andersson and Trudgill, 1990; Stenström, 1991; Stroh-Wollin, 2008; Allan and Burridge, 2006; Montagu, 1967; Ljung, 2011: "Swear words are words that refer to something that is taboo in the culture the language is used in, they must not be taken literally, and they are used to express emotions and attitudes, but they are not used for (other) people." (Rathje 2017) On the basis of this definition a quantitative study was undertaken, making it possible to determine the frequency and types of swear words in each generation. The study found that the amount of swearing was the same in all three generations, but that the types of swear words diverged generationally. The young participants primarily swore using English swear words (shit) and swear words stemming from the taboo area of 'the body's lower functions', i.e. sexuality and faeces (fuck, pis), while the middle-aged and elderly generations more frequently used religious swear words (for fanden and du godeste).

Swear words and abusive terms
Consequently, in the present study our basis is a list of swear words used in authentic Danish speech within and between three generations of women. A later study of attitudes to Danish swearing (Rathje 2014a) reveals that it was most often the diabolic religious swear words (for fanden, for satan) and the English (shit), sexual (fuck) or faecal (skide) swear words that were perceived as coarse, i.e. offensive (Culpeper, 2011), and these types of swear words are the ones that the young generation use in Rathje 2014b. This is the rationale for using these swear words in this study: (hvad) fanden, fuck, pisse, sgu, gud, (ikke en) skid, fandme, fucked up, (ad) helvedes (til), hulens, skide, sur røv, shit, holy shit, eddermaneme, (hvordan) fanden, røv-, søreme. In many ways, abusive terms are similar to swear words: they too express a feeling or an attitude, and they are also associated with taboos (Rathje 2014a). For example, luder (whore) has a connection with the taboo of 'prostitution'; other taboos can be 'homosexuality' (e.g. bøsse (gay)) and 'mental illness' (e.g. psykopat (psychopath)). Like swear words, abusive terms should not be understood literally: luder (whore) as an abusive term does not refer to an actual prostitute, but expresses an opinion about a particular person who is not a prostitute in the literal meaning. And precisely the fact that there is a person at whom the word is directed separates swear words and abusive terms from one another: abusive terms are used about people, while swear words are not. It is, therefore, abusive terms in particular that are experienced as offensive.
The abusive terms used in the present study stem from a survey in which two generations expressed their conscious attitudes to Danish swear words (Rathje 2014a). This data consisted of 844 completed questionnaires about young people's and elderly people's attitudes toward swearing. Of those, 63% were answered by young people 13-14 years old, and 37% were answered by elderly people 65-93 years old. Among the younger participants, half were boys and the other half were girls, while the elderly respondents consisted of 18% men and 82% women.
For the question in the survey "What are the coarsest swear words you know?", terms of abuse constituted as much as 68% of the words that young people mentioned and 17% of the words from the elderly. Many of the informants clearly did not distinguish between swear words and terms of abuse, but these answers have provided the list below: the most often-mentioned terms of abuse by young and old informants.

Exporting data to NLP
These are the terms we have used in our offense detection. The young people's top 10 terms of abuse (Rathje 2014a) are available in Rathje (2014b). Even this blunt use of linguistic quality data boosted the performance significantly.
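In a pipeline, the exported terms can be applied as simple word-boundary matches over each post. The sketch below is hypothetical: it uses only a handful of the terms actually named in this paper (the full exported list is longer), and the name `lexicon_flags` is our own.

```python
import re

# A few of the swear words and abusive terms named in the paper
# (Rathje 2014a, 2014b); the full exported list is longer.
LEXICON = ["luder", "bøsse", "psykopat", "fuck", "skide", "fanden"]

# One case-insensitive pattern with word boundaries, so 'skide'
# does not fire on unrelated words such as 'skidt'.
LEXICON_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, LEXICON)) + r")\b",
    re.IGNORECASE,
)

def lexicon_flags(post):
    """Return the lexicon terms found in a post, lower-cased."""
    return [m.lower() for m in LEXICON_RE.findall(post)]
```

Matches from such a lexicon can then simply be OR-ed with the corpus-derived rules, which is the "blunt use" of the quality data referred to above.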

Evaluation
Is Smatgrisene competitive? This depends on how 'competitiveness' is determined. We found the following criteria natural. Firstly, the classifier must meet or surpass the state of the art prior to OffensEval 2020 (Sigurbergsson et al. 2020). Secondly, it must compare favourably with its competitors, being

- among the best-scoring half,
- above the average score for all participants,
- above the median score for all participants.

According to the official scoreboard (https://arxiv.org/pdf/2006.07235.pdf), Smatgrisene reached F1=0.759, earning a shared 14th place and meeting our success criteria comfortably. While Smatgrisene may not be the champion of the OffensEval Challenge 2020, it is demonstrably competitive. Moreover, as the table shows, Smatgrisene clearly owes its competitiveness to the combination of AI and real I.

Conclusion
We have presented the classifier Smatgrisene (the name being a rarely, if ever, used lexical variant of 'smatso', one of the most abusive epithets in Danish), trained to detect offensive language in SoMe posts. The training algorithm combined simple surface rules (derived from the OffensEval training data) and deep linguistic knowledge (distilled from a comprehensive academic study of Danish offensive language, Rathje (2014b)). Despite its extreme formal simplicity, Smatgrisene did unexpectedly well in the OffensEval Challenge 2020 and came out in the top third of participants.
A real achievement, however, would be to utilize the rich metadata provided in Rathje (2014b) in a future classifier for offense detection. Rathje (2014b), like others in the same tradition, analyses the perception of potentially offensive language as a function of the interlocutors' age, social standing and societal background, and of the purpose and intentions of the interchange. We surmise that such synthetic knowledge could sharpen the focus of classifiers monitoring the communication channels in chat rooms and social media. Offensive communication is a complex social game. At the end of the day there is no such thing as an intrinsically offensive word.
"More data will solve any problem", "acribia is for sissies", "fire your linguists!". Such fresh attitudes are currently shared by many manufacturers of language technology. We invite the serious developer to rediscover the power of linguistic insight.