Targets and Aspects in Social Media Hate Speech

Mainstream research on hate speech has so far focused predominantly on classifying social media posts with respect to predefined typologies of rather coarse-grained hate speech categories. This may be sufficient if the goal is to detect and delete abusive posts. However, removal is not always possible due to the legislation of a country. Moreover, there is evidence that hate speech cannot be successfully combated by merely removing hate speech posts; it should be countered by education and counter-narratives. For this purpose, we need to identify (i) who is the target in a given hate speech post, and (ii) what aspects (or characteristics) are attributed to the target in the post. As a first approximation, we propose to adapt a generic state-of-the-art concept extraction model to the hate speech domain. The outcome of the experiments is promising and can serve as inspiration for further work on the task.


Introduction
Online hate speech and, in particular, hate speech in social media, is a cause of growing concern. Already six years ago, 73% of adult internet users had seen someone harassed online, and 40% had personally experienced it (Duggan, 2014). Therefore, research on hate speech identification is of increasing importance. A significant body of work has been conducted over the last decade; cf., e.g., (Waseem and Hovy, 2016a; Schmidt and Wiegand, 2017; Davidson et al., 2017a; Fortuna and Nunes, 2018; Kennedy et al., 2020). Most of this work focused on the task of classifying, for instance, social media posts, with respect to predefined typologies of rather coarse-grained hate speech categories, such as 'hate speech', 'racism', 'sexism', 'offense', etc. This may be sufficient if the task is to detect and remove abusive language posts. However, for instance, in the US, hate speech has been repeatedly judged as being covered by the First Amendment. 1 Furthermore, a number of studies suggest that hate speech cannot be successfully combated by merely removing identified hate speech posts 2 and should be countered by education and counter-narratives (Tekiroglu et al., 2020; Mathew et al., 2019). But to provide a basis for education and counter-narratives, we need a more detailed analysis of hate speech. In particular, we need to identify (i) who is the target in the identified hate speech post, and (ii) what aspects of the target are referred to or what aspects are attributed to the target in the post. For instance, we need to be able to determine that post (1) below targets Muslims of Palestine and that it attributes to them the characteristic of being terrorists. Similarly, for post (2), we need to determine that it targets female sports reporters and that they "should come to an end" (i.e., that they should be removed from their jobs). 3 The analogy to aspect-oriented sentiment analysis (Schouten and Frasincar, 2016) is evident.
(1) I'm standing outside and looking in and there isn't a shadow of doubt that the Muslims of Palestine are the terrorists.
(2) I'm not sexist but female sports reporters need to come to an end.

Some previous works use similar terminology, but with a different interpretation (see Section 3). In this paper, we present an approach that is different from these works and that aims to identify (i) the entity (most often, a group of individuals or an individual) who is targeted in the post, without drawing upon a predefined range of categories (which will necessarily always be limited and coarse-grained and will not cover new or intersecting categories, like 'black women'), and (ii) the aspect (or characteristics) assigned to the targeted entity.
We use an open-domain neural network-based concept extraction model for the identification of target and aspect candidates in each post. The obtained candidates are then further processed taking into account the idiosyncrasies of the codification of both targets and aspects in the domain. The remainder of the paper is structured as follows. In the next section, we introduce the notions of target and aspect we are working with. Section 3 summarizes the work that is related to ours, including aspect-oriented sentiment analysis, to which our proposal shows some clear analogies. Section 4 outlines the generic concept extraction model from which we start and presents its adaptation to the problem of target and aspect extraction from hate speech data, while Section 5 describes the experiments we carried out and discusses their outcome. Section 6, finally, draws some conclusions and outlines several lines of research that we aim to address in the future.

Targets and Aspects
Let us define more precisely what we mean by 'target' and 'aspect' in the context of our work.
Definition 1 (Target). A target is the entity that is in the focus of a hate post, i.e., the entity that incurs the hate of the author.
Very often, the target is an individual or a group of individuals, e.g., women, people of color, refugees, Muslims, Jews, etc.: (3) Bruh im tired of niggas retweetin Miley Cyrus naked that bitch aint no types of bad.
However, the target can also be a specific political conviction, a religion, an object related to an individual or a group of individuals, etc.; see, e.g., feminist novels in (4): 4 (4) I'm not sexist, but nothing bores me more than feminist novels.
Definition 2 (Aspect). An aspect is a characteristic, attitude, or behavior, or the lack thereof (as a rule, with a pejorative connotation), that the author attributes to the target.
The aspect is often expressed as a modifier (e.g., boring, stupid, lazy, not funny, etc.) of the target in the focus of the author, as in:
(5) I'm not sexist, but female comedians just aren't funny.
(6) I'm not sexist but *most girls are fucking stupid.
where not funny is the aspect of female comedians in (5) and fucking stupid of (most) girls in (6). It can also be a verbal group, such as can't cook in (7):
(7) Scoring like a Cunt because you can't cook for shit isn't fighting hard Kat.
In some posts, no targets and/or aspects can be identified; see, e.g., (8).
(8) I asked that question recently and actually got an answer http://t.co/oD98sptcGT.
We discard such posts in our current experiments.
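For illustration, a target-aspect annotation of the kind introduced above can be represented as a simple record; the following is a minimal sketch in Python (the field names are our own and do not stem from any released annotation scheme):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnnotatedPost:
    """A social media post with optional target and aspect spans."""
    text: str
    target: Optional[str] = None   # entity in the focus of the post (Definition 1)
    aspect: Optional[str] = None   # characteristic attributed to the target (Definition 2)

# Example (5) from the text:
post = AnnotatedPost(
    text="I'm not sexist, but female comedians just aren't funny",
    target="female comedians",
    aspect="aren't funny",
)

# Posts like (8) carry neither a target nor an aspect and are discarded:
empty = AnnotatedPost(text="I asked that question recently and actually got an answer")
has_annotation = empty.target is not None and empty.aspect is not None
print(has_annotation)  # False
```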

Related Work
As mentioned in Section 1, most of the work on online hate speech has focused on the task of classifying social media posts with respect to predefined typologies of rather coarse-grained hate speech categories, such as 'hate speech', 'racism', 'sexism', 'offense', etc. (Schmidt and Wiegand, 2017; Davidson et al., 2017a; Fortuna and Nunes, 2018; Swamy et al., 2019; Arango et al., 2019; Salminen et al., 2020; Kennedy et al., 2020; Rajamanickam et al., 2020). 5 Vidgen and Derczynski (2020) distinguish between binary classification (as in (Alfina et al., 2017; Ross et al., 2017)), multi-class classification into several hate speech categories (e.g., 'racism', 'sexism', and 'none' in (Waseem and Hovy, 2016b)), classification into different strengths of abuse (e.g., 'hateful', 'offensive', and 'neutral' contents, as in (Davidson et al., 2017b)), classification into different types of statements (e.g., 'denouncing', 'facts', 'humor', 'hypocrisy', and others) and themes (e.g., 'crimes', 'culture', 'islamization', 'rapism', and others), as in (Chung et al., 2019), and classification of different focuses of abuse (e.g., 'stereotype & objectification', 'dominance', 'derailing', 'sexual harassment', 'threats of violence', and 'discredit', as in (Fersini et al., 2018)). None of these works aims to identify the specific targeted group of individuals or the specific targeted individual, nor the characteristics of the targets that provoked the hate. Rather, they identify posts related to hate speech in general or to one of its more specific categories, which is a step prior to the detection of targets and aspects, where we start. Some previous works use a similar terminology as we do, but with a different meaning. For instance, Zainuddin et al. (2017, 2018, 2019) aim to identify the sentiment (positive or negative) of the author of a given post towards a range of specific hate speech categories (e.g., 'race' and 'gender'), which they call "aspects".
In (Gautam et al., 2020), tweets related to the MeToo movement are annotated manually with respect to five different linguistic "aspects": relevance, stance, hate speech, sarcasm, and dialogue acts. In this case, too, the interpretation of the notion of aspect is different from ours. Ousidhoum et al. (2019) define five different "aspects" that include specific targets, among others: (i) whether the text is direct or indirect; (ii) whether it is offensive, disrespectful, hateful, fearful out of ignorance, abusive, or normal; (iii) whether it is against an individual or a group of people; (iv) the name of the targeted group (16 common target groups are identified); and (v) the annotators' sentiment. Fersini et al. (2018) are also concerned with target detection in that they determine whether the messages were purposely sent to a specific target or to many potential receivers (e.g., groups of women). In (Silva et al., 2016), targets are identified using a short list of offensive words built drawing upon Hatebase 6 and a single template "<one word> people" to capture "black people", "stupid people", "rude people", etc.
Our work also aligns with Mathew et al. (2020) and Sap et al. (2020): Mathew et al. (2020) annotate a hate speech dataset at the word and phrase level, capturing human rationales for the labelling (which is similar to target-aspect labelling), while Sap et al. (2020) propose to understand and fight hate speech prejudices by means of accurate underlying explanations. However, Mathew et al. (2020) take into account only three labels ('hate', 'offensive', and 'normal') and ten target communities, performing supervised classification, while we aim at retrieving and distinguishing open-class targets and aspects in a semi-supervised manner. Sap et al. (2020) perform supervised training of a conditional language generation model that often results in generic stereotypes about the targeted groups rather than the implications conveyed in the post, while we use a language generation model only to produce candidates and further expand, rank, and select them such that a connection of the target and the aspect to the text is guaranteed.
To summarize, although the identification of the targets and characteristics of hate speech in the above works constitutes a significant advancement compared to the more traditional hate speech classification, all of these works still assume predefined target categories and do not identify which characteristics of the targets are concerned. In contrast, open-class target and aspect extraction may allow for modeling the particular forms of discrimination and hate experienced by individuals or groups of individuals that are covered or not covered by previously identified target categories.
As already mentioned in Section 1, our work is also related to aspect-oriented sentiment analysis, in which "targets" are specific entities (e.g., products, sights, celebrities) and "aspects" are characteristics or components of a given entity (Kobayashi et al., 2007; Nikolić et al., 2020). For each identified aspect, the "sentiment value" aligned with it is extracted; see, e.g., (Nazir et al., 2020) for a recent comprehensive survey of aspect-oriented sentiment analysis. In some (more traditional) works, aspects and their values are identified in separate stages (Hu and Liu, 2004; Hai et al., 2011). In more recent works, both tasks are addressed by one model, with aspects being partially identified by attention mechanisms realized, e.g., in an LSTM (Wang et al., 2016), a CNN (Liu and Shen, 2020), or an alternative common deep NN model. The targets are, as a rule, predefined, such that the challenge consists in analysing the sentiment of tweets towards these predefined targets; cf., e.g., (Tang et al., 2016; Dong et al., 2014). The problem of open-class target identification has not been broadly investigated and is sometimes solved simply as a named entity recognition problem due to the nature of the data, in which the targets are often represented by proper names (Mitchell et al., 2013). However, targets in hate speech texts go far beyond named entities, and the overall task is inverse to target-oriented sentiment classification: given a known category (hate speech of negative sentiment, as a rule), we have to identify the hate target and its corresponding "opinioned" aspect. Still, our proposal is similar to the modern approaches to aspect-oriented sentiment analysis in the sense that we also use an NN model (in our case, an LSTM-based encoder) with attention mechanisms for the initial identification of hate speech target and aspect candidates, before a domain-adaptation post-processing stage.

Outline of the Model
The study of social media hate speech posts reveals that targets are entities that are, as a rule, verbalized in terms of classifying nominal groups (Halliday, 2013). Aspects may also be expressed by classifying nominal groups, but adjectival (attributive) and participle groups (actions) are also common. In other words, overall, targets can be considered concepts (Waldis et al., 2018). Therefore, we envision the detection of surface forms of targets in the posts primarily as a concept extraction (CE) task. For aspects, it is often not sufficient to apply concept extraction if we want to also capture the adjectival and verbal group aspects.
Given that hate speech datasets are, in general, too small to serve for training neural networks for reliable concept extraction, we opt for applying an open-domain-oriented concept extraction model with a follow-up algorithmic domain adaptation.
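The overall two-stage procedure (open-domain concept extraction followed by an algorithmic domain adaptation) can be sketched as follows; the function names are placeholders for the components described in the remainder of this section, and the toy stand-ins serve only to illustrate the data flow:

```python
def extract_target_aspect(post, generic_ce, rank_targets, rank_aspects):
    """Two-stage pipeline: open-domain concept extraction produces
    candidates; a domain-adaptation stage ranks and refines them."""
    candidates = generic_ce(post)            # open-domain CE model
    target = rank_targets(post, candidates)  # domain-specific target ranking
    aspect = rank_aspects(post, candidates, target)
    return target, aspect

# Toy stand-ins illustrating the data flow only:
demo_ce = lambda p: ["female comedians", "funny"]
demo_rank_t = lambda p, cands: cands[0]
demo_rank_a = lambda p, cands, t: next(c for c in cands if c != t)

pair = extract_target_aspect(
    "I'm not sexist, but female comedians just aren't funny",
    demo_ce, demo_rank_t, demo_rank_a,
)
print(pair)  # ('female comedians', 'funny')
```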

Generic Concept Extraction
As an open-domain concept extraction model, we use an open-source state-of-the-art model that comprises two pointer-generator networks pretrained on different concept-annotated datasets via distant supervision (Shvets and Wanner, 2020). Given a sentence, each network generates a list of concepts, which are then merged and aligned with the sequence of tokens of the sentence. In case of ambiguity due to the overlap of surface forms of concepts, the first detected and the longest spans are selected as the resulting positions; see the implementation in the publicly available code published along with the released models. 7 The model is a sequence-to-sequence model; cf. Figure 1. The pointer mechanism makes it possible to copy out-of-vocabulary words directly to the outcome, which is especially relevant to our work, as the hate speech dataset includes specific words unseen during generic training, such as proper names, hashtags, and Twitter names. The generator adjusts the internal vocabulary distribution for selecting the next word (which might be a termination token "*") based on the weights of the global attention a_t (Luong et al., 2015), which are updated at each generation step t. The probability of generating the next word instead of copying one is defined as

p_gen = σ(w_{h*}^T h*_t + w_s^T s_t + w_x^T x_t + b_ptr),

where h*_t is the sum of the hidden states of the encoder weighted with the attention distribution a_t, x_t is the decoder input, s_t is the decoder state, w_{h*}, w_x, w_s, and b_ptr are learnable parameters, and σ is the sigmoid function. The encoder is a stacked bidirectional LSTM, while the decoder is a stacked unidirectional LSTM (Hochreiter and Schmidhuber, 1997).
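For concreteness, the computation of the generation probability can be sketched as follows (plain Python with toy dimensions; the weight vectors, which would be learned in the real model, are random here):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def generation_probability(h_star_t, s_t, x_t, w_h, w_s, w_x, b_ptr):
    """p_gen = sigma(w_h . h*_t + w_s . s_t + w_x . x_t + b_ptr)."""
    return sigmoid(dot(w_h, h_star_t) + dot(w_s, s_t) + dot(w_x, x_t) + b_ptr)

random.seed(0)
vec = lambda n: [random.gauss(0, 1) for _ in range(n)]
d_h, d_s, d_x = 8, 8, 4  # toy dimensions
p_gen = generation_probability(
    vec(d_h),                      # h*_t: attention-weighted encoder states
    vec(d_s),                      # s_t: decoder state
    vec(d_x),                      # x_t: decoder input
    vec(d_h), vec(d_s), vec(d_x),  # learnable weight vectors (random here)
    0.0,                           # b_ptr
)
assert 0.0 < p_gen < 1.0  # a valid probability
```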

Domain Adaptation
The goal of the domain adaptation with respect to target and aspect determination in hate speech posts is to take into account the most relevant idiosyncrasies of the genre. In the case of targets, the following observations can be made with respect to such idiosyncrasies: (i) While in generic discourse, targets can be assumed to be classifying nominal groups (see above), in hate speech, we also observe adjectival and participle targets that need to be captured.
(ii) Some targets form part of compounds and would thus be skipped by Shvets and Wanner (2020)'s generic concept extraction algorithm, since it was trained to generate tokens from the input sentence without compound decomposition; cf., e.g., Daeshbags ≡ Daesh+bags. The consideration of "subwords" instead of entire words has already proved to be beneficial for many NLP applications, including, e.g., machine translation (Sennrich et al., 2016). We thus also consider subwords of tokens.
(iii) As a rule, a single post contains one target only; multiple targets are very rare in short posts. 8 This means that all target candidates in a post must be ranked in terms of their probability of being a target. A high term frequency of a candidate across the posts implies a higher probability that this candidate is a common target, such that we favour candidates with a higher term frequency in a reference corpus. However, this is not the only criterion, as frequency alone would introduce a strong bias towards frequent terms and contradict the idea of having unseen open-class targets. If no nominal candidates have been identified, we favour the adjectival/participle candidate with the highest tf*idf, with the term frequency (tf) being calculated over a reference corpus and the inverse document frequency (idf) being calculated over the English Gigaword v.5 corpus (Parker et al., 2011).

The same idea applies to aspects: aspect candidates should be ranked with respect to their likelihood of being a real aspect. To determine aspect candidates, we take into account their PoS and their position with respect to the previously determined target. Candidate aspects are: (i) concepts, detected by Shvets and Wanner (2020)'s generic concept extraction algorithm, which precede or follow the target; (ii) adjectival or participle modifiers either preceding or following the target.
Similarly to non-nominal target candidates, we favour aspect candidates with the highest tf*idf, regardless of their PoS. In addition to the frequency-based criteria, we chose several variables that give priority to different target and aspect candidates, depending on the weight assigned to them; they are listed in Table 1. Learning the weights within the domain adaptation stage on target-aspect expert-annotated posts results in ranking criteria that are further used for the selection of target and aspect candidates in other (unseen) posts.
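The tf*idf ranking of candidates described above can be sketched as follows; the counts are toy values (in our setup, tf is computed over the domain reference corpus and idf over Gigaword):

```python
import math

def rank_by_tfidf(candidates, tf_counts, df_counts, n_docs):
    """Rank candidates by tf * idf, highest first.
    tf stems from a domain reference corpus, idf from a large generic corpus."""
    def tfidf(c):
        tf = tf_counts.get(c, 0)
        idf = math.log(n_docs / (1 + df_counts.get(c, 0)))
        return tf * idf
    return sorted(candidates, key=tfidf, reverse=True)

# Toy counts: 'feminazi' is frequent in the domain but rare generically,
# so it outranks the generically common 'people'.
tf_counts = {"feminazi": 40, "people": 50}
df_counts = {"feminazi": 10, "people": 900_000}
ranked = rank_by_tfidf(["people", "feminazi"], tf_counts, df_counts, 1_000_000)
print(ranked[0])  # 'feminazi'
```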
Three algorithms carry out the target and aspect identification. Algorithm 1 fine-tunes the weight variables from Table 1 for domain-specific target-aspect identification. An exhaustive weight variable fine-tuning procedure is run over all variable weight combinations. Algorithm 1 takes as input a domain reference dataset from which nominal concepts are extracted using the generic concept extraction model (reference target candidates T_ref), and a development dataset from which new domain-specific targets and aspects are extracted (not necessarily nominal) using Algorithms 2 and 3. The expert annotation of the development dataset, TA_d^TRUE, serves as a reference during the weight variable tuning procedure.

8 Only about 2% of the posts in our dataset contain two targets. In order to expand the coverage of our algorithm, we plan to also consider datasets with longer texts in our future work; see the discussion of Figure 2 for details.
Algorithm 2 outputs the target-aspect pair of a given post, extracted using the weight variables. It calls Algorithm 3 for the first stage target identification by a ranking based on variables a 1 -a 5 , then refines the delivered target and identifies the aspect by a ranking based on variables v 1 -v 5 .
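The ranking by weight variables underlying Algorithms 2 and 3 can be illustrated as a dot product between a candidate's feature vector and the weight vector; the feature values below are schematic, not the ones actually extracted:

```python
def weighted_score(features, weights):
    """Score a candidate as the dot product of its feature vector and
    the fine-tuned weight vector (analogous to variables a1..a5)."""
    return sum(f * w for f, w in zip(features, weights))

def rank_candidates(candidates, featurize, weights):
    """Order candidates by descending weighted score."""
    return sorted(candidates,
                  key=lambda c: weighted_score(featurize(c), weights),
                  reverse=True)

# Schematic features: [is_nominal, is_proper_name, entire_words, position, expansion]
weights = [1e6, 1e3, 1e-3, 1e-6, 0.0]   # nominal candidates dominate the ranking
features = {
    "girls":  [1, 0, 1, 0.2, 0],   # nominal candidate, entire word
    "stupid": [0, 0, 1, 0.8, 0],   # adjectival modifier
}
ordered = rank_candidates(["stupid", "girls"], features.get, weights)
print(ordered)  # ['girls', 'stupid']
```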
Table 1: Weight variables used for ranking target and aspect candidates

Var | Weight of
a1 | nominal target candidate
a2 | proper name target candidate
a3 | target candidate comprising entire words
a4 | position of the candidate in p
a5 | expansion of the detected target
v1 | temporal expansion of the detected target within aspect detection
v2 | expansion of the detected aspect
v3 | nominal concept aspect candidate located in a span following the target
v4 | adjectival/participle aspect candidate regardless of its location in a post
v5 | nominal concept aspect candidate located in a span prior to the target

The development set (of 100 posts) and the test set (of 440 posts) are annotated in terms of targets and aspects by three annotators. For this purpose, the annotators were provided with the definitions of the notions of 'target' and 'aspect' (see Section 2) and the instruction to first identify the target.

Algorithm 1: Weight variable fine-tuning (excerpt)
// Select discrete values for the components of a and v iteratively on a grid to extract target-aspect pairs from S_dev
foreach p_d : post ∈ S_dev do
  TA_d ← TA_d ∪ GetTA_Pair(p_d, T_ref, a, v) // get target-aspect pairs using Algorithm 2: GetTA_Pair
end
r_av ← Score(TA_d, TA_d^TRUE) // score the resulting pairs TA_d

Algorithm 2: GetTA_Pair: Target-aspect pair extraction (excerpt)
if t_in ≡ modifier ∈ APM + concept then
  t_in ← concept; a_best ← modifier // select the concept in t_in as the updated target t_in and its modifier as a_best
else
  A_c ← {c | ∀ c_s : c_s IS subword(c), c ∈ C OR c ∈ APM, ∄ t_s : t_s IS subword(t_in) ∧ c_s = t_s} // identify concepts and modifiers in p which do not have common subwords with the extracted target t_in
  A* ← Order(Weight(A_c, v)) // weight concepts and modifiers in p according to v and order them in descending weight order
  a_best ← FirstElement(A*) // the top-ranked aspect candidate
end
t_out ← t_in
a_out ← SelectIF(a_best, Expand(a_best), v) // output a_best or a_best expanded to a complete group, depending on v
Algorithm 3: GetTarget: Target determination
Input: p: post, T_ref: target candidates in reference data, C: concepts in p, APM: adjectival/participle modifiers in p, a: fine-tuned weight variables
Output: t_out: identified target
T_p ← {t | t ∈ C ∧ t ∈ T_ref} // identify concepts in p already seen as target candidates in the reference data
T_c ← {c | c ∈ C ∧ ∄ t_p ∈ T_p : t_p = c} // identify other concepts in p
T_sub ← SubwordConcepts(C) ∪ SubwordConcepts(APM) // identify concepts in p which are subwords in nominal compounds or adjectival/participle modifiers
T_overlap ← {c | c ∈ T_sub ∧ c ∈ T_ref} // collect subword concepts in p that overlap with the target candidates seen in the reference data
T_disj ← {c | c ∈ T_sub ∧ c ∉ T_ref} // collect subword concepts in p that do not overlap with the target candidates seen in the reference data
T*_1 ← Order(Weight(T_p ∪ T_overlap, a)) // weight concepts + subword concepts in p seen as target candidates in the reference data according to a; order them in descending weight order
T*_2 ← Order(Weight(T_c ∪ T_disj ∪ APM, a)) // weight other concepts + subword concepts in p according to a; order them in descending weight order
T* ← APPEND(T*_1, T*_2)
t_best ← FirstElement(T*) // the top-ranked target candidate
t_out ← SelectIF(t_best, Expand(t_best), a) // output t_best or t_best expanded to a complete group, depending on a
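The SubwordConcepts step of Algorithm 3 can be approximated by a greedy substring match against a vocabulary of known candidates; a sketch (the vocabulary here is illustrative):

```python
def subword_candidates(token, vocab, min_len=3):
    """Return substrings of `token` that are known vocabulary items,
    so that compounds like 'Daeshbags' yield the candidate 'Daesh'."""
    t = token.lower()
    found = []
    for start in range(len(t)):
        for end in range(start + min_len, len(t) + 1):
            if t[start:end] in vocab:
                found.append(token[start:end])
    return found

vocab = {"daesh", "bag", "bags"}
print(subword_candidates("Daeshbags", vocab))  # ['Daesh', 'bag', 'bags']
```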

Domain adaptation
Our domain adaptation consists in applying Algorithms 1-3 to the reference dataset (S_ref) of 5205 posts and the development set (S_develop) of 100 posts from the 'racism' and 'sexism' categories of (Waseem and Hovy, 2016a). Shvets and Wanner (2020)'s concept extraction detects in S_ref about 7K concepts in the 'sexism' subset (e.g., 'dinner', 'iq', 'wings', 'abortion', 'female commentator', 'women', 'girls', etc.) and about 4K concepts in the 'racism' subset (e.g., 'hypocrite', 'armies', 'death cult', 'countries', 'honor killings'). Already at first glance, we see that not all of them can be targets in the sense defined in Section 2. This shows the importance of the proposed domain adaptation. The concepts with the highest tf in S_ref (and thus the candidates for being targets) are shown in Table 2. Note that for the tf figures, we used only concepts that appear in the subject position in S_ref, as we observed that 94% of the targets in S_develop are subjects. It is also worth noting that this list of generic targets provides only candidates, which are further dynamically extended by other concepts for each new post, such that generic candidates may appear in a compound target or may even be dropped altogether.
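The restriction of target candidates to concepts in subject position can be sketched as follows; for simplicity, the sketch operates on pre-parsed (token, dependency-relation) pairs instead of calling a specific parser:

```python
def subject_concepts(parsed_tokens, concepts):
    """Keep only concepts whose head token carries a subject relation
    ('nsubj'/'nsubjpass' in Universal Dependencies style)."""
    subjects = {tok for tok, dep in parsed_tokens if dep in ("nsubj", "nsubjpass")}
    return [c for c in concepts if c.split()[-1] in subjects]

# Toy parse of: "female comedians just aren't funny"
parsed = [("female", "amod"), ("comedians", "nsubj"),
          ("just", "advmod"), ("are", "cop"), ("funny", "root")]
print(subject_concepts(parsed, ["female comedians", "funny"]))
# ['female comedians']
```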
The fine-tuning procedure of Algorithm 1 yields a_1 = 10^6, a_2 = 10^3 · max(tf), a_3 = 10^-3, a_4 = 10^-6, a_5 = 0, and v_1 = 1, v_2 = 1, v_3 = 10^9, v_4 = 10^6, v_5 = 10^3 · Length(p) (p being the post under consideration). Thus, the importance of the variables for target detection is the following: nominal target candidate > proper name target candidate > target candidate comprising entire words > position of the candidate in p. For aspects, this procedure results in: nominal concept candidate following the target > adjectival/participle candidate > nominal concept candidate preceding the target.

Target and Aspect Extraction
After the adaptation, we identify the targets and aspects using the fine-tuned weight variables a_best and v_best (specified in Section 4.2) in the test set (S_test) of 440 posts. Consider a few examples, with the identified targets and aspects marked in bold.

(10) There's something wrong when a girl (Target) wins Wayne Rooney street striker (Aspect) #NotSexist.
(12) But why propagandize your bigotry when Pakistani Muslims (Target) are murdering Christians and Hindus for blasphemy (Aspect)?
(13) Why haven't you stopped the sick Muslims (Target) from trying to exterminate Israel (Aspect)?
(14) Kat (Target) is a sociopath (Aspect) #mkr

We can observe that the identified targets are nominal entities, while the aspects are mainly verbal groups that have been obtained by the expansion of an initial nominal aspect candidate (Christian world, women, etc.) to a full verbal group. However, as (11) and (14) show, we cannot reduce aspect identification to verbal group extraction: an aspect can also readily be a nominal group.
We evaluated the performance of the proposed model along with several baselines for target identification on S_develop (dev) and S_test (test) with respect to accuracy in terms of the Jaccard index, partial and exact match, and with respect to precision, recall, and F1 for ROUGE-L (Lin, 2004); cf. Table 3 (Evaluation of the quality of the detected targets and aspects on the development and test set). The first baseline takes the first noun as a target. This baseline already provides many correct matches due to the reduced length of the posts in our dataset. The second baseline identifies a noun with the hypernym person or group that is a relevant candidate entity according to the definition of a target. We also fine-tuned a BERT model (Devlin et al., 2019) on the development set for target recognition in order to compare our pointer-generator-based model to transformer-based models.
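The token-level matching metrics can be computed as in the following sketch; note that this reflects our reading of exact and partial match in terms of the Jaccard index, not the released evaluation code:

```python
def jaccard(pred, gold):
    """Token-level Jaccard index between a predicted and a gold span."""
    p, g = set(pred.lower().split()), set(gold.lower().split())
    return len(p & g) / len(p | g) if p | g else 1.0

def exact_match(pred, gold):
    return jaccard(pred, gold) == 1.0

def partial_match(pred, gold):
    return jaccard(pred, gold) > 0.0

pred, gold = "female sports reporters", "female reporters"
print(round(jaccard(pred, gold), 2))  # 0.67
print(exact_match(pred, gold))        # False
print(partial_match(pred, gold))      # True
```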
We can observe that target identification as invoked by GetTA_Pair (Algorithm 2) achieves a rather good performance. Thus, the accuracy for the exact match between the ground truth targets and the predicted targets is 0.65 for the development set and 0.57 for the test set. With BERT, we achieve a somewhat lower accuracy. It is interesting to observe that combining GetTA_Pair with BERT results in a lower accuracy for the exact match, but in a considerably higher accuracy (of 0.82) for a partial match, i.e., the match between the semantic head of the predicted target and the semantic head of the ground truth target. This is likely due to the limited amount of material in the development set, which seems to be sufficient to learn the essence of what an aspect is, but not sufficient to learn well the composition of the aspect in terms of lexico-syntactic patterns. 10 The performance for aspect recognition is, in general, lower, which can be explained by the higher complexity of the task. However, for the partial aspect match, the accuracy is still 0.74, and the ROUGE-L F1 score is 0.45. Table 4 shows the performance of our target detection algorithm with different variable settings. As can be observed, the use of the tf*idf feature alone is not sufficient; when only concepts in the subject position are taken into account as target candidates, the Jaccard index improves significantly. The best performance is achieved when all variables are set as indicated in the description of Algorithms 2 and 3. In addition, we assessed the performance of the model when Algorithm 3 is applied successively several times, excluding targets predicted at previous steps from consideration. Similarly, for each detected target, we ran Algorithm 2 several times.

10 Pretraining BERT on concept-annotated datasets may improve the figures for the exact match. If this proves to be the case, transformer-based models are likely to outperform other models on the overall target identification task.
The improvement in the ROUGE-L score with each run is shown in Figure 2, where the best of the predicted top n targets and the best corresponding top n aspects are scored. The figures provided for aspects correspond to the second run of Algorithm 3, but this does not distort the overall picture since they are on the same scale for any number of predicted targets. We can observe a steady increase in performance already for small values of n, which shows the potential of our model. The strategy of selecting the top n targets can also be used for detecting multiple targets in longer texts.
To verify that the proposed fine-tuning procedure of the weight variables is not dataset-specific, we also ran it on the negative sentiment subset of (Dong et al., 2014) as T_ref, with the targets originally obtained through dictionary search as test set targets. 11 To avoid a bias in the evaluation by "seen" targets, we ensured that 50% of the targets in the test set are unseen, by removing from the reference set a number of examples with targets appearing in both the reference set and the test set. Table 5 shows the scores obtained in this experiment for targets. We can observe that the evaluation figures are even considerably higher than those in Tables 3 and 4. This is likely due to the high percentage of named entities in this dataset, which facilitates an accurate detection of concepts.

Conclusions and Future Work
Classification of hate speech in terms of broad categories is not sufficient; in order to effectively combat hate speech, a detailed target-aspect analysis is necessary. We presented a model that adapts a generic concept extraction model and showed that it is able to reach a reasonable quality for target and aspect identification in the 'sexism' and 'racism' categories of the (Waseem and Hovy, 2016a) hate speech dataset. The model is semi-supervised and works already with a small annotated dataset. This is an advantage in view of the absence of large hate speech datasets annotated with the target-aspect information.
Despite the promising figures, our model still has some limitations. Thus, the aspect identification quality should be further improved. Furthermore, we plan to use distant supervision in order to make the model language-independent, which will be an advantage compared to the presented implementation, which is to a certain extent language-specific. In addition, experiments on other hate speech datasets should be carried out in order to demonstrate that the proposed variable tuning and the implemented syntactic target and aspect patterns generalize well across datasets. Finally, although the vast majority of posts indeed contain just one target, capturing multiple targets would be desirable.
The annotated development and test sets and the code are available in the following GitHub repository: https://github.com/ TalnUPF/HateSpeechTargetsAspects/.