Hierarchical Enhancement Framework for Aspect-based Argument Mining



Introduction
Argument mining, a critical task within computational argumentation, has gained considerable attention, as evidenced by available datasets (Stab et al., 2018; Trautmann et al., 2020), emerging tasks (Wachsmuth et al., 2017; Al-Khatib et al., 2020), and the machine learning models associated with this domain (Kuribayashi et al., 2019; Chakrabarty et al., 2019). As aspect-level sentiment analysis tasks have flourished, Aspect-Based Argument Mining (ABAM) has recognized the need to decompose argument units into smaller attributes and to define aspect terms as components with specific meanings in arguments (Trautmann, 2020).

Example 2: Granted, the initial construction costs of a nuclear plant are huge, but the ongoing maintenance and fuel costs have proven to be far lower than those of other energy sources.
Example 3: While uniforms may help limit bullying within a school, they can also cause bullying by students from other schools.
Figure 1: Example annotation of argument units, their corresponding stances (yellow: supporting/pro; blue: opposing/con), and aspect terms (italics, framed in red) for the topics nuclear energy and school uniforms.
Previous works have attempted to combine aspects and arguments, but they often lack a clear definition of the relevant task. For instance, Fujii and Ishikawa (2006) primarily focus on summarizing viewpoints and defining arguing points. Similarly, Misra et al. (2015) group the arguments under discussion into aspects of the argument. Furthermore, Gemechu and Reed (2019) consider aspects in argument relations as one of four functional components. Only recently has a study by Trautmann (2020) specifically addressed aspect term extraction and emphasized its definition, introducing the concept of Aspect-Based Argument Mining (ABAM). The main objective of ABAM is to identify the argument units that support corresponding stances under a controversial topic, along with the aspect terms mentioned within these argument units. In this context, an argument unit is typically defined as a short text span providing evidence or reasoning about the topic, supporting or opposing it (Stab et al., 2018). An aspect term, in turn, is defined as the crucial facet the argument unit is trying to address, representing a core aspect of the argument (Trautmann, 2020).
Figure 1 illustrates four examples within the topics of nuclear energy and school uniforms. In the first pair (examples 1 and 2), opinions are expressed in argument units (yellow or blue) around aspect terms (italics, framed in red) such as cost, nuclear plant, and maintenance. Similarly, the second pair (examples 3 and 4) revolves around aspect terms such as students and bullying. This targeted approach enables a more precise analysis by zooming in on the specific aspects that are essential to an argument unit. Moreover, aspect terms enable the comparison of supporting and opposing opinions at the aspect level, thereby facilitating more nuanced and fine-grained conclusions. ABAM, as defined by Trautmann (2020), is treated as a Nested Named Entity Recognition (NNER) task, which presents three key challenges: 1) How can we construct a robust underlying representation that effectively encodes contextual information? In Natural Language Processing (NLP), a robust underlying representation is a cornerstone of strong model performance. 2) How can we mine the correlation between opinion expressions corresponding to different stances under the same topic? Different users may express their viewpoints on the same stance differently within a given topic; as shown in Figure 1, authors express different opinions around similar aspect terms under the same topic. Exploring the relationship between these opinion expressions can greatly assist in accurately determining the stances of different argument units. 3) How can we leverage task-specific features to improve the extraction of argument units and aspect terms? By investigating the unique task characteristics and data properties of ABAM, we aim to enhance the model's performance significantly.
Overall, we propose a novel Hierarchical Enhancement Framework (HEF) consisting of four modules: a basic module, an argument unit enhancement module, an aspect term enhancement module, and a decision module. With regard to the three challenges above, the paper presents three key components accordingly. In the basic module, we propose the Semantic and Syntactic Fusion (SSF) component to fine-tune the representations of the pretrained language model; this fine-tuning completes the initial recognition stage for argument units and aspect terms. Next, in the argument unit enhancement module, we leverage the argument unit boundary information provided by the basic module: by integrating a span-based method and the proposed Batch-Level Heterogeneous Graph Attention Network (BHGAT) component, we judge the stance of each argument unit, thereby refining the categorization of the initially recognized argument units. Moving on to the aspect term enhancement module, we introduce the Span Mask Interactive Attention (SMIA) component: by incorporating span masks and interactive guidance through attention mechanisms, we better capture and identify aspect terms within the specified boundaries. Finally, in the decision module, we combine the initial recognition results with the enhancement results to produce the final output. Our contributions can be summarized as follows:

Related Work
The objective of Misra et al. (2015) is to identify specific arguments and counter-arguments in social media texts, categorize them into different aspects, and utilize this aspect information to generate argument summaries. Similarly, Misra et al. (2016) focus on inducing and identifying argument aspects across multiple conversations, ranking the extracted arguments by similarity, and generating corresponding summaries. However, these earlier works were limited to a few specific topics. More recent research extends the approach to a broader range of 28 topics, introducing a novel corpus for aspect-based argument clustering (Reimers et al., 2019). Furthermore, Gemechu and Reed (2019) decompose propositions into four functional components: aspects, target concepts, and opinions on aspects and target concepts. By leveraging the relationships among these components, they infer argument relations and gain a deeper understanding of argument structure. In a different study, Bar-Haim et al. (2020) focus on summarizing the arguments supporting each side of a debate by mapping them to a concise list of key points, which are similar to the aspect terms highlighted earlier. Lastly, Trautmann (2020) redefines the aspect-based argument mining task based on clause-level argument unit recognition and classification in heterogeneous document collections (Trautmann et al., 2020).
Framework

For each token, the framework predicts an argument unit label y^AU ∈ {B_con, I_con, E_con, B_pro, I_pro, E_pro, O} and an aspect term label y^AT ∈ {B_asp, I_asp, E_asp, O}.

Basic Module
The sentence and topic are concatenated as the input to BERT: [CLS] sentence [SEP] topic [SEP], where [CLS] and [SEP] are special tokens. BERT produces the contextualized representation of each token, X = [x_1^w, x_2^w, ..., x_n^w] (1). Note we also incorporate orthographic and morphological features of words by combining character representations (Xu et al., 2021): the characters within w_i^text are embedded and fed to an LSTM, whose final hidden state x_t^c serves as the character representation of w_i^text. The final token representation is x_t = [x_t^w ; x_t^c ; x_t^p], where [;] denotes concatenation and x_t^p is the part-of-speech tag embedding of w_t^text.
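To make the shapes concrete, here is a minimal sketch of the token representation x_t = [x_t^w ; x_t^c ; x_t^p]. The dimensions (768 for BERT, 16 for characters, 32 for POS) are assumptions for illustration, and the char-LSTM is stubbed with a mean pool:

```python
import numpy as np

rng = np.random.default_rng(0)

def token_representation(word_vec, char_vecs, pos_vec):
    """Concatenate [x_t^w ; x_t^c ; x_t^p]: BERT word vector, a
    character-level representation, and a POS embedding. The char-LSTM is
    stubbed with a mean over character vectors purely to show the shapes;
    the paper uses an LSTM's final hidden state here."""
    char_state = char_vecs.mean(axis=0)  # stand-in for the char-LSTM final state
    return np.concatenate([word_vec, char_state, pos_vec])

word_vec = rng.normal(size=768)       # x_t^w from BERT
char_vecs = rng.normal(size=(5, 16))  # one 16-d vector per character (dims assumed)
pos_vec = rng.normal(size=32)         # x_t^p POS embedding (dims assumed)
x_t = token_representation(word_vec, char_vecs, pos_vec)
```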
Encoder. The LSTM is widely used for capturing sequential information in either the forward or backward direction. However, it faces challenges with excessively long sentences, as it may struggle to retain long-distance dependencies between words. To address this limitation and exploit syntactic information, we propose the Semantic and Syntactic Fusion (SSF) component based on sentence-level GNNs, aiming to bridge the gap between distant words by effectively encoding both sequential semantic information and spatial syntactic information.
The inputs of SSF are the previous cell state c_{t-1}, the previous hidden state h_{t-1}, the current cell input x_t, and an additional graph-encoded representation g_t, where c_0 and h_0 are initialized to zero vectors and g_t is generated by a Graph Attention Network (GAT), which is capable of bringing in structured information through the graph structure (Hamilton et al., 2017).
The hidden state h_t of SSF is computed with gates f_t, i_t, o_t and cell unit c_t, equivalent to the forget gate, input gate, output gate, and cell unit of a traditional LSTM, while two additional gates m_t and s_t control the information flow of g_t. Finally, h_t is the output of SSF.
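Since the gate equations did not survive in full, the following is one plausible reading of the SSF cell: a standard LSTM step plus two extra gates, m_t feeding g_t into the cell state and s_t feeding it into the output. The exact wiring is our reconstruction, not the authors' published equations:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ssf_cell(x_t, g_t, h_prev, c_prev, W):
    """One SSF step: an LSTM cell augmented with gates m_t and s_t that
    control how much of the graph-encoded vector g_t enters the cell
    state and the output. Gate placement is our assumption."""
    z = np.concatenate([x_t, h_prev])
    f_t = sigmoid(W["f"] @ z)                  # forget gate
    i_t = sigmoid(W["i"] @ z)                  # input gate
    o_t = sigmoid(W["o"] @ z)                  # output gate
    c_hat = np.tanh(W["c"] @ z)                # candidate cell state
    zg = np.concatenate([g_t, h_prev])
    m_t = sigmoid(W["m"] @ zg)                 # gates g_t into the cell state
    s_t = sigmoid(W["s"] @ zg)                 # gates g_t into the output
    c_t = f_t * c_prev + i_t * c_hat + m_t * np.tanh(W["g"] @ g_t)
    h_t = o_t * np.tanh(c_t) + s_t * g_t
    return h_t, c_t

d = 8  # toy hidden size; input, graph, and hidden dims kept equal for brevity
W = {k: rng.normal(size=(d, 2 * d)) for k in ["f", "i", "o", "c", "m", "s"]}
W["g"] = rng.normal(size=(d, d))
h_t, c_t = ssf_cell(rng.normal(size=d), rng.normal(size=d),
                    np.zeros(d), np.zeros(d), W)
```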
Star-Transformer (Guo et al., 2019) measures position information more explicitly and is more sensitive to the order of the input sequence. Building upon this insight, we feed the output of the SSF component into a Star-Transformer to re-encode the context, completing the encoder.

Decoder. CRF has been widely used in NER tasks (Xu et al., 2021; Li et al., 2021). For an input sentence, the probability scores z_t^AU and z_t^AT for all tokens x_i ∈ X over the argument unit tags and aspect term tags are calculated by the CRF decoder.

Argument Unit Enhancement Module
Motivated by span-based methods, the Argument Unit Enhancement (AUE) module utilizes the argument unit labels z_t^AU predicted by the basic module to obtain the boundary information of argument unit spans and re-evaluates the stance of each span, thereby correcting the z_t^AU labels. We observe that different users often express similar opinions when discussing similar aspect terms; learning these similar opinion expressions can assist in distinguishing the corresponding stances of argument units. Furthermore, pre-trained language models are widely adopted for their robust contextual representations, but different contexts can yield different representations of the same word, and exploring the correlation between these context-dependent representations can aid in optimizing the underlying representations. To this end, we propose the Batch-Level Heterogeneous Graph Attention Network (BHGAT) component. BHGAT combines the strengths of sentence-level GNNs (Zhang et al., 2019; Wang et al., 2020) and corpus-level GNNs (Wang et al., 2019; Yao et al., 2019; Linmei et al., 2019). While utilizing the deep contextual word representations generated by pre-trained language models, BHGAT facilitates the communication of opinion expressions among different samples and establishes correlations between different representations of the same word, enabling the optimization of the representations of the various heterogeneous nodes within the graph. Constructing a graph neural network involves defining an initial representation for each node, an adjacency matrix, and a node update method.
Node initialization. In our proposed BHGAT, we distinguish between two types of nodes: argument unit nodes h_i^au and word nodes h_t^star. An argument unit node is initialized from h_{start_i}^star and h_{end_i}^star, the representations of the starting and ending words of the i-th argument unit au_i, together with au_i(topic), the topic of au_i.
Adjacency matrix. The (au_i, au_j) entries capture the relationships between argument units within a batch, facilitating communication and understanding among argument units that share the same topic and allowing the model to learn similar opinion expressions. The (au_i, word_j) entries represent the association between argument units and words, using the attention mechanism between nodes to update both argument unit nodes and word nodes; the (word_i, au_j) entries are defined symmetrically. The (word_i, word_j) entries connect node representations of the same word in different contexts, facilitating information interaction between word nodes of different argument units, which helps optimize the dynamic representations of the underlying words.
Furthermore, the diagonal values of A are all ones.
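Putting the edge types together, the batch-level adjacency construction might look like the following sketch. The node ordering (argument units first, then word occurrences) and the concrete edge rules are our reconstruction of the description:

```python
import numpy as np

def build_batch_adjacency(unit_topics, unit_words):
    """Heterogeneous adjacency over one batch: argument-unit nodes first,
    then one node per word occurrence. Edges: (au, au) when topics match,
    (au, word) when the word occurs in that unit, (word, word) when two
    occurrences share the same surface word; the diagonal is all ones."""
    n_au = len(unit_topics)
    occ = [(u, w) for u, ws in enumerate(unit_words) for w in ws]
    n = n_au + len(occ)
    A = np.eye(n)
    for i in range(n_au):                      # (au_i, au_j) same-topic edges
        for j in range(n_au):
            if unit_topics[i] == unit_topics[j]:
                A[i, j] = 1.0
    for k, (u, w) in enumerate(occ):
        A[u, n_au + k] = A[n_au + k, u] = 1.0  # (au, word) membership edges
        for k2, (_, w2) in enumerate(occ):
            if w == w2:                        # (word, word) same-word edges
                A[n_au + k, n_au + k2] = 1.0
    return A

A = build_batch_adjacency(["nuclear energy", "nuclear energy"],
                          [["cost", "plant"], ["cost"]])
```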
Node update. We adopt information aggregation to complete node updates, similar to GAT (Veličković et al., 2018), where hg_i^(l) is the representation of node i at the l-th layer.
Finally, we perform stance classification on the representations of the argument unit nodes, where p_{au_i} = [p_{au_i}^con, p_{au_i}^pro, p_{au_i}^non] is the stance probability distribution of the argument unit au_i.
Through BHGAT, we first obtain the stance class of each argument unit. We then map the obtained stance classes into the enhanced argument unit recognition space, producing the vector z_t^AUE. z_t^AUE can be divided into two parts: a boundary part and a stance part. In the mapping process, the boundary of the base label z_t^AU first determines whether the boundary part of z_t^AUE for token t is B-*, I-*, E-*, or O; then, according to p_{au_i}^con, p_{au_i}^pro, and p_{au_i}^non, we determine the stance part of z_t^AUE (*-con, *-pro, or O), where s_{au_i} is the start position of argument unit au_i and e_{au_i} is the end position.
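A minimal sketch of this mapping for one span, under our reading of the rule (keep the boundary prefix, overwrite the stance suffix with the argmax, and erase the span on a "non" verdict):

```python
def enhance_unit_labels(z_au, p_stance):
    """Map a span-level stance decision back onto the token labels of one
    argument unit: keep the B/I/E boundary from the base prediction and
    overwrite the stance suffix with the argmax over [con, pro, non]; a
    'non' verdict maps the span to O. Label names follow the paper's tag
    set; the mapping rule itself is our reconstruction."""
    stance = ["con", "pro", "non"][max(range(3), key=lambda i: p_stance[i])]
    out = []
    for tag in z_au:
        if tag == "O" or stance == "non":
            out.append("O")
        else:
            out.append(tag.split("-")[0] + "-" + stance)  # e.g. "B-con" -> "B-pro"
    return out
```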

Aspect Term Enhancement Module
To enhance the label sequence z_t^AT of initially identified aspect terms, we introduce the Aspect Term Enhancement (ATE) module. Since an aspect term is specific to its corresponding argument unit, a component is needed to constrain the recognition range of aspect terms within the text. Building upon this concept, we propose the Span Mask Interactive Attention (SMIA) component, which ensures that the attention mechanism focuses on the relevant argument unit spans while effectively disregarding irrelevant text. Once we obtain the new context representation, we feed it into the decoder, which generates the aspect-term-enhanced label sequence z_t^ATE.
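The masking idea can be sketched as scaled dot-product attention whose scores outside the argument-unit span are suppressed before the softmax. This reproduces only the span-mask half of SMIA; the interactive guidance is not modeled here:

```python
import numpy as np

def span_masked_attention(Q, K, V, span_mask):
    """Attention restricted to argument-unit spans: key positions outside
    the span get a large negative score before the softmax, so each query
    composes its context only from tokens inside argument units. A minimal
    sketch of the SMIA idea, not the authors' exact formulation."""
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    scores = np.where(span_mask[None, :], scores, -1e9)  # mask out-of-span keys
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V, w

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
mask = np.array([True, True, True, False])  # last token lies outside the span
ctx, w = span_masked_attention(Q, K, V, mask)
```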

Decision Module
Through the AUE and ATE modules, we obtain the enhanced argument unit label probability z_t^AUE and the enhanced aspect term label probability z_t^ATE. Finally, we fuse the probabilities in the two label spaces (initial and enhanced) as the final output.
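The text does not specify the fusion rule, so the following is an illustrative convex combination of the two per-token distributions, with the equal weighting being an assumption:

```python
import numpy as np

def fuse_label_probs(z_init, z_enh, alpha=0.5):
    """Fuse the initial and enhanced per-token label distributions with a
    convex combination. The paper states only that the two label spaces
    are fused; the weighting here is illustrative, not prescribed."""
    return alpha * z_init + (1.0 - alpha) * z_enh

z_init = np.array([[0.6, 0.4]])   # toy 2-label distribution per token
z_enh = np.array([[0.2, 0.8]])
z_final = fuse_label_probs(z_init, z_enh)
```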

Objective Function
The first part aims to minimize the two negative log-probabilities of the correct label sequences in the basic module,
where z_t^AU and z_t^AT denote the predicted sequences, and y_t^AU and y_t^AT the correct sequences.
The second part is the cross-entropy loss for stance classification of argument unit spans in the AUE module, where n is the number of argument units and m is the number of stance classes.
Similar to the first part, the third part uses the negative log-likelihood loss in the ATE module.
Finally, the fourth part also aims to minimize the negative log-likelihood of the enhanced label probability distributions.
The final loss function is defined as follows:
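Collecting the four parts described above, the overall objective can be written as follows (shown as an unweighted sum; any weighting coefficients are not specified in the text):

```latex
% Total loss over the four parts of the HEF objective.
\mathcal{L} =
  \underbrace{\mathcal{L}_{\mathrm{AU}} + \mathcal{L}_{\mathrm{AT}}}_{\text{basic module (CRF NLL)}}
  + \underbrace{\mathcal{L}_{\mathrm{stance}}}_{\text{AUE (cross-entropy)}}
  + \underbrace{\mathcal{L}_{\mathrm{ATE}}}_{\text{ATE (NLL)}}
  + \underbrace{\mathcal{L}_{\mathrm{enh}}}_{\text{enhanced labels (NLL)}}
```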

Experiments
To evaluate the effectiveness of the HEF framework and its components, we conduct experiments on four datasets.

Datasets
ABAM. We employ the latest ABAM dataset, released in 2020 and comprising 8 topics (Trautmann, 2020). The statistics are presented in Table 1. We follow the inner dataset split (2268 / 307 / 636 for train / dev / test) defined in the ABAM corpus (Trautmann, 2020).

AURC-8. The Argument Unit Recognition and Classification (AURC) dataset, published in 2020, consists of 8 topics (Trautmann et al., 2020). The statistics are shown in Table 2. We use the inner dataset split (4000 / 800 / 2000 for train / dev / test) given by Trautmann et al. (2020).

SemEval-2016 Task 6A. The dataset is divided into training and test sets for each of the five claims. Each sample is classified into three categories: against, none, and favor.
ABAM argument segment. The dataset is the collection of argument units in ABAM. Each argument unit is classified into two categories: PRO and CON.
Table 3 shows the distribution of SemEval-2016 Task 6A and ABAM argument segment.

The experimental setup
Evaluation Metrics: For the different tasks, we provide specific evaluation metrics. For the aspect-based argument mining task, we evaluate on the ABAM dataset at the segment level and the token level. In segment-level evaluation, a prediction is counted as correct only if the model correctly identifies both the boundary and the category of an entity; we use exact-matching F1 to measure accuracy. At the token level, we propose two evaluation methods: Token-Nested evaluation and Token-Flat evaluation. In Token-Nested evaluation, we extend the stance labels with aspect information, resulting in six possible combinations of stance ({PRO, CON, NON}) and aspect ({ASP, O}) labels, such as [PRO, ASP] and [CON, ASP]; we report both Macro-F1 and Micro-F1 scores. In Token-Flat evaluation, we concatenate the label sequences of aspect term recognition and argument unit recognition into a label sequence twice the sentence length, containing four label categories: [ASP, PRO, CON, O]; again, we report both Macro-F1 and Micro-F1 scores. For the argument unit recognition and classification task, we employ segment-level evaluation on the AURC-8 dataset and provide Macro-F1, Micro-F1, and per-category F1 scores. Finally, for the stance detection task, we report Macro-F1, Micro-F1, and per-class F1 scores on the SemEval-2016 Task 6A and ABAM argument segment datasets.
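The Token-Flat construction can be sketched as follows; the concatenation order (aspect tags first, then unit tags) is our assumption, since the text does not fix it:

```python
def token_flat_labels(aspect_tags, unit_tags):
    """Concatenate the aspect-term and argument-unit tag sequences into one
    sequence of length 2n over {ASP, PRO, CON, O} for Token-Flat scoring,
    dropping the B/I/E boundary prefixes."""
    name = {"asp": "ASP", "pro": "PRO", "con": "CON"}
    def strip(tag):
        return "O" if tag == "O" else name[tag.split("-")[-1]]
    return [strip(t) for t in aspect_tags] + [strip(t) for t in unit_tags]

flat = token_flat_labels(["B-asp", "O"], ["B-con", "O"])
```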
Compared Models: We compare the HEF framework with the following state-of-the-art methods:
• CNN-NER (Yan et al., 2022) utilizes a convolutional neural network to capture the interactions among neighboring entity spans.
• W2NER (Li et al., 2022) introduces a novel approach that frames NER as a word-word relation classification problem.
• Span, BPE, Word (Yan et al., 2021) present a new formulation of the NER task as an entity-span sequence generation problem.

Comparison results of different methods for ABAM
We show the performance comparison of different methods in Table 4. All comparison methods use the code provided by their respective original papers and are tested and evaluated on the ABAM dataset.
Based on the results presented in Table 4, our proposed method demonstrates superior performance compared to all existing methods, on both segment-level and token-level evaluation metrics. This improvement can be attributed to the two key enhancement modules: Argument Unit Enhancement (AUE) and Aspect Term Enhancement (ATE). Specifically, our method improves the Micro-F1, Macro-F1, Token-Flat, and Token-Nested metrics by at least 0.0767, 0.1274, 0.0647, and 0.0745, respectively, over the comparison methods. The ATE module effectively constrains the recognition range of aspect terms, leading to a significant improvement in the F1 score for aspect term recognition (ASP column).

Ablation experiments for ABAM
To evaluate the individual impact of each functional module or component within the HEF framework, we conduct a series of ablation experiments; the results are presented in Table 5. The results clearly demonstrate that removing different modules or components has a significant impact on the performance of the HEF framework. In particular, the absence of the AUE module has a substantial negative effect on overall performance: using BHGAT to re-judge the category of argument unit spans proves to be an effective strategy for correcting samples where the basic module identifies the boundary correctly but misjudges the category. Moreover, including the topic information in the BERT input also contributes to performance.

Effectiveness of SSF component
To evaluate the effectiveness of the SSF component, we conduct experiments on two sequence labeling datasets, AURC-8 and ABAM. For the sequence labeling task, we use LSTM and SSF as encoders and CRF as the decoder. The experimental results are presented in Table 6.
Based on the experimental results in Table 6, we observe that the scores on the ABAM dataset are significantly higher than those on the AURC-8 dataset. This discrepancy can be attributed to inherent dissimilarities between the two datasets: in AURC-8, argument units may not exist in every sample, while ABAM guarantees at least one argument unit per sample. By integrating spatial syntactic information with the LSTM-encoded sequential semantic information, the SSF component demonstrates a clear performance advantage, leading to significant improvements on both datasets.

Effectiveness of BHGAT component
To comprehensively assess the superiority of this component, we conduct experiments on two datasets: the SemEval-2016 Task 6A and ABAM argument segment datasets. In our experiments, we incorporate the BHGAT component into a BERT-based framework and compare the experimental results, as shown in Table 7.
Table 7 clearly demonstrates that including the BHGAT component yields significant performance improvements, attributable to several key factors. First, the BHGAT component can capture and leverage information from multiple samples sharing the same topic: by considering the expressions of different stances and the embedding representations of words in different contexts, it enhances the model's discriminative power and facilitates more accurate stance detection. Furthermore, the BHGAT component is notably versatile: it can be seamlessly integrated into various frameworks, enabling performance gains across different models. This flexibility makes it highly adaptable, particularly for classification tasks that involve topic information, such as stance detection.

Effectiveness of SMIA component
The SMIA component introduced in the ATE module aims to restrict the range of aspect term recognition. To assess its effectiveness, we present the confusion matrices based on the Token-Nested evaluation in Figure 3.
In Figure 3, each confusion matrix has dimension 6×6. Note that in the real data there is no combination of the aspect label ASP with the stance label NON, since aspect terms exist only within argument units; as a result, the fifth row of each confusion matrix is always zero. We therefore focus on the fifth column. The model identifies aspect terms and argument units separately, so if the prediction range of aspect terms is not constrained during prediction, the model may produce spurious matches between the aspect label ASP and the stance label NON. After adding the span mask constraints imposed by the SMIA component, the fifth column of each confusion matrix shows a significant reduction in such misjudgments, reinforcing the effectiveness of SMIA in constraining the recognition of aspect terms.

Conclusion
This paper presents a novel layer-based approach for the aspect-based argument mining task, utilizing a hierarchical enhancement framework consisting of four modules: a basic module, an argument unit enhancement module, an aspect term enhancement module, and a decision module. The SSF component plays a crucial role in optimizing the underlying representations and can be utilized across various tasks; it enhances the framework's capability by incorporating syntactic information into the encoding process, improving performance on sequence labeling tasks. The BHGAT component, effective for classification tasks involving topic information, enhances the framework's generalization capabilities. The SMIA component is specifically designed for aspect-based argument mining, constraining the recognition range of aspect terms; it effectively improves the accuracy of aspect term recognition and contributes to the overall performance of the framework.

Limitations
However, it should be noted that the proposed BHGAT is currently only suitable for classification tasks with topic information; its generalization to more general tasks needs further investigation in our future work. In addition, our current framework primarily adopts a layer-based method for Nested Named Entity Recognition (NNER) without extensively exploring how to mine the correlation between argument units and aspect terms. In future work, it is essential to delve deeper into the correlation between these two entity types and to fully utilize the guiding information between them.

Figure 2: The overall architecture of HEF for ABAM.

Figure 3: The confusion matrices for different models.

Students not wearing the latest fashions may feel inadequate if their parents cannot afford to purchase them, or some students may become targets of bullying for the same reason.
where HG^(0) is the initial representation of the nodes in BHGAT, n_au is the number of argument units, and n_w is the number of words.

Table 1: Statistics of the ABAM dataset.

Table 4: Performance comparison of different methods on ABAM.

Table 5: Performance comparison of results without different modules or components.

Table 6: Performance comparison of results on ABAM and AURC-8.