2024
BengaliLCP: A Dataset for Lexical Complexity Prediction in the Bengali Texts
Nabila Ayman, Md. Akram Hossain, Abdul Aziz, Rokan Uddin Faruqui, Abu Nowshed Chy
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Encountering intricate or ambiguous terms within a sentence makes comprehension difficult for the reader. Lexical Complexity Prediction (LCP) deals with predicting the complexity score of a word or a phrase in its context. This task poses several challenges, including ambiguity, context sensitivity, and subjectivity in perceiving complexity. Despite having 300 million native speakers and ranking as the seventh most spoken language in the world, Bengali lags behind other languages in research on lexical complexity. To bridge this gap, we introduce the first annotated Bengali dataset for LCP. In addition, we propose a transformer-based deep neural approach that combines a pairwise multi-head attention mechanism with an LSTM model to predict the lexical complexity of Bengali tokens. The results demonstrate that the proposed neural approach outperforms the existing state-of-the-art models for the Bengali language.
2023
CSECU-DSG at SemEval-2023 Task 4: Fine-tuning DeBERTa Transformer Model with Cross-fold Training and Multi-sample Dropout for Human Values Identification
Abdul Aziz, Md. Akram Hossain, Abu Nowshed Chy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Identifying human values in arguments is becoming a prominent area of research in argument mining. Values convey which among several options may be the most desirable and widely accepted. The diversity of human beliefs, the random texture, and the implicit meaning of arguments make it difficult to identify the human values behind them. To address these challenges, SemEval-2023 Task 4 introduced a shared task, ValueEval, focusing on identifying human value categories from given arguments. This paper presents our participation in this task, where we propose a fine-tuned DeBERTa transformer-based classification approach to identify the desired human value categories. We apply different training strategies, such as cross-fold training and multi-sample dropout, with the fine-tuned DeBERTa model to enhance contextual representation on this downstream task. Our proposed method achieved competitive performance among the participants’ methods.
CSECU-DSG at SemEval-2023 Task 10: Exploiting Transformers with Stacked LSTM for the Explainable Detection of Online Sexism
Afrin Sultana, Radiathun Tasnia, Nabila Ayman, Abu Nowshed Chy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Sexism is a harmful phenomenon that provokes gender inequalities and social imbalances. The growing prevalence of sexist content on social media platforms creates an unwelcoming and uncomfortable environment for many users. Sexism is a multi-faceted subject, as it can intersect with other categories of discrimination. Binary classification tools are frequently employed to identify sexist content, but most of them provide only broad, generic categories with no further insight. SemEval-2023 introduced the Explainable Detection of Online Sexism (EDOS) task, which emphasizes detecting sexist content and explaining its category. This paper details our participation in this task, where we present a neural network architecture that feeds document embeddings from a fine-tuned transformer-based model into stacked long short-term memory (LSTM) layers and a fully connected linear (FCL) layer. Our proposed method obtained an F1 score of 0.8218 (ranked 51st) in Task A, and F1 scores of 0.5986 (ranked 40th) and 0.4419 (ranked 28th) in Tasks B and C, respectively.
CSECU-DSG at SemEval-2023 Task 6: Segmenting Legal Documents into Rhetorical Roles via Fine-tuned Transformer Architecture
Fareen Tasneem, Tashin Hossain, Jannatun Naim, Abu Nowshed Chy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Automated processing of legal documents is essential to manage the enormous volume of legal text and to make it easily accessible to a broad range of people. However, due to the amorphous and variable nature of legal documents, it is very challenging to apply complex processes such as summarization, analysis, and querying directly. Segmenting the documents by rhetorical role can aid and accelerate such procedures. This paper describes our participation in SemEval-2023 Task 6, Sub-task A: Rhetorical Roles Prediction. We use a fine-tuned Legal-BERT model to address this task. We also conduct an error analysis to illustrate the shortcomings of our approach.
CSECU-DSG@Multimodal Hate Speech Event Detection 2023: Transformer-based Multimodal Hierarchical Fusion Model For Multimodal Hate Speech Detection
Abdul Aziz, MD. Akram Hossain, Abu Nowshed Chy
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
The emergence of social media and e-commerce platforms has enabled perpetrators to rapidly spread negativity and abuse individuals or organisations worldwide. It is critical to detect hate speech in both visual and textual content so that it can be moderated or excluded from online platforms, keeping them sound and safe for users. However, multimodal hate speech detection is a complex and challenging task, as people often present hate speech sarcastically and their content involves multiple modalities, i.e., image and text. This paper describes our participation in the CASE 2023 multimodal hate speech event detection task, whose objective is to automatically detect hate speech and its target from a given text-embedded image. We propose a transformer-based multimodal hierarchical fusion model to detect hate speech in visual content. We jointly fine-tune pre-trained language and vision transformer models to extract visual-contextualized feature representations of the text-embedded image. We concatenate these features and feed them to a multi-sample dropout module. In addition, the contextual feature vector is fed into a BiLSTM module, whose output also passes through multi-sample dropout. We employ arithmetic mean fusion to combine all dropout-sample outputs and predict the final label. Experimental results demonstrate that our model obtains competitive performance, ranking 5th among the participants.
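The multi-sample dropout step with arithmetic mean fusion mentioned above can be sketched as follows. This is a minimal, framework-free Python illustration under stated assumptions, not the authors' implementation: the feature vector, the shared linear head (`weights`, `bias`), the number of dropout samples, and the dropout rate are all hypothetical, and a real system would apply the same idea inside a deep learning framework to the fused transformer and BiLSTM features.

```python
import random
from typing import Sequence


def linear(x: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> list[float]:
    """Hypothetical shared classifier head: one weight row and one bias per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def dropout(x: Sequence[float], p: float, rng: random.Random) -> list[float]:
    """Inverted dropout: zero each feature with probability p, rescale survivors by 1 / (1 - p)."""
    if p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [xi / keep if rng.random() >= p else 0.0 for xi in x]


def multi_sample_dropout_logits(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    num_samples: int = 5,
    p: float = 0.5,
    seed: int = 0,
) -> list[float]:
    """Apply num_samples independent dropout masks to the same features, pass each
    masked copy through the shared head, and fuse the outputs by arithmetic mean."""
    if not 0.0 <= p < 1.0:
        raise ValueError("dropout rate must be in [0, 1)")
    if num_samples < 1:
        raise ValueError("need at least one dropout sample")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    samples = [linear(dropout(x, p, rng), weights, bias) for _ in range(num_samples)]
    # Arithmetic mean fusion over the dropout samples, class by class.
    return [sum(s[c] for s in samples) / num_samples for c in range(len(bias))]
```

Setting `p=0.0` makes every sample identical, which corresponds to the common practice of disabling dropout at inference; with `p > 0`, each mask yields a different logit vector and the mean smooths them.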
CSECU-DSG @ Causal News Corpus 2023: Leveraging RoBERTa and DeBERTa Transformer Model with Contrastive Learning for Causal Event Classification
MD. Akram Hossain, Abdul Aziz, Abu Nowshed Chy
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
Cause-effect relationships play a crucial role in human cognition, and distilling cause-effect relations from text helps improve causal networks for predictive tasks. Many NLP applications can benefit from this task, including natural language-based financial forecasting, text summarization, and question answering. However, the lack of syntactic clues, the ambivalent semantic meaning of words, complex sentence structure, and the implicit meaning of numerical entities in text make it one of the more challenging tasks in NLP. To address these challenges, CASE-2023 introduced shared task 3, focusing on event causality identification with the Causal News Corpus. In this paper, we describe our participating systems for this task. For subtask 1, where we need to identify whether or not a text is causal, we leverage two transformer models, DeBERTa and Twitter-RoBERTa, combined with a weighted average fusion technique. For subtask 2, where we need to identify the cause, effect, and signal tokens in the text, we propose a unified neural network of DeBERTa and DistilRoBERTa transformer variants with contrastive learning techniques. The experimental results showed that our proposed method achieved competitive performance among the participants’ systems.
2022
Enhancing the DeBERTa Transformers Model for Classifying Sentences from Biomedical Abstracts
Abdul Aziz, Md. Akram Hossain, Abu Nowshed Chy
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association
CSECU-DSG @ Causal News Corpus 2022: Fusion of RoBERTa Transformers Variants for Causal Event Classification
Abdul Aziz, Md. Akram Hossain, Abu Nowshed Chy
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
Identifying cause-effect relationships in sentences is a formidable task in natural language inference and understanding. However, the diversity of word semantics and sentence structure makes it challenging to determine causal relationships effectively. To address these challenges, CASE-2022 shared task 3 introduced a task focusing on event causality identification with the Causal News Corpus. This paper presents our participation in this task, specifically in subtask 1, the causal event classification task. To tackle this task, we propose a unified neural model that exploits two fine-tuned transformer models, RoBERTa and Twitter-RoBERTa. For score fusion, we combine the prediction scores of each component model using a weighted arithmetic mean to generate the probability score for class label identification. The experimental results showed that our proposed method achieved the top performance (ranked 1st) among the participants.
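Weighted arithmetic mean score fusion, as described in this abstract, can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the per-class probability vectors stand in for the outputs of the fine-tuned models, and the actual fusion weights used in the paper are not stated here.

```python
from typing import Sequence


def weighted_mean_fusion(
    prob_lists: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> list[float]:
    """Fuse per-class probability vectors from several models with a weighted arithmetic mean."""
    if len(prob_lists) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(prob_lists[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_lists, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same classes")
        for c, p in enumerate(probs):
            fused[c] += w * p
    # Normalizing by the weight total keeps the result a probability distribution.
    return [f / total for f in fused]


def predict_label(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused)), key=lambda c: fused[c])
```

With hypothetical scores `[0.30, 0.70]` and `[0.60, 0.40]` from two models and hypothetical weights `0.6` and `0.4`, the fused vector is approximately `[0.42, 0.58]`, so class 1 is predicted.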
CSECU-DSG at SemEval-2022 Task 3: Investigating the Taxonomic Relationship Between Two Arguments using Fusion of Multilingual Transformer Models
Abdul Aziz, Md. Akram Hossain, Abu Nowshed Chy
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Recognizing lexical relationships between words is one of the formidable tasks in computational linguistics, and it plays a vital role in improving various NLP tasks. However, the diversity of word semantics, sentence structure, and word order information make it challenging to distill the relationship effectively. To address these challenges, SemEval-2022 Task 3 introduced a shared task, PreTENS, focusing on semantic competence in determining the taxonomic relations between two nominal arguments. This paper presents our participation in this task, where we propose an ensemble of multilingual transformer models. We fine-tuned two multilingual transformer models, XLM-RoBERTa and mBERT. To improve on the individual models, we fuse their predicted probability scores using a weighted arithmetic mean to generate a unified probability score. The experimental results showed that our proposed method achieved competitive performance among the participants’ methods.
CSECU-DSG at SemEval-2022 Task 11: Identifying the Multilingual Complex Named Entity in Text Using Stacked Embeddings and Transformer based Approach
Abdul Aziz, Md. Akram Hossain, Abu Nowshed Chy
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Recognizing complex and ambiguous named entities (NEs) is one of the formidable tasks in the NLP domain. However, the diversity of linguistic constituents, syntactic structure, and semantic ambiguity, as well as their differences from traditional NEs, make it challenging to identify complex NEs. To address these challenges, SemEval-2022 Task 11 introduced a shared task, MultiCoNER, focusing on complex named entity recognition in multilingual settings. This paper presents our participation in this task, where we propose two approaches: a BiLSTM-CRF model with a stacked-embedding strategy and a transformer-based approach. Our proposed method achieved competitive performance among the participants’ methods in a few languages.
CSECU-DSG@SMM4H’22: Transformer based Unified Approach for Classification of Changes in Medication Treatments in Tweets and WebMD Reviews
Afrin Sultana, Nihad Karim Chowdhury, Abu Nowshed Chy
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
Medications play a vital role in medical treatment, and medication non-adherence reduces clinical benefit and results in morbidity and medication wastage. Automatically extracting self-declared changes in drug treatment, and the reasons for them, from tweets and user reviews helps determine the effectiveness of drugs and improve treatment care. SMM4H 2022 Task 3 introduced a shared task focusing on identifying non-persistent patients from tweets and WebMD reviews. In this paper, we present our participation in this task. We propose a neural approach that integrates the strengths of the transformer model, the Long Short-Term Memory (LSTM) model, and a fully connected layer into a unified architecture. Experimental results on the test data demonstrate the competitive performance of our system, with a 61% F1-score on task 3a and an 86% F1-score on task 3b. Our proposed neural approach ranked first in task 3b.
2021
CSECU-DSG at SemEval-2021 Task 1: Fusion of Transformer Models for Lexical Complexity Prediction
Abdul Aziz, MD. Akram Hossain, Abu Nowshed Chy
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Lexical complexity prediction (LCP) is the task of predicting the complexity level of a token or a set of tokens in a sentence. It plays a vital role in improving various NLP tasks, including lexical simplification, translation, and text generation. However, the different meanings a word can take in different contexts, complex grammatical structure, and the mutual dependency of words in a sentence make it difficult to estimate lexical complexity. To address these challenges, SemEval-2021 Task 1 introduced a shared task focusing on LCP, and this paper presents our participation in this task. We propose a transformer-based approach with sentence pair regression. We fine-tuned two transformer models, BERT and RoBERTa, and fused their predicted scores to estimate complexity. Experimental results demonstrate that our proposed method achieved competitive performance compared to the participants’ systems.
CSECU-DSG at SemEval-2021 Task 5: Leveraging Ensemble of Sequence Tagging Models for Toxic Spans Detection
Tashin Hossain, Jannatun Naim, Fareen Tasneem, Radiathun Tasnia, Abu Nowshed Chy
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
The upsurge of prolific blogging and microblogging platforms has enabled abusers to spread negativity and threats more than ever. Detecting the toxic portions of a text substantially helps moderate or exclude the abusive parts to maintain sound online platforms. This paper describes our participation in the SemEval-2021 toxic spans detection task, which requires detecting the spans that convey toxic remarks in a given text. We explore an ensemble of sequence labeling models, including a BiLSTM-CRF, a spaCy NER model with custom toxic tags, and a fine-tuned BERT model, to identify the toxic spans. Finally, a majority voting ensemble method is used to determine the unified toxic spans. Experimental results show the competitive performance of our model among the participants.
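A character-level majority vote over the models' predicted toxic spans can be sketched as below. This is a hypothetical simplification rather than the authors' ensemble: it assumes each model's output has already been converted into a set of toxic character offsets, and it keeps an offset only when a strict majority of models agree. The function names and the `to_spans` helper for regrouping offsets into contiguous ranges are illustrative.

```python
from collections import Counter
from typing import Iterable


def majority_vote_offsets(predictions: list[set[int]]) -> list[int]:
    """Keep a character offset only if more than half of the models flagged it as toxic."""
    n_models = len(predictions)
    # Each model contributes at most one vote per offset because its prediction is a set.
    votes = Counter(offset for pred in predictions for offset in pred)
    return sorted(offset for offset, count in votes.items() if count > n_models / 2)


def to_spans(offsets: Iterable[int]) -> list[tuple[int, int]]:
    """Group offsets into inclusive (start, end) ranges of consecutive characters."""
    spans: list[tuple[int, int]] = []
    for o in sorted(set(offsets)):
        if spans and o == spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], o)  # extend the current contiguous range
        else:
            spans.append((o, o))  # start a new range
    return spans
```

For three hypothetical model outputs `{3, 4, 5, 6}`, `{4, 5, 6, 7}`, and `{5, 6, 10}`, offsets 4, 5, and 6 each receive at least two of three votes, giving `[4, 5, 6]` and the single span `(4, 6)`.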
CSECU-DSG at SemEval-2021 Task 6: Orchestrating Multimodal Neural Architectures for Identifying Persuasion Techniques in Texts and Images
Tashin Hossain, Jannatun Naim, Fareen Tasneem, Radiathun Tasnia, Abu Nowshed Chy
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Inscribing persuasion techniques in memes is one of the most impactful ways to influence people’s mindsets. People are drawn to memes because they are stimulating and convincing; hence, memes are often exploited by tactfully engraving propaganda in their content to pursue a specific agenda. This paper describes our participation in the three subtasks of SemEval-2021 Task 6 on detecting persuasion techniques in texts and images. We use a fusion of logistic regression, a decision tree, and a fine-tuned DistilBERT model to tackle subtask 1. For subtask 2, we propose a system that combines a span identification model and a multi-label classification model based on pre-trained BERT. We address the multimodal multi-label classification of memes defined in subtask 3 using a ResNet50-based image model, a DistilBERT-based text model, and a multimodal architecture based on a multi-kernel CNN+LSTM and an MLP model. The results illustrate the competitive performance of our systems.
CSECU-DSG at SemEval-2021 Task 7: Detecting and Rating Humor and Offense Employing Transformers
Afrin Sultana, Nabila Ayman, Abu Nowshed Chy
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
With the growing use of online platforms, people are increasingly interested in expressing their opinions through humorous texts. Identifying and rating humorous texts poses unique challenges to NLP because humor is subjective, i.e., its perception may vary with gender, profession, age, and social class. Words with multiple senses, cultural domain, and pragmatic competence also need to be considered. Moreover, a humorous text may be offensive to others. To address these challenges, SemEval-2021 introduced the HaHackathon task, focusing on detecting and rating humorous and offensive texts. This paper describes our participation in this task. We employed a classification and regression approach based on stacked embeddings of features extracted from the GPT2 medium, BERT, and RoBERTa transformer models. In addition, we fine-tuned BERT and RoBERTa models and examined their performance. Our method achieved competitive performance in this task.
2020
CSECU-DSG at WNUT-2020 Task 2: Exploiting Ensemble of Transfer Learning and Hand-crafted Features for Identification of Informative COVID-19 English Tweets
Fareen Tasneem, Jannatun Naim, Radiathun Tasnia, Tashin Hossain, Abu Nowshed Chy
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
The COVID-19 pandemic has become a trending topic on Twitter, and people are interested in sharing diverse information ranging from new cases and healthcare guidelines to medicine and vaccine news. Such information helps people stay updated about the situation and is beneficial to public safety personnel for decision making. However, the informal nature of Twitter makes it challenging to filter informative tweets from huge tweet streams. To address these challenges, WNUT-2020 introduced a shared task focusing on COVID-19-related informative tweet identification. In this paper, we describe our participation in this task. We propose a neural model that combines the strengths of transfer learning and hand-crafted features in a unified architecture. To extract the transfer learning features, we use the state-of-the-art pre-trained sentence embedding models BERT, RoBERTa, and InferSent, whereas various Twitter characteristics are exploited to extract the hand-crafted features. Next, various feature combinations are used to train a set of multilayer perceptrons (MLPs) as base classifiers. Finally, a majority voting-based fusion approach is employed to determine the informative tweets. Our approach achieved competitive performance and outperformed the baseline by approximately 7%.
CSECU_KDE_MA at SemEval-2020 Task 8: A Neural Attention Model for Memotion Analysis
Abu Nowshed Chy, Umme Aymun Siddiqua, Masaki Aono
Proceedings of the Fourteenth Workshop on Semantic Evaluation
A meme is a pictorial representation of an idea or theme. With the growing number of social media platforms, memes are spreading rapidly from person to person and becoming a popular way of expressing opinions. However, due to the multimodal characteristics of meme content, detecting and analyzing the underlying emotion of a meme is a formidable task. In this paper, we present our approach for detecting the emotion of a meme, as defined in SemEval-2020 Task 8. Our team, CSECU_KDE_MA, employs an attention-based neural network model to tackle the problem. After extracting the text content from a meme using an optical character reader (OCR), we represent it using distributed word representations. Next, we perform convolution with multiple kernel sizes to obtain higher-level feature sequences. The feature sequences are then fed into an attentive time-distributed bidirectional LSTM model to learn long-term dependencies effectively. Experimental results show that our proposed neural model obtained competitive performance among the participants’ systems.
2019
KDEHatEval at SemEval-2019 Task 5: A Neural Network Model for Detecting Hate Speech in Twitter
Umme Aymun Siddiqua, Abu Nowshed Chy, Masaki Aono
Proceedings of the 13th International Workshop on Semantic Evaluation
With the growing volume of microblog platforms, especially Twitter, hate speech propagation is now of great concern. However, due to the brevity of tweets and informal user-generated content, detecting and analyzing hate speech on Twitter is a formidable task. In this paper, we present our approach for detecting hate speech in tweets, as defined in SemEval-2019 Task 5. Our team, KDEHatEval, employs different neural network models, including multi-kernel convolution (MKC), nested LSTMs (NLSTMs), and a multi-layer perceptron (MLP), in a unified architecture. Moreover, we use state-of-the-art pre-trained sentence embedding models, including DeepMoji, InferSent, and BERT, for effective tweet representation. We analyze the performance of our method and demonstrate the contribution of each component of our architecture.
Tweet Stance Detection Using an Attention based Neural Ensemble Model
Umme Aymun Siddiqua, Abu Nowshed Chy, Masaki Aono
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Stance detection on Twitter aims at mining the stances users express in a tweet towards one or more target entities. To tackle this problem, most prior studies have explored traditional deep learning models, e.g., LSTM and GRU. However, compared to these traditional approaches, the recently proposed densely connected Bi-LSTM and nested LSTM architectures effectively address the vanishing-gradient and overfitting problems and handle long-term dependencies. In this paper, we propose a neural ensemble model that adopts the strengths of these two LSTM variants to learn better long-term dependencies, where each module is coupled with an attention mechanism that amplifies the contribution of important elements in the final representation. We also employ a multi-kernel convolution on top of them to extract higher-level tweet representations. Results of extensive experiments on single- and multi-target stance detection datasets show that our proposed method achieves substantial improvement over the current state-of-the-art deep learning based methods.