Symom Shohan
2024
SemanticCuetSync at AraFinNLP2024: Classification of Cross-Dialect Intent in the Banking Domain using Transformers
Ashraful Paran | Symom Shohan | Md. Hossain | Jawad Hossain | Shawly Ahsan | Mohammed Moshiul Hoque
Proceedings of the Second Arabic Natural Language Processing Conference
Intent detection is a crucial aspect of natural language understanding (NLU), focusing on identifying the primary objective underlying user input. In this work, we present a transformer-based method that excels at determining the intent of Arabic text within the banking domain. We explored several machine learning (ML), deep learning (DL), and transformer-based models on an Arabic banking dataset for intent detection. Our findings underscore the challenges that traditional ML and DL models face in understanding the nuances of various Arabic dialects, leading to subpar performance in intent detection. However, the transformer-based methods, designed to tackle such complexities, significantly outperformed the other models in classifying intent across different Arabic dialects. Notably, the AraBERTv2 model achieved the highest micro-F1 score of 82.08% on the ArBanking77 dataset, a testament to its effectiveness in this context. This result, which earned our system 5th place in the AraFinNLP2024 shared task, highlights the importance of developing models that can effectively handle the intricacies of Arabic language processing and intent detection.
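As a concrete illustration of the approach described in the abstract, the following is a minimal sketch of fine-tuning AraBERTv2 for intent classification with Hugging Face Transformers. It is not the authors' released code: the dataset identifier, column names, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumptions noted): fine-tune AraBERTv2 for intent
# classification on an ArBanking77-style dataset.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "aubmindlab/bert-base-arabertv2"  # public AraBERTv2 checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=77)  # ArBanking77 covers 77 banking intents

def tokenize(batch):
    # Banking queries are short, so a small max_length suffices.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

# Hypothetical dataset identifier and column names ("text", "label");
# substitute the actual ArBanking77 files and schema.
dataset = load_dataset("SinaLab/ArBanking77").map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="arabertv2-intent",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["validation"])
trainer.train()
```

Micro-F1 on the test split can then be computed from `trainer.predict(...)` outputs, e.g. with `sklearn.metrics.f1_score(y_true, y_pred, average="micro")`.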
SemanticCuetSync at ArAIEval Shared Task: Detecting Propagandistic Spans with Persuasion Techniques Identification using Pre-trained Transformers
Symom Shohan | Md. Hossain | Ashraful Paran | Shawly Ahsan | Jawad Hossain | Mohammed Moshiul Hoque
Proceedings of the Second Arabic Natural Language Processing Conference
Detecting propagandistic spans and identifying persuasion techniques are crucial for promoting informed decision-making, safeguarding democratic processes, and fostering a media environment characterized by integrity and transparency. We explored various machine learning (Logistic Regression, Random Forest, and Multinomial Naive Bayes), deep learning (CNN, CNN+LSTM, CNN+BiLSTM), and transformer-based (AraBERTv2, AraBERT-NER, CamelBERT, BERT-Base-Arabic) models for this task. The evaluation results indicate that CamelBERT achieved the highest micro-F1 score (24.09%), outperforming CNN+LSTM and AraBERTv2. The study found that most models struggle to detect propagandistic spans when multiple spans are present within the same article. Overall, our system's performance secured 6th place in the ArAIEval Shared Task-1.
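Since span detection hinges on mapping character-level annotations onto subword tokens, the following is a minimal sketch (not the authors' code) of framing the task as BIO token classification with a CamelBERT checkpoint; the label scheme and checkpoint choice are illustrative assumptions.

```python
# Minimal sketch: propagandistic-span detection as BIO token classification.
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_NAME = "CAMeL-Lab/bert-base-arabic-camelbert-mix"  # one CamelBERT variant
# Toy binary BIO scheme; the shared task labels spans with specific
# persuasion techniques, so the real tag set is larger.
label_list = ["O", "B-PROP", "I-PROP"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(label_list))

def align_labels(text, spans):
    """Convert character-level span annotations [(start, end), ...] into
    per-token BIO tag ids via the fast tokenizer's offset mapping."""
    enc = tokenizer(text, return_offsets_mapping=True, truncation=True)
    tags = []
    for start, end in enc["offset_mapping"]:
        if start == end:           # special tokens such as [CLS]/[SEP]
            tags.append(-100)      # ignored by the cross-entropy loss
        elif any(start == s for s, _ in spans):
            tags.append(label_list.index("B-PROP"))   # token opens a span
        elif any(s <= start < e for s, e in spans):
            tags.append(label_list.index("I-PROP"))   # token inside a span
        else:
            tags.append(label_list.index("O"))
    return enc, tags
```

Training then proceeds as standard token classification; at inference time, predicted B/I tags are projected back to character offsets through the same offset mapping to produce the spans scored by the task's micro-F1.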