Usman Naseem


pdf bib
A Multi-Modal Dataset for Hate Speech Detection on Social Media: Case-study of Russia-Ukraine Conflict
Surendrabikram Thapa | Aditya Shah | Farhan Jafri | Usman Naseem | Imran Razzak
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

This paper presents a new multi-modal dataset for identifying hateful content on social media, consisting of 5,680 text-image pairs collected from Twitter, labeled across two labels. Experimental analysis of the presented dataset has shown that understanding both modalities is essential for detecting these techniques. It is confirmed in our experiments with several state-of-the-art multi-modal models. In future work, we plan to extend the dataset in size. We further plan to develop new multi-modal models tailored explicitly to hate-speech detection, aiming for a deeper understanding of the text and image relation. It would also be interesting to perform experiments in a direction that explores what social entities the given hate speech tweet targets.

pdf bib
Incorporating Medical Knowledge to Transformer-based Language Models for Medical Dialogue Generation
Usman Naseem | Ajay Bandi | Shaina Raza | Junaid Rashid | Bharathi Raja Chakravarthi
Proceedings of the 21st Workshop on Biomedical Language Processing

Medical dialogue systems have the potential to assist doctors in expanding access to medical care, improving the quality of patient experiences, and lowering medical expenses. The computational methods are still in their early stages and are not ready for widespread application despite their great potential. Existing transformer-based language models have shown promising results but lack domain-specific knowledge. However, to diagnose like doctors, an automatic medical diagnosis necessitates more stringent requirements for the rationality of the dialogue in the context of relevant knowledge. In this study, we propose a new method that addresses the challenges of medical dialogue generation by incorporating medical knowledge into transformer-based language models. We present a method that leverages an external medical knowledge graph and injects triples as domain knowledge into the utterances. Automatic and human evaluation on a publicly available dataset demonstrates that incorporating medical knowledge outperforms several state-of-the-art baseline methods.

pdf bib
Accuracy meets Diversity in a News Recommender System
Shaina Raza | Syed Raza Bashir | Usman Naseem
Proceedings of the 29th International Conference on Computational Linguistics

News recommender systems face certain challenges. These challenges arise due to evolving users’ preferences over dynamically created news articles. The diversity is necessary for a news recommender system to expose users to a variety of information. We propose a deep neural network based on a two-tower architecture that learns news representation through a news item tower and users’ representations through a query tower. We customize an augmented vector for each query and news item to introduce information interaction between the two towers. We introduce diversity in the proposed architecture by considering a category loss function that aligns items’ representation of uneven news categories. Experimental results on two news datasets reveal that our proposed architecture is more effective compared to the state-of-the-art methods and achieves a balance between accuracy and diversity.

pdf bib
A DistilBERTopic Model for Short Text Documents
Junaid Rashid | Jungeun Kim | Usman Naseem | Amir Hussain
Proceedings of the The 20th Annual Workshop of the Australasian Language Technology Association

pdf bib
Benchmarking for Public Health Surveillance tasks on Social Media with a Domain-Specific Pretrained Language Model
Usman Naseem | Byoung Chan Lee | Matloob Khushi | Jinman Kim | Adam Dunn
Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP

A user-generated text on social media enables health workers to keep track of information, identify possible outbreaks, forecast disease trends, monitor emergency cases, and ascertain disease awareness and response to official health correspondence. This exchange of health information on social media has been regarded as an attempt to enhance public health surveillance (PHS). Despite its potential, the technology is still in its early stages and is not ready for widespread application. Advancements in pretrained language models (PLMs) have facilitated the development of several domain-specific PLMs and a variety of downstream applications. However, there are no PLMs for social media tasks involving PHS. We present and release PHS-BERT, a transformer-based PLM, to identify tasks related to public health surveillance on social media. We compared and benchmarked the performance of PHS-BERT on 25 datasets from different social medial platforms related to 7 different PHS tasks. Compared with existing PLMs that are mainly evaluated on limited tasks, PHS-BERT achieved state-of-the-art performance on all 25 tested datasets, showing that our PLM is robust and generalizable in the common PHS tasks. By making PHS-BERT available, we aim to facilitate the community to reduce the computational cost and introduce new baselines for future works across various PHS-related tasks.