Jatin Bedi


2024

Transformers at HSD-2Lang 2024: Hate Speech Detection in Arabic and Turkish Tweets Using BERT Based Architectures
Kriti Singhal | Jatin Bedi
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

Over the past years, researchers across the globe have made significant efforts to develop systems capable of identifying the presence of hate speech in different languages. This paper describes the team Transformers’ submission to two subtasks, Hate Speech Detection in Turkish across Various Contexts and Hate Speech Detection with Limited Data in Arabic, organized by HSD-2Lang in conjunction with CASE at EACL 2024. A BERT-based architecture was employed in both subtasks. We achieved an F1 score of 0.63258 using XLM-RoBERTa in the first subtask and 0.48101 using mBERT in the second, securing the 6th and the 5th rank, respectively.

Transformers@DravidianLangTech-EACL2024: Sentiment Analysis of Code-Mixed Tamil Using RoBERTa
Kriti Singhal | Jatin Bedi
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

In recent years, there has been a persistent focus on developing systems that can automatically identify hate speech content circulating on diverse social media platforms. This paper describes the team Transformers’ submission to the Caste/Immigration Hate Speech Detection in Tamil shared task of the LT-EDI 2024 workshop at EACL 2024. We used an ensemble approach in the shared task, combining various transformer-based pre-trained models using majority voting. The best macro average F1-score achieved was 0.82. We secured the 1st rank in the Caste/Immigration Hate Speech in Tamil shared task.

Transformers@LT-EDI-EACL2024: Caste and Migration Hate Speech Detection in Tamil Using Ensembling on Transformers
Kriti Singhal | Jatin Bedi
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

In recent years, there has been a persistent focus on developing systems that can automatically identify hate speech content circulating on diverse social media platforms. This paper describes the submission of team “Transformers” to the Caste and Migration Hate Speech Detection in Tamil shared task of the LT-EDI 2024 workshop at EACL 2024. We used an ensemble approach in the shared task, combining various transformer-based pre-trained models using majority voting. The best macro average F1-score achieved was 0.82. We secured the 1st rank in the Caste and Migration Hate Speech in Tamil shared task.
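
The majority-voting step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `majority_vote`, the three model outputs, and the label encoding (1 = hate speech, 0 = not) are hypothetical, and the tie-breaking rule (first label seen wins) is an assumption the abstract does not specify.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per example.

    `predictions` is a list of equal-length label lists, one per model.
    Ties go to the label seen first, which is an assumption of this sketch.
    """
    combined = []
    for labels in zip(*predictions):
        # most_common(1) yields the (label, count) pair with the highest count
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three classifiers on four tweets.
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

With an odd number of binary classifiers, as here, ties cannot occur, so the tie-breaking assumption only matters for even-sized or multi-class ensembles.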

2023

MLModeler5 @ Causal News Corpus 2023: Using RoBERTa for Casual Event Classification
Amrita Bhatia | Ananya Thomas | Nitansh Jain | Jatin Bedi
Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

Identifying cause-effect relations plays an integral role in the understanding and interpretation of natural languages. Furthermore, automated mining of causal relations from news and text about socio-political events is a stepping stone in gaining critical insights, including analyzing the scale, frequency and trends across timelines of events, as well as anticipating future ones. Shared Task 3, part of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE @ RANLP 2023), involved the task of Event Causality Identification with Causal News Corpus. We describe our approach to Subtask 1, causal event classification, a supervised binary classification problem in which given event sentences are labeled according to whether they contain any cause-effect relations. For this task, a BERT-based architecture, RoBERTa, was implemented. The results of this model are validated on the dataset provided by the organizers of this task.

MLModeler5 at SemEval-2023 Task 3: Detecting the Category and the Framing Techniques in Online News in a Multi-lingual Setup
Arjun Khanchandani | Nitansh Jain | Jatin Bedi
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

System description paper for Subtasks 1 and 2 of SemEval-2023 Task 3. The paper describes our approach to News Genre Categorisation and Framing Detection using RoBERTa and ALBERT models.

2022

Raccoons at SemEval-2022 Task 11: Leveraging Concatenated Word Embeddings for Named Entity Recognition
Atharvan Dogra | Prabsimran Kaur | Guneet Kohli | Jatin Bedi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Named Entity Recognition (NER) is an essential subtask in NLP that identifies text belonging to predefined semantic categories such as person, location, organization, drug, time, clinical procedure, biological protein, etc. NER plays a vital role in various fields such as information extraction, question answering, and machine translation. This paper describes the system we submitted to the named entity recognition and classification shared task at SemEval-2022. The task is aimed at detecting semantically ambiguous and complex entities in short and low-context settings. Our team focused on improving entity recognition by improving the word embeddings. We concatenated the word representations from state-of-the-art language models and passed them through a reinforcement trainer to find the best representation. Our results highlight the improvements achieved by various embedding concatenations.

ARGUABLY@SMM4H’22: Classification of Health Related Tweets using Ensemble, Zero-Shot and Fine-Tuned Language Model
Prabsimran Kaur | Guneet Kohli | Jatin Bedi
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

With the increase in the use of social media, people have become more outspoken and are using platforms like Reddit, Facebook, and Twitter to express their views and share the medical challenges they are facing. This data is a valuable source of medical insight and is often used for healthcare research. This paper describes our participation in Tasks 1a, 2a, 2b, 3, 5, 6, 7, and 9 organized by SMM4H 2022. We propose two transformer-based approaches to handle the classification tasks. The first approach is fine-tuning single language models. The second approach is ensembling the results of BERT, RoBERTa, and ERNIE 2.0.

ARGUABLY @ Causal News Corpus 2022: Contextually Augmented Language Models for Event Causality Identification
Guneet Kohli | Prabsimran Kaur | Jatin Bedi
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

Causality (a cause-effect relationship between two arguments) has become integral to various NLP domains such as question answering, summarization, and event prediction. To understand causality in detail, Event Causality Identification with Causal News Corpus (CASE-2022) has organized shared tasks. This paper describes our participation in Subtask 1, which focuses on classifying event causality. We used sentence-level augmentation based on contextualized word embeddings of DistilBERT to construct new data. Models were then trained on this data using two approaches. The first technique used the DeBERTa language model, and the second used the RoBERTa language model in combination with cross-attention. We obtained the second-best F1 score (0.8610) in the competition with the Contextually Augmented DeBERTa model.

Adversarial Perturbations Augmented Language Models for Euphemism Identification
Guneet Kohli | Prabsimran Kaur | Jatin Bedi
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)

Euphemisms are mild words or expressions used instead of harsh or direct words while talking to someone to avoid discussing something unpleasant, embarrassing, or offensive. However, they are often ambiguous, which makes identifying them a challenging task. The Third Workshop on Figurative Language Processing, colocated with EMNLP 2022, organized a shared task on Euphemism Detection to better understand euphemisms. We used an adversarial augmentation technique to construct new data. Two language models, BERT and Longformer, were then trained on this augmented data. To further enhance the overall performance, various combinations of the results obtained using Longformer and BERT were passed through a voting ensembler. We achieved an F1 score of 71.5 using the combination of two adversarial Longformer models, two adversarial BERT models, and one non-adversarial BERT model.

2021

ARGUABLY at ComMA@ICON: Detection of Multilingual Aggressive, Gender Biased, and Communally Charged Tweets Using Ensemble and Fine-Tuned IndicBERT
Guneet Kohli | Prabsimran Kaur | Jatin Bedi
Proceedings of the 18th International Conference on Natural Language Processing: Shared Task on Multilingual Gender Biased and Communal Language Identification

The proliferation of social networking has increased the use of offensive language, aggression, and hate speech, and detecting them has drawn the focus of the NLP community. However, differences in people’s perception make it difficult to distinguish between acceptable content and aggressive or hateful content, which makes it harder to create an automated system. In this paper, we propose multi-class classification techniques to identify aggressive and offensive language used online. Two main approaches have been developed for classifying data as aggressive, gender-biased, and communally charged. The first approach is an ensemble-based model comprising XGBoost, LightGBM, and Naive Bayes applied to vectorized English data. This data was obtained by applying Indic transliteration to the original data, which comprises Meitei, Bangla, Hindi, and English text. The second approach is a BERT-based architecture used to detect misogyny and aggression. The proposed model employs IndicBERT embeddings to capture contextual understanding. The results of the models are validated on the ComMA v 0.2 dataset.