Guneet Kohli


2022

ARGUABLY @ Causal News Corpus 2022: Contextually Augmented Language Models for Event Causality Identification
Guneet Kohli | Prabsimran Kaur | Jatin Bedi
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

Causality (a cause-effect relationship between two arguments) has become integral to various NLP domains such as question answering, summarization, and event prediction. To study causality in detail, the CASE-2022 workshop organized the shared task Event Causality Identification with Causal News Corpus. This paper describes our participation in Subtask 1, which focuses on classifying event causality. We used sentence-level augmentation based on contextualized word embeddings from DistilBERT to construct new data. These data were then used to train models with two approaches: the first used the DeBERTa language model, and the second combined the RoBERTa language model with cross-attention. We obtained the second-best F1 score (0.8610) in the competition with the Contextually Augmented DeBERTa model.
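The augmentation step can be pictured as: mask a word, ask a masked language model for context-appropriate candidates, and substitute the best candidate that differs from the original. The sketch below is illustrative only and assumes a hypothetical `predict` callable standing in for a model such as DistilBERT; the function names, whitespace tokenization, and the number of substitutions are assumptions, not the paper's implementation.

```python
import random
from typing import Callable, List

# Hypothetical predictor: given tokens with one position masked, return
# replacement candidates ranked by likelihood (e.g. backed by a masked LM
# such as DistilBERT). This is an assumption made for illustration.
Predictor = Callable[[List[str], int], List[str]]

MASK = "[MASK]"


def contextual_augment(sentence: str, predict: Predictor,
                       n_subs: int = 1, seed: int = 0) -> str:
    """Return a copy of `sentence` with up to `n_subs` words replaced
    by context-aware candidates from `predict`."""
    rng = random.Random(seed)
    tokens = sentence.split()  # naive whitespace tokenization (assumption)
    positions = rng.sample(range(len(tokens)), k=min(n_subs, len(tokens)))
    for pos in positions:
        original = tokens[pos]
        masked = tokens[:pos] + [MASK] + tokens[pos + 1:]
        # Take the highest-ranked candidate that differs from the original word;
        # if none differs, the word is left unchanged.
        for candidate in predict(masked, pos):
            if candidate.lower() != original.lower():
                tokens[pos] = candidate
                break
    return " ".join(tokens)


if __name__ == "__main__":
    # Toy stand-in for a masked language model.
    def toy_predict(tokens: List[str], pos: int) -> List[str]:
        return ["rallies", "protests"]

    print(contextual_augment("Police arrested protesters after the march", toy_predict))
```

Keeping the predictor as a parameter lets the same loop be driven by any masked language model, and the fixed seed makes the augmented corpus reproducible.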

Raccoons at SemEval-2022 Task 11: Leveraging Concatenated Word Embeddings for Named Entity Recognition
Atharvan Dogra | Prabsimran Kaur | Guneet Kohli | Jatin Bedi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Named Entity Recognition (NER) is an essential subtask in NLP that identifies text belonging to predefined semantic categories such as person, location, organization, drug, time, clinical procedure, and biological protein. NER plays a vital role in fields such as information extraction, question answering, and machine translation. This paper describes the system we submitted to the SemEval-2022 shared task on Named Entity Recognition and classification. The task focuses on detecting semantically ambiguous and complex entities in short, low-context settings. Our team focused on improving entity recognition by improving the word embeddings: we concatenated the word representations from state-of-the-art language models and passed them through a reinforcement trainer to find the best representation. Our results highlight the improvements achieved by various embedding concatenations.
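Concatenating word representations means that each token's final vector is the per-token vectors from several models placed end to end. The sketch below shows only that concatenation step under stated assumptions: the `Embedder` callables are hypothetical stand-ins for real language models, and the reinforcement-based search over combinations described above is not reproduced.

```python
from typing import Callable, List, Sequence

# Hypothetical embedder: maps a tokenized sentence to one vector per token.
Embedder = Callable[[Sequence[str]], List[List[float]]]


def concat_embeddings(tokens: Sequence[str],
                      embedders: Sequence[Embedder]) -> List[List[float]]:
    """Concatenate, token by token, the vectors produced by each embedder."""
    per_model = [embed(tokens) for embed in embedders]
    for out in per_model:
        if len(out) != len(tokens):
            raise ValueError("each embedder must return one vector per token")
    # zip(*per_model) yields, for each token, a tuple of that token's vectors
    # from every model; flatten them into a single longer vector.
    return [[x for vec in token_vecs for x in vec] for token_vecs in zip(*per_model)]


if __name__ == "__main__":
    # Toy embedders standing in for real language models.
    def length_embedder(toks: Sequence[str]) -> List[List[float]]:
        return [[float(len(t))] for t in toks]

    def capital_embedder(toks: Sequence[str]) -> List[List[float]]:
        return [[1.0 if t[:1].isupper() else 0.0, 0.0] for t in toks]

    print(concat_embeddings(["New", "york"], [length_embedder, capital_embedder]))
    # → [[3.0, 1.0, 0.0], [4.0, 0.0, 0.0]]
```

The output dimension is the sum of the input dimensions, so each added model enlarges the representation passed to the downstream tagger.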

Adversarial Perturbations Augmented Language Models for Euphemism Identification
Guneet Kohli | Prabsimran Kaur | Jatin Bedi
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)

Euphemisms are mild words or expressions used in place of harsh or direct words, when talking to someone, to avoid discussing something unpleasant, embarrassing, or offensive. However, they are often ambiguous, which makes identifying them a challenging task. To better understand euphemisms, the Third Workshop on Figurative Language Processing, co-located with EMNLP 2022, organized a shared task on Euphemism Detection. We used adversarial augmentation to construct new data, which was then used to train two language models: BERT and Longformer. To further enhance overall performance, various combinations of the results obtained with Longformer and BERT were passed through a voting ensembler. We achieved an F1 score of 71.5 using the combination of two adversarial Longformers, two adversarial BERT models, and one non-adversarial BERT model.
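A voting ensembler combines several models' label predictions into one label per example. The abstract does not specify the voting rule, so the sketch below assumes plain hard majority voting over binary labels (1 = euphemistic, 0 = literal); with five voters, as in the best combination above, a binary vote cannot tie.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists into one label per example by majority vote.

    predictions[m][i] is model m's label for example i (assumed layout).
    """
    if not predictions:
        raise ValueError("need at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of examples")
    combined = []
    for labels in zip(*predictions):  # labels: every model's vote for one example
        # most_common orders equal counts by first occurrence, so any tie
        # goes to the label predicted by the earlier model in the list.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


if __name__ == "__main__":
    # Five hypothetical models (e.g. 2 adversarial Longformers, 2 adversarial
    # BERTs, 1 non-adversarial BERT) labelling three examples.
    votes = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 1]]
    print(majority_vote(votes))  # → [1, 0, 1]
```

Using an odd number of voters for a binary task avoids ties altogether, which is one reason five-model combinations are convenient.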

ARGUABLY@SMM4H’22: Classification of Health Related Tweets using Ensemble, Zero-Shot and Fine-Tuned Language Model
Prabsimran Kaur | Guneet Kohli | Jatin Bedi
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

With the increased use of social media, people have become more outspoken and use platforms like Reddit, Facebook, and Twitter to express their views and share the medical challenges they face. This data is a valuable source of medical insight and is often used for healthcare research. This paper describes our participation in Tasks 1a, 2a, 2b, 3, 5, 6, 7, and 9 organized by SMM4H 2022. We propose two transformer-based approaches to the classification tasks: the first fine-tunes single language models, and the second ensembles the results of BERT, RoBERTa, and ERNIE 2.0.

2021

ARGUABLY at ComMA@ICON: Detection of Multilingual Aggressive, Gender Biased, and Communally Charged Tweets Using Ensemble and Fine-Tuned IndicBERT
Guneet Kohli | Prabsimran Kaur | Jatin Bedi
Proceedings of the 18th International Conference on Natural Language Processing: Shared Task on Multilingual Gender Biased and Communal Language Identification

The proliferation of social networking has increased offensive language, aggression, and hate speech, and their detection has drawn the focus of the NLP community. However, differences in people’s perception make it difficult to distinguish acceptable content from aggressive or hateful content, which makes it harder to build an automated system. In this paper, we propose multi-class classification techniques to identify aggressive and offensive language used online. Two main approaches were developed to classify data as aggressive, gender-biased, and communally charged. The first is an ensemble model comprising XGBoost, LightGBM, and Naive Bayes, applied to vectorized English data obtained by Indic transliteration of the original data, which comprises Meitei, Bangla, Hindi, and English. The second is a BERT-based architecture that detects misogyny and aggression, using IndicBERT embeddings for contextual understanding. The models are validated on the ComMA v0.2 dataset.
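The abstract does not say how the XGBoost, LightGBM, and Naive Bayes outputs are combined. One common option, assumed here purely for illustration, is soft voting: average each model's class probabilities (optionally weighted) and take the most probable class. The function name, weights, and toy probabilities below are hypothetical.

```python
from typing import List, Optional, Sequence


def soft_vote(model_probs: Sequence[Sequence[Sequence[float]]],
              weights: Optional[Sequence[float]] = None) -> List[int]:
    """Average class probabilities across models and return the argmax class.

    model_probs[m][i][c] is model m's probability of class c for example i.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models  # equal weighting by default
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = sum(weights)
    labels = []
    for example_probs in zip(*model_probs):  # every model's vector for one example
        n_classes = len(example_probs[0])
        avg = [sum(w * p[c] for w, p in zip(weights, example_probs)) / total
               for c in range(n_classes)]
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return labels


if __name__ == "__main__":
    # Three hypothetical models, two examples, three classes.
    xgb = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
    lgbm = [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]
    nb = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4]]
    print(soft_vote([xgb, lgbm, nb]))  # → [1, 2]
```

In the first example two of the three models favour class 0, yet the averaged probabilities favour class 1; this is the main behavioural difference between soft voting and a hard majority vote.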