Tanik Saikh

2024

Hope ‘The Paragraph Guy’ explains the rest : Introducing MeSum, the Meme Summarizer
Anas Anwarul Haq Khan | Tanik Saikh | Arpan Phukan | Asif Ekbal
Findings of the Association for Computational Linguistics: EMNLP 2024

Emojis Trash or Treasure: Utilizing Emoji to Aid Hate Speech Detection
Tanik Saikh | Soham Barman | Harsh Kumar | Saswat Sahu | Souvick Palit
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

In this study, we delve into the fascinating realm of emojis and their impact on identifying hate speech in both Bengali and English. Through extensive exploration of various techniques, particularly the integration of Multilingual BERT (MBert) and Emoji2Vec embeddings, we strive to shed light on the immense potential of emojis in this detection process. By meticulously comparing these advanced models with conventional approaches, we uncover the intricate contextual cues that emojis bring to the table. Ultimately, our discoveries underscore the invaluable role of emojis in hate speech detection, thereby providing valuable insights for the creation of resilient and context-aware systems to combat online toxicity. Our findings showcase the potential of emojis as valuable assets rather than mere embellishments in the realm of hate speech detection. By leveraging the combined strength of MBert and Emoji2Vec, our models exhibit enhanced capabilities in deciphering the emotional subtleties often intertwined with hate speech expressions.

2022

Novelty Detection: A Perspective from Natural Language Processing
Tirthankar Ghosal | Tanik Saikh | Tameesh Biswas | Asif Ekbal | Pushpak Bhattacharyya
Computational Linguistics, Volume 48, Issue 1 - March 2022

The quest for new information is an inborn human trait and has always been quintessential for human survival and progress. Novelty drives curiosity, which in turn drives innovation. In Natural Language Processing (NLP), Novelty Detection refers to finding text that has some new information to offer with respect to what has been seen or known earlier. With the exponential growth of information all across the Web, there is an accompanying menace of redundancy. A considerable portion of Web content is duplicated, and we need efficient mechanisms to retain new information and filter out redundant information. However, detecting redundancy at the semantic level and identifying novel text is not straightforward because the text may have less lexical overlap yet convey the same information. On top of that, non-novel/redundant information in a document may have been assimilated from multiple source documents, not just one. The problem compounds when the subject of the discourse is documents, and numerous prior documents need to be processed to ascertain the novelty/non-novelty of the current one. In this work, we build upon our earlier investigations for document-level novelty detection and present a comprehensive account of our efforts toward the problem. We explore the role of pre-trained Textual Entailment (TE) models to deal with multiple source contexts and present the outcome of our current investigations. We argue that a multipremise entailment task is one close approximation toward identifying semantic-level non-novelty. Our recent approach either performs comparably or achieves significant improvement over the latest reported results on several datasets and across several related tasks (paraphrasing, plagiarism, rewrite). We critically analyze our performance with respect to the existing state of the art and show the superiority and promise of our approach for future investigations. We also present our enhanced dataset TAP-DLND 2.0 and several baselines to the community for further research on document-level novelty detection.

Novelty Detection in Community Question Answering Forums
Tirthankar Ghosal | Vignesh Edithal | Tanik Saikh | Saprativa Bhattacharjee | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

A Deep Transfer Learning Method for Cross-Lingual Natural Language Inference
Dibyanayan Bandyopadhyay | Arkadipta De | Baban Gain | Tanik Saikh | Asif Ekbal
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Natural Language Inference (NLI), also known as Recognizing Textual Entailment (RTE), has been one of the central tasks in Artificial Intelligence (AI) and Natural Language Processing (NLP). RTE between two pieces of text is a crucial problem, and it poses further challenges when it involves two different languages, i.e., in the cross-lingual scenario. This paper proposes an effective transfer learning approach for cross-lingual NLI. We perform experiments on English-Hindi language pairs in the cross-lingual setting and find that our novel loss formulation can enhance the performance of the baseline model by up to 2%. To assess the effectiveness of our method further, we perform additional experiments on every possible language pair using four European languages, namely French, German, Bulgarian, and Turkish, on top of the XNLI dataset. Evaluation results yield up to 10% performance improvement over the respective baseline models, in some cases surpassing the state-of-the-art (SOTA). It is also to be noted that our proposed model has 110M parameters, far fewer than the 220M parameters of the SOTA model. Finally, we argue that our transfer learning-based loss objective is model agnostic and thus can be used with other deep learning-based architectures for cross-lingual NLI.

2020

ScholarlyRead: A New Dataset for Scientific Article Reading Comprehension
Tanik Saikh | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present ScholarlyRead, a span-of-word-based Reading Comprehension (RC) dataset of scholarly articles with approximately 10K manually checked passage-question-answer instances. ScholarlyRead was constructed in a semi-automatic way. We consider the articles from two popular journals of a reputed publishing house. First, we generate questions from these articles automatically. The generated questions are then manually checked by human annotators. We propose a baseline model based on the Bi-Directional Attention Flow (BiDAF) network that yields an F1 score of 37.31%. The framework would be useful for building Question-Answering (QA) systems on scientific articles.

2019

IITP at MEDIQA 2019: Systems Report for Natural Language Inference, Question Entailment and Question Answering
Dibyanayan Bandyopadhyay | Baban Gain | Tanik Saikh | Asif Ekbal
Proceedings of the 18th BioNLP Workshop and Shared Task

This paper presents the experiments carried out as part of our participation in the MEDIQA challenge (Abacha et al., 2019), a shared task. We participated in all three tasks defined in this shared task: (i) Natural Language Inference (NLI), (ii) Recognizing Question Entailment (RQE), and (iii) their application in medical Question Answering (QA). We submitted runs from multiple deep learning based systems for each of these three tasks: five system results in each of the NLI and RQE tasks, and four system results for the QA task. The systems yield encouraging results in all three tasks. The highest scores obtained in the NLI, RQE, and QA tasks are 81.8%, 53.2%, and 71.7%, respectively.

A Deep Learning Approach for Automatic Detection of Fake News
Tanik Saikh | Arkadipta De | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 16th International Conference on Natural Language Processing

Fake news detection is a very prominent and essential task in the field of journalism. This challenging problem has so far been seen in the field of politics, but it could be even more challenging when it has to be addressed on a multi-domain platform. In this paper, we propose two effective models based on deep learning for solving the fake news detection problem in online news content from multiple domains. We evaluate our techniques on two recently released datasets for fake news detection, namely FakeNews AMT and Celebrity. The proposed systems yield encouraging performance, outperforming the current hand-crafted feature engineering based state-of-the-art system by significant margins of 3.08% and 9.3% for the two models, respectively. In order to exploit the datasets available for the related tasks, we perform cross-domain analysis (model trained on FakeNews AMT and tested on Celebrity, and vice versa) to explore the applicability of our systems across domains.
