Tanik Saikh


2022

pdf bib
Novelty Detection: A Perspective from Natural Language Processing
Tirthankar Ghosal | Tanik Saikh | Tameesh Biswas | Asif Ekbal | Pushpak Bhattacharyya
Computational Linguistics, Volume 48, Issue 1 - March 2022

The quest for new information is an inborn human trait and has always been quintessential for human survival and progress. Novelty drives curiosity, which in turn drives innovation. In Natural Language Processing (NLP), Novelty Detection refers to finding text that has some new information to offer with respect to whatever is earlier seen or known. With the exponential growth of information all across the Web, there is an accompanying menace of redundancy. A considerable portion of the Web contents are duplicates, and we need efficient mechanisms to retain new information and filter out redundant information. However, detecting redundancy at the semantic level and identifying novel text is not straightforward because the text may have less lexical overlap yet convey the same information. On top of that, non-novel/redundant information in a document may have assimilated from multiple source documents, not just one. The problem surmounts when the subject of the discourse is documents, and numerous prior documents need to be processed to ascertain the novelty/non-novelty of the current one in concern. In this work, we build upon our earlier investigations for document-level novelty detection and present a comprehensive account of our efforts toward the problem. We explore the role of pre-trained Textual Entailment (TE) models to deal with multiple source contexts and present the outcome of our current investigations. We argue that a multipremise entailment task is one close approximation toward identifying semantic-level non-novelty. Our recent approach either performs comparably or achieves significant improvement over the latest reported results on several datasets and across several related tasks (paraphrasing, plagiarism, rewrite). We critically analyze our performance with respect to the existing state of the art and show the superiority and promise of our approach for future investigations. We also present our enhanced dataset TAP-DLND 2.0 and several baselines to the community for further research on document-level novelty detection.

pdf bib
A Deep Transfer Learning Method for Cross-Lingual Natural Language Inference
Dibyanayan Bandyopadhyay | Arkadipta De | Baban Gain | Tanik Saikh | Asif Ekbal
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Natural Language Inference (NLI), also known as Recognizing Textual Entailment (RTE), has been one of the central tasks in Artificial Intelligence (AI) and Natural Language Processing (NLP). RTE between the two pieces of texts is a crucial problem, and it adds further challenges when involving two different languages, i.e., in the cross-lingual scenario. This paper proposes an effective transfer learning approach for cross-lingual NLI. We perform experiments on English-Hindi language pairs in the cross-lingual setting to find out that our novel loss formulation could enhance the performance of the baseline model by up to 2%. To assess the effectiveness of our method further, we perform additional experiments on every possible language pair using four European languages, namely French, German, Bulgarian, and Turkish, on top of XNLI dataset. Evaluation results yield up to 10% performance improvement over the respective baseline models, in some cases surpassing the state-of-the-art (SOTA). It is also to be noted that our proposed model has 110M parameters which is much lesser than the SOTA model having 220M parameters. Finally, we argue that our transfer learning-based loss objective is model agnostic and thus can be used with other deep learning-based architectures for cross-lingual NLI.

2020

pdf bib
ScholarlyRead: A New Dataset for Scientific Article Reading Comprehension
Tanik Saikh | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present ScholarlyRead, span-of-word-based scholarly articles’ Reading Comprehension (RC) dataset with approximately 10K manually checked passage-question-answer instances. ScholarlyRead was constructed in semi-automatic way. We consider the articles from two popular journals of a reputed publishing house. Firstly, we generate questions from these articles in an automatic way. Generated questions are then manually checked by the human annotators. We propose a baseline model based on Bi-Directional Attention Flow (BiDAF) network that yields the F1 score of 37.31%. The framework would be useful for building Question-Answering (QA) systems on scientific articles.

2019

pdf bib
A Deep Learning Approach for Automatic Detection of Fake News
Tanik Saikh | Arkadipta De | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 16th International Conference on Natural Language Processing

Fake news detection is a very prominent and essential task in the field of journalism. This challenging problem is seen so far in the field of politics, but it could be even more challenging when it is to be determined in the multi-domain platform. In this paper, we propose two effective models based on deep learning for solving fake news detection problem in online news contents of multiple domains. We evaluate our techniques on the two recently released datasets, namely Fake News AMT and Celebrity for fake news detection. The proposed systems yield encouraging performance, outperforming the current hand-crafted feature engineering based state-of-the-art system with a significant margin of 3.08% and 9.3% by the two models, respectively. In order to exploit the datasets, available for the related tasks, we perform cross-domain analysis (model trained on FakeNews AMT and tested on Celebrity and vice versa) to explore the applicability of our systems across the domains.

pdf bib
IITP at MEDIQA 2019: Systems Report for Natural Language Inference, Question Entailment and Question Answering
Dibyanayan Bandyopadhyay | Baban Gain | Tanik Saikh | Asif Ekbal
Proceedings of the 18th BioNLP Workshop and Shared Task

This paper presents the experiments accomplished as a part of our participation in the MEDIQA challenge, an (Abacha et al., 2019) shared task. We participated in all the three tasks defined in this particular shared task. The tasks are viz. i. Natural Language Inference (NLI) ii. Recognizing Question Entailment(RQE) and their application in medical Question Answering (QA). We submitted runs using multiple deep learning based systems (runs) for each of these three tasks. We submitted five system results in each of the NLI and RQE tasks, and four system results for the QA task. The systems yield encouraging results in all the three tasks. The highest performance obtained in NLI, RQE and QA tasks are 81.8%, 53.2%, and 71.7%, respectively.

2017

pdf bib
Document Level Novelty Detection: Textual Entailment Lends a Helping Hand
Tanik Saikh | Tirthankar Ghosal | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

2011

pdf bib
Shared Task System Description: Measuring the Compositionality of Bigrams using Statistical Methodologies
Tanmoy Chakraborty | Santanu Pal | Tapabrata Mondal | Tanik Saikh | Sivaju Bandyopadhyay
Proceedings of the Workshop on Distributional Semantics and Compositionality

2010

pdf bib
English to Indian Languages Machine Transliteration System at NEWS 2010
Amitava Das | Tanik Saikh | Tapabrata Mondal | Asif Ekbal | Sivaji Bandyopadhyay
Proceedings of the 2010 Named Entities Workshop

pdf bib
JU_CSE_GREC10: Named Entity Generation at GREC 2010
Amitava Das | Tanik Saikh | Tapabrata Mondal | Sivaji Bandyopadhyay
Proceedings of the 6th International Natural Language Generation Conference