Vishwani Gupta
2023
Exploring Unsupervised Semantic Similarity Methods for Claim Verification in Health Care News Articles
Vishwani Gupta, Astrid Viciano, Holger Wormer, Najmehsadat Mousavinezhad
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
In the 21st century, the proliferation of fake information has emerged as a significant threat to society. In particular, medical reporters face challenges when verifying claims about treatment effects, side effects, and risks mentioned in news articles, since they rely on scientific publications for accuracy. The accurate communication of scientific information in news articles has long been a crucial concern in the scientific community, as the dissemination of misinformation can have dire consequences in the healthcare domain. Medical reporters would therefore benefit greatly from efficient methods for retrieving evidence from scientific publications that supports specific claims. This paper investigates the application of unsupervised semantic similarity models to claim verification for medical reporters, with the aim of speeding up the process. We explore unsupervised multilingual evidence retrieval techniques that reduce the time required to obtain evidence from scientific studies. Instead of classifying content, we propose an approach that retrieves relevant evidence from scientific publications for claim verification in the healthcare domain. Given a claim and a set of scientific publications, our system generates a list of the most similar paragraphs containing supporting evidence. Furthermore, we evaluate the performance of state-of-the-art unsupervised semantic similarity methods on this task. Because the claims and the evidence lie in a cross-lingual space, we find that the XLM-RoBERTa model achieves high accuracy on this task. Through this research, we contribute to making claim verification more efficient and reliable for medical reporters, enabling them to source evidence from scientific publications accurately and in a timely manner.
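The retrieval step described above (given a claim, return the most similar paragraphs) can be sketched as a cosine-similarity ranking over precomputed vectors. This is a minimal sketch, not the authors' implementation: it assumes each claim and paragraph has already been encoded into a fixed-size vector by a multilingual encoder such as XLM-RoBERTa (for example by mean-pooling token embeddings), and the helper names `cosine_similarity` and `rank_paragraphs` are hypothetical. The toy vectors stand in for encoder outputs.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_paragraphs(
    claim_vec: Sequence[float],
    paragraph_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Hypothetical helper: (paragraph index, score) pairs for the top_k paragraphs closest to the claim."""
    scored = [(i, cosine_similarity(claim_vec, p)) for i, p in enumerate(paragraph_vecs)]
    # Highest similarity first; Python's sort is stable, so ties keep document order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for encoder outputs (hypothetical values).
    claim = [1.0, 0.0, 1.0]
    paragraphs = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
    print([i for i, _ in rank_paragraphs(claim, paragraphs, top_k=2)])  # prints [1, 2]
```

Because the ranking only compares vectors, the claim and the paragraphs can be in different languages as long as the encoder maps them into a shared cross-lingual space; that property comes from the encoder, not from this ranking code.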
2019
Improving Word Embeddings Using Kernel PCA
Vishwani Gupta, Sven Giesselbach, Stefan Rüping, Christian Bauckhage
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Word-based embedding approaches such as Word2Vec capture the meaning of words and the relations between them particularly well when trained on large text collections; however, they fail to do so with small datasets. Extensions such as fastText slightly reduce the amount of data needed; however, jointly learning meaningful morphological, syntactic, and semantic representations still requires a lot of data. In this paper, we introduce a new approach to warm-start embedding models with morphological information in order to reduce training time and improve performance. We use word embeddings generated by both Word2Vec and fastText models and enrich them with morphological information about words, derived from kernel principal component analysis (KPCA) of word similarity matrices. This can be seen as explicitly feeding the network morphological similarities and letting it learn semantic and syntactic similarities. Evaluating our models on word similarity and analogy tasks in English and German, we find that they not only achieve higher accuracy than the original skip-gram and fastText models but also require significantly less training data and time. Another benefit of our approach is that it can generate high-quality representations of infrequent words, such as those found in very recent news articles with rapidly changing vocabularies. Lastly, we evaluate the different models on a downstream sentence classification task in which a CNN model is initialized with our embeddings, and find promising results.
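The idea of deriving morphological vectors from KPCA of a word similarity matrix can be sketched with NumPy. The abstract does not say which similarity measure is used, so this sketch assumes Jaccard similarity between fastText-style character trigram sets (which is a valid positive semi-definite kernel); the helper names `char_ngrams`, `similarity_matrix`, and `kpca_embeddings` are hypothetical. The sketch stops at producing per-word vectors; how they are combined with Word2Vec or fastText for warm-starting is not specified here.

```python
import numpy as np


def char_ngrams(word: str, n: int = 3) -> set:
    """Character n-grams of a word padded with boundary markers, fastText-style."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity_matrix(words, n: int = 3) -> np.ndarray:
    """Pairwise Jaccard similarity of character n-gram sets (assumed morphological kernel)."""
    grams = [char_ngrams(w, n) for w in words]
    size = len(words)
    K = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            union = grams[i] | grams[j]
            K[i, j] = len(grams[i] & grams[j]) / len(union) if union else 0.0
    return K


def kpca_embeddings(K: np.ndarray, dim: int) -> np.ndarray:
    """Project the words onto the top `dim` kernel principal components of K."""
    size = K.shape[0]
    H = np.eye(size) - np.full((size, size), 1.0 / size)  # centering matrix I - (1/n) 11^T
    K_centered = H @ K @ H
    eigvals, eigvecs = np.linalg.eigh(K_centered)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]  # indices of the largest eigenvalues
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative round-off
    # Projection of training point i onto component k is sqrt(lambda_k) * v_k[i].
    return eigvecs[:, order] * np.sqrt(top_vals)


if __name__ == "__main__":
    words = ["walk", "walked", "walking", "talk", "talked", "house"]
    init = kpca_embeddings(similarity_matrix(words), dim=3)
    print(init.shape)  # prints (6, 3)
```

Words sharing stems or affixes ("walk", "walked", "walking") get overlapping trigram sets and therefore nearby KPCA coordinates, which is the kind of morphological signal the abstract describes feeding to the network. Building and eigendecomposing a dense n×n matrix costs O(n³), so a real vocabulary would need a subset of words or an approximate eigensolver.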