Sougata Sarkar
2025
CryptOpiQA: A new Opinion and Question Answering dataset on Cryptocurrency
Sougata Sarkar | Aditya Badwal | Amartya Roy | Koustav Rudra | Kripabandhu Ghosh
Proceedings of the 31st International Conference on Computational Linguistics
Cryptocurrency has attracted considerable public attention and opinion worldwide. Users have different kinds of information needs regarding such topics, and publicly available information is a good resource for satisfying those needs. In this paper, we investigate public opinion on cryptocurrency and bitcoin on two social media platforms, Twitter and Reddit. We have created a multi-level dataset, CryptOpiQA, and garnered valuable insights from it. The dataset contains both gold-standard (manually annotated) and silver-standard (inferred from the gold standard) labels. As part of this dataset, we have also created a question-answering sub-corpus. We have used state-of-the-art LLMs and advanced techniques such as retrieval-augmented generation (RAG) to improve question-answering (QnA) results. We believe this dataset and the analysis will be useful to the research community for studying user opinions and question answering on cryptocurrency.