Amartya Roy

2025

CryptOpiQA: A new Opinion and Question Answering dataset on Cryptocurrency
Sougata Sarkar | Aditya Badwal | Amartya Roy | Koustav Rudra | Kripabandhu Ghosh
Proceedings of the 31st International Conference on Computational Linguistics

Cryptocurrency has attracted a lot of public attention and opinion worldwide. Users have different kinds of information needs regarding such topics and publicly available information is a good resource to satisfy those information needs. In this paper, we investigate the public opinion on cryptocurrency and bitcoin on two social media – Twitter and Reddit. We have created a multi-level dataset CryptOpiQA and garnered valuable insights. The dataset contains both gold standard (manually annotated) and silver standard (inferred from the gold standard) labels. As a part of this dataset, we have also created a Question Answering sub-corpus. We have used state-of-the-art LLMs and advanced techniques such as retrieval augmented generation (RAG) to improve question-answering (QnA) results. We believe this dataset and the analysis will be useful in studying user opinions and Question-Answering on cryptocurrency in the research community.

pdf bib abs

On the effective transfer of knowledge from English to Hindi Wikipedia
Paramita Das | Amartya Roy | Ritabrata Chakraborty | Animesh Mukherjee
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track

Although Wikipedia is the largest multilingual encyclopedia, it remains inherently incomplete. There is a significant disparity in the quality of content between high-resource languages (HRLs, e.g., English) and low-resource languages (LRLs, e.g., Hindi), with many LRL articles lacking adequate information. To bridge these content gaps, we propose a lightweight framework to enhance knowledge equity between English and Hindi. In case the English Wikipedia page is not up-to-date, our framework extracts relevant information from external resources readily available (such as English books), and adapts it to align with Wikipedia’s distinctive style, including its neutral point of view (NPOV) policy, using in-context learning capabilities of large language models. The adapted content is then machine-translated into Hindi for integration into the corresponding Wikipedia articles. On the other hand, if the English version is comprehensive and up-to-date, the framework directly transfers knowledge from English to Hindi. Our framework effectively generates new content for Hindi Wikipedia sections, enhancing Hindi Wikipedia articles respectively by 65% and 62% according to automatic and human judgment-based evaluations.

pdf bib abs

Causal-LLM: A Unified One-Shot Framework for Prompt- and Data-Driven Causal Graph Discovery
Amartya Roy | N Devharish | Shreya Ganguly | Kripabandhu Ghosh
Findings of the Association for Computational Linguistics: EMNLP 2025

Current causal discovery methods using Large Language Models (LLMs) often rely on pairwise or iterative strategies, which fail to capture global dependencies, amplify local biases, and reduce overall accuracy. This work introduces a unified framework for one-step full causal graph discovery through: (1) Prompt-based discovery with in-context learning when node metadata is available, and (2) Causal_llm, a data-driven method for settings without metadata. Empirical results demonstrate that the prompt-based approach outperforms state-of-the-art models (GranDAG, GES, ICA-LiNGAM) by approximately 40% in edge accuracy on datasets like Asia and Sachs, while maintaining strong performance on more complex graphs (ALARM, HEPAR2). Causal_llm consistently excels across all benchmarks, achieving 50% faster inference than reinforcement learning-based methods and improving precision by 25% in fairness-sensitive domains such as legal decision-making. We also introduce two domain-specific DAGs—one for bias propagation and another for legal reasoning under the Bhartiya Nyaya Sanhita—demonstrating LLMs’ capability for systemic, real-world causal discovery.

Co-authors

Venues

coling2
findings1

Fix author