PerCQA: Persian Community Question Answering Dataset

Naghme Jamali, Yadollah Yaghoobzadeh, Heshaam Faili


Abstract
Community Question Answering (CQA) forums provide answers to many real-life questions. Because of their large size, these forums are popular data sources among machine learning researchers. Automatic answer selection, answer ranking, question retrieval, expert finding, and fact-checking are examples of learning tasks built on CQA data. This paper presents PerCQA, the first Persian dataset for CQA. The dataset contains questions and answers crawled from the most well-known Persian forum. After data acquisition, we develop rigorous annotation guidelines through an iterative process and then annotate the question-answer pairs in SemEvalCQA format. PerCQA contains 989 questions and 21,915 annotated answers. We make PerCQA publicly available to encourage more research in Persian CQA. We also build strong benchmarks for the task of answer selection in PerCQA using monolingual and multilingual pre-trained language models.
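As an illustration of the answer selection task mentioned in the abstract, the following minimal sketch scores a ranked list of answers per question with mean average precision (MAP), a metric commonly reported for SemEval-style CQA data. It is not the authors' evaluation code. The label name "Good", the thread layout, and the function names are assumptions made for this example; PerCQA's actual tag names and file format may differ.

```python
from typing import Dict, List, Tuple

# Assumption: answers carry SemEval-style labels and only "Good" counts as relevant.
RELEVANT_LABELS = {"Good"}


def average_precision(ranked_labels: List[str]) -> float:
    """Average precision for one question, given labels in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label in RELEVANT_LABELS:
            hits += 1
            precision_sum += hits / rank
    # A question with no relevant answer scores 0 here; some evaluation
    # scripts skip such questions instead.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(threads: Dict[str, List[Tuple[float, str]]]) -> float:
    """threads maps a question id to (model_score, gold_label) pairs."""
    scores = []
    for answers in threads.values():
        # Rank answers by model score, highest first.
        ranked = sorted(answers, key=lambda pair: pair[0], reverse=True)
        scores.append(average_precision([label for _, label in ranked]))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two questions with model scores and gold labels.
example_threads = {
    "q1": [(0.9, "Good"), (0.2, "Bad"), (0.5, "Good")],
    "q2": [(0.1, "Good"), (0.8, "Bad")],
}
example_map = mean_average_precision(example_threads)
```

In this toy data the first question ranks both relevant answers at the top, while the second ranks its only relevant answer below an irrelevant one, so the two questions contribute different average precision values to the mean.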
Anthology ID:
2022.lrec-1.654
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6083–6092
URL:
https://aclanthology.org/2022.lrec-1.654
Cite (ACL):
Naghme Jamali, Yadollah Yaghoobzadeh, and Heshaam Faili. 2022. PerCQA: Persian Community Question Answering Dataset. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6083–6092, Marseille, France. European Language Resources Association.
Cite (Informal):
PerCQA: Persian Community Question Answering Dataset (Jamali et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.654.pdf
Data
PerCQA, InsuranceQA, TrecQA, WikiQA