MedDialog-FR: A French Version of the MedDialog Corpus for Multi-label Classification and Response Generation Related to Women’s Intimate Health

Xingyu Liu; Vincent Segonne; Aidan Mannion; Didier Schwab; Lorraine Goeuriot; François Portet

Abstract

This article presents MedDialog-FR, a large, publicly available corpus of French medical conversations. Motivated by the lack of French dialogue corpora for data-driven dialogue systems and the paucity of available information related to women’s intimate health, we introduce an annotated corpus of question-and-answer dialogues between a real patient and a real doctor concerning women’s intimate health. The corpus comprises about 20,000 dialogues automatically translated from the English MedDialog-EN corpus. Its test set comprises 1,400 dialogues that have been manually post-edited and annotated with 22 categories from the UMLS ontology. We also fine-tuned state-of-the-art reference models for multi-label classification and response generation, providing an initial performance benchmark and highlighting the difficulty of the tasks.

Anthology ID: 2024.cl4health-1.21
Volume: Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Dina Demner-Fushman, Sophia Ananiadou, Paul Thompson, Brian Ondov
Venues: CL4Health | WS
Publisher: ELRA and ICCL
Pages: 173–183
URL: https://aclanthology.org/2024.cl4health-1.21
Cite (ACL): Xingyu Liu, Vincent Segonne, Aidan Mannion, Didier Schwab, Lorraine Goeuriot, and François Portet. 2024. MedDialog-FR: A French Version of the MedDialog Corpus for Multi-label Classification and Response Generation Related to Women’s Intimate Health. In Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024, pages 173–183, Torino, Italia. ELRA and ICCL.
Cite (Informal): MedDialog-FR: A French Version of the MedDialog Corpus for Multi-label Classification and Response Generation Related to Women’s Intimate Health (Liu et al., CL4Health-WS 2024)
PDF: https://aclanthology.org/2024.cl4health-1.21.pdf
