CIPHE: A Framework for Document Cluster Interpretation and Precision from Human Exploration

Anton Eklund, Mona Forsman, Frank Drewes


Abstract
Document clustering models serve unique application purposes, which turns model quality into a property that depends on the needs of the individual investigator. We propose a framework, Cluster Interpretation and Precision from Human Exploration (CIPHE), for collecting and quantifying human interpretations of cluster samples. CIPHE tasks survey participants to explore actual document texts from cluster samples and records their perceptions. It also includes a novel inclusion task that is used to calculate the cluster precision in an indirect manner. A case study on news clusters shows that CIPHE reveals which clusters have multiple interpretation angles, aiding the investigator in their exploration.
Anthology ID:
2024.nlp4dh-1.52
Volume:
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Month:
November
Year:
2024
Address:
Miami, USA
Editors:
Mika Hämäläinen, Emily Öhman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
Venue:
NLP4DH
Publisher:
Association for Computational Linguistics
Pages:
536–548
URL:
https://aclanthology.org/2024.nlp4dh-1.52
Cite (ACL):
Anton Eklund, Mona Forsman, and Frank Drewes. 2024. CIPHE: A Framework for Document Cluster Interpretation and Precision from Human Exploration. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pages 536–548, Miami, USA. Association for Computational Linguistics.
Cite (Informal):
CIPHE: A Framework for Document Cluster Interpretation and Precision from Human Exploration (Eklund et al., NLP4DH 2024)
PDF:
https://aclanthology.org/2024.nlp4dh-1.52.pdf