Jeremie Clos
2022
PriPA: A Tool for Privacy-Preserving Analytics of Linguistic Data
Jeremie Clos | Emma McClaughlin | Pepita Barnard | Elena Nichele | Dawn Knight | Derek McAuley | Svenja Adolphs
Proceedings of the Workshop on Ethical and Legal Issues in Human Language Technologies and Multilingual De-Identification of Sensitive Data In Language Resources within the 13th Language Resources and Evaluation Conference
The days of large amorphous corpora collected with armies of Web crawlers and stored indefinitely are, or should be, coming to an end. There is a wealth of linguistic information that is increasingly difficult to access, hidden in personal data that would be unethical and technically challenging to collect using traditional methods such as Web crawling and mass surveillance of online discussion spaces. Advances in privacy regulations such as GDPR and changes in the public perception of privacy highlight the problematic ethical dimension of extracting information from unaware, if not unwilling, participants. Modern corpora need to adapt, be focused on testing specific hypotheses, and be respectful of the privacy of the people who generated their data. Our work focuses on using a distributed participatory approach and continuous informed consent to solve these issues, by allowing participants to voluntarily contribute their own censored personal data at a granular level. We evaluate our approach in a three-pronged manner: testing the accuracy of its statistical measures of language against standard corpus linguistics tools, evaluating the usability of our application with a participant involvement panel, and using the tool for a case study on health communication.
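To illustrate the kind of granular, consent-driven contribution the abstract describes, a minimal sketch follows. The record fields, consent flags, and the contribute helper are hypothetical illustrations, not the PriPA implementation or API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a participant record with per-field consent flags,
# illustrating granular, revocable contribution of personal linguistic data.
# Field names and filtering logic are assumptions, not taken from PriPA.

@dataclass
class ParticipantRecord:
    messages: list[str]
    location: str
    age: int
    # Per-field consent, revisable at any time (continuous informed consent).
    consent: dict[str, bool] = field(default_factory=dict)

def contribute(record: ParticipantRecord) -> dict:
    """Return only the fields the participant currently consents to share."""
    shared = {}
    for name in ("messages", "location", "age"):
        if record.consent.get(name, False):  # default to withholding
            shared[name] = getattr(record, name)
    return shared

record = ParticipantRecord(
    messages=["had a great walk today", "feeling under the weather"],
    location="Nottingham",
    age=34,
    consent={"messages": True, "location": False, "age": True},
)
print(contribute(record))  # only consented fields are released
```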
2020
ExTRA: Explainable Therapy-Related Annotations
Mat Rawsthorne | Tahseen Jilani | Jacob Andrews | Yunfei Long | Jeremie Clos | Samuel Malins | Daniel Hunt
2nd Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence
In this paper we report progress on a novel explainable artificial intelligence (XAI) initiative applying Natural Language Processing (NLP) with elements of co-design to develop a text classifier for application in psychotherapy training. The task is to produce a tool that helps therapists review their sessions by automatically labelling transcript text with levels of interaction for patient activation in known psychological processes, using XAI to increase their trust in the model’s suggestions and client trajectory predictions. After pre-processing of the language features extracted from professionally annotated therapy session transcripts, we apply a supervised machine learning approach (CHAID) to classify interaction labels (negative, neutral, positive). Weighted samples are used to overcome class imbalance in the data. The results show this initial model can make useful distinctions among the three labels of patient activation with 74% accuracy and provide insight into its reasoning. This ongoing project will additionally evaluate which XAI approaches can be used to increase the transparency of the tool to end users, exploring whether direct involvement of stakeholders improves usability of the XAI interface and therefore trust in the solution.
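The sample-weighting step the abstract mentions can be illustrated with a short sketch. CHAID is not available in scikit-learn, so a standard decision tree stands in, and the feature matrix, labels, and class proportions below are synthetic placeholders rather than the authors' data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Illustrative sketch only: a decision tree stands in for CHAID,
# and X / y are synthetic placeholders for extracted language features
# and the (negative, neutral, positive) interaction labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = rng.choice(["negative", "neutral", "positive"], size=300, p=[0.1, 0.6, 0.3])

# Weight samples inversely to class frequency to counter class imbalance,
# analogous to the weighted samples described in the abstract.
weights = compute_sample_weight(class_weight="balanced", y=y)

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X, y, sample_weight=weights)
print(clf.score(X, y))
```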