Development of Conversational AI for Sleep Coaching Programme

Almost 30% of the adult population in the world is experiencing or has experienced insomnia. Cognitive Behaviour Therapy for insomnia (CBT-I) is one of the most effective treatments, but it is limited in accessibility and availability. Utilising technology is one possible solution, but existing methods neglect conversational aspects, which play a critical role in sleep therapy. To address this issue, we propose a PhD project exploring the potential of developing conversational artificial intelligence (AI) for a sleep coaching programme, which is motivated by CBT-I treatment. This PhD project aims to develop natural language processing (NLP) algorithms that allow the system to interact naturally with a user and to provide an automated analytic system to support human experts. In this paper, we introduce research questions lying under three phases of the sleep coaching programme: triaging, monitoring the progress, and providing coaching. We expect this research project's outcomes to contribute not only to the research domains of NLP and AI but also to the healthcare field, by providing a more accessible and affordable sleep treatment solution and an automated analytic system to lessen the burden on human experts.


Introduction
Insomnia is one of the most common sleep disorders. Approximately one-third of adults experience one or more of the symptoms of insomnia (Roth, 2007). The consequences of insomnia include not only individual problems but also societal issues, such as daytime fatigue and low energy levels, which can cause depression, and even an increased risk of accidents (Leger et al., 2014). The cost associated with insomnia in the US, including direct and indirect costs, is around 92.5 to 107.5 billion USD per year (Stoller, 1994). Cognitive behaviour therapy for insomnia (CBT-I) is one of the most effective solutions to treat insomnia. During the CBT-I process, a person who is suffering from symptoms of insomnia (patient) consults a CBT-I provider (therapist), who helps to identify behaviours, thoughts, and feelings that are related to the symptoms. Since its goal is to change the potential causes, both behavioural and cognitive, it produces a long-lasting improvement in insomnia compared to medication (Morin et al., 2006).
Despite its effectiveness, CBT-I treatment has limitations. From a patient's perspective, the treatment cost is high and, from a clinician's perspective, the number of patients who could potentially be treated is very large (Edinger and Means, 2005). Consequently, many researchers and engineers have been working on more accessible and affordable therapy solutions that can also lessen the burden on clinicians, such as internet- or mobile-based computerised therapy tools (Ström et al., 2004; Ritterband et al., 2009; Vincent and Lewycky, 2009; Lancee et al., 2012). These studies explored opportunities for applying technology to automate the treatment process. However, the conversational aspect, which is the core of in-person treatment, has been neglected.
In this study, we will explore the possibilities of developing conversational AI (artificial intelligence) to bring a computerised sleep therapy tool closer to in-person therapy. Since this research field is still in its infancy, we consider a sleep coaching programme targeting healthy people who would like to optimise their sleep, rather than a sleep therapy for patients with a chronic sleep disorder. Also, the goal of this study is to provide a user-friendly interface for users and to support human experts, rather than to replace or eliminate the human in the loop. Therefore, we mainly focus on two things: 1) adding a conversational feature that accepts user inputs in a way that enables a natural conversation between the human and the system; and 2) adding an analytic feature that automates the processing of user inputs to support decision making. The main research questions of this project lie under three processes of the sleep coaching programme: triaging, monitoring the progress, and providing coaching. An overview of the research questions and overlapping components is illustrated in Figure 1.
We start our research by analysing in-person sleep treatment and existing automated tools, and by identifying gaps to define the research questions (Section 2). Then we revisit the research questions and explain the methodology to address each question (Section 3). As the first step, we introduce a pilot study and present a preliminary result to discuss and provide next steps (Section 4). Finally, we conclude this paper by summarising the research questions and research plan of this PhD project (Section 5).

Related Work
In this section, we first provide a brief overview of CBT-I sleep treatment delivered by a human expert. Secondly, we review existing automated sleep treatment tools. Lastly, we identify gaps and introduce the research questions.

In-person treatment
CBT-I is a sleep treatment that focuses on investigating the relationship between how we behave, how we think, and how we sleep. To achieve this, the treatment consists of multiple components: stimulus control, sleep restriction, sleep hygiene education, relaxation training, and cognitive restructuring (Perlis et al., 2006; Morin and Espie, 2007; Belanger et al., 2006). Stimulus control aims to change associations between the bedroom and habits that make sleeping more difficult (Bootzin and Perlis, 2011). Sleep restriction requires patients to limit the time spent in bed in order to resolve the mismatch between the time in bed and sleep time (Spielman et al., 2011). Sleep hygiene focuses on educating the patient to avoid behaviours that influence sleep (Kleitman, 1987; Hauri, 1991). Relaxation training is given to help reduce racing thoughts (Ong et al., 2014). Cognitive restructuring aims to break the vicious circle between inaccurate thoughts about sleep and behaviours that contribute to insomnia. Standard CBT-I treatment includes three or more of these components.
Conversation between patient and therapist plays a critical role in in-person sleep treatment. In the first session of treatment, the patient describes their sleep complaints and the therapist determines whether the patient is suitable for CBT-I treatment. To do so, the therapist assesses the patient based on the clinical interview and the completed questionnaires. Once it is determined that the patient is suitable for the treatment, the therapist selects the treatment components and structures a plan tailored to the patient. The remaining sessions follow depending on the stage of treatment and the degree of patient compliance. Therefore, it is important that the therapist monitors the progress, identifies the patient's difficulties, provides personalised support, and encourages the patient to complete the treatment.

Computerised treatment
One of the earliest approaches is the work by Ström et al. (2004), who investigated the feasibility of internet-based CBT-I. They proposed a self-help programme in which patients provide their information and progress by completing questions and questionnaires via the internet. However, their method does not provide automated analytic features to support the monitoring process carried out by human experts.
Later on, Ritterband et al. (2009) proposed a fully automated sleep treatment tool. They proposed an automated algorithm that can produce a personalised recommendation for sleep restriction. It also automatically sends reminder emails. The whole intervention was delivered without human support and the outcome was comparable to in-person treatment. Nevertheless, it still lacks an interactive conversational feature through which participants could report their specific concerns or difficulties in complying with the treatment.
Later studies also did not explore the opportunity to get feedback or support. For example, both Vincent and Lewycky (2009)

Missing gaps and research questions
So far, conversational aspects, which play a critical role in in-person sleep treatment, have been less studied. Our main hypothesis is that conversational AI will enable natural interaction between users and a system throughout the treatment process, bringing computerised treatment closer to in-person treatment. To this end, we will focus on the following research questions:
RQ1. How to triage via conversation together with the completed questionnaires?
RQ2. How to monitor the progress during the sleep coaching programme?
RQ3. How to understand a user-specific situation for personalised coaching programme?

Research Plan
RQ1: How to triage via conversation together with the completed questionnaires?
Answering this research question entails three sub-tasks. The first sub-task is to assess users' complaints and identify sleep-related issues and their impacts in order to identify potential causes. The second sub-task is to ask follow-up questions to clarify ambiguous statements and differentiate causes that have similar impacts. The third sub-task is to explain the assessment result.

Complaints assessment
Assessing the users' complaints can be formulated as a classification task. One of the most similar approaches is a recent study conducted by Shim et al. (2020). They used a neural network-based multi-label classifier to detect pre-defined sleep issues from free-text. What makes our study more challenging is that 1) we aim to assess not only sleep issues but also impacts in order to identify underlying causes, and 2) we aim to incorporate the completed questionnaire results, which are a different modality from free-text. For the first challenge, a naive approach is to build three separate classifiers for sleep issues, impacts, and causes. This is limited, however, because these three entities are connected; for example, there are causal links from sleep issues to impacts. Therefore, as the first step, we will build a directed graph, such as a Bayesian Network (BN), in which each node represents the observed result for one entity. Since each node can hold either free-text classification results or questionnaire results, this also addresses the second challenge. We will explain other benefits of implementing the BN in the following paragraphs.

Follow-up question
In this project, we will explore the potential of a conversational environment in which the system can interact with patients. One of the benefits is that the system can actively search for additional information when it is needed. For example, the free-text inputs from users can be ambiguous. Also, multiple sleep issues could result in similar impacts, so that further assessment is required to differentiate between two or more conditions. We hypothesise that asking follow-up questions will address these challenges by clarifying and refining the user input. The real challenge then becomes how to decide 'when to ask?' and 'what to ask?'. A study by Middleton et al. (2016) addressed these challenges by framing triage as a sequence of questions and answers. To achieve this, they encoded expert knowledge into a graph structure: each possible question is linked to the possible answers, each of which is linked to a follow-up question. Since this approach requires human resources, we plan to work with experts in the sleep domain to encode the domain knowledge into a structured form, such as a graph. Also, we will investigate whether the BN can select the most appropriate follow-up question. We describe a preliminary experimental result of this approach in Section 4.
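
The question-and-answer graph described above can be illustrated with a minimal sketch. The questions, answers, and links below are hypothetical and carry no clinical validity; they only show the data structure in which each question points to its possible answers and each answer points to a follow-up question (or to `None` when the flow ends).

```python
# Hypothetical triage graph: every question, answer, and link is invented
# for illustration only and is not clinically validated.
TRIAGE_GRAPH = {
    "q_trouble_falling_asleep": {
        "text": "Do you have trouble falling asleep?",
        "answers": {"yes": "q_caffeine_evening", "no": "q_wake_at_night"},
    },
    "q_caffeine_evening": {
        "text": "Do you drink caffeine in the evening?",
        "answers": {"yes": None, "no": None},  # None marks the end of a flow
    },
    "q_wake_at_night": {
        "text": "Do you wake up during the night?",
        "answers": {"yes": None, "no": None},
    },
}


def run_triage(graph, start, answer_fn):
    """Walk the graph from `start` and return the (question id, answer) pairs."""
    transcript = []
    node = start
    while node is not None:
        answer = answer_fn(graph[node]["text"])
        if answer not in graph[node]["answers"]:
            raise ValueError(f"unexpected answer {answer!r} for {node}")
        transcript.append((node, answer))
        node = graph[node]["answers"][answer]  # follow the link for this answer
    return transcript


# Scripted answers stand in for a real user in this sketch.
scripted = {
    "Do you have trouble falling asleep?": "no",
    "Do you wake up during the night?": "yes",
}
flow = run_triage(TRIAGE_GRAPH, "q_trouble_falling_asleep", scripted.get)
print(flow)
```

A hand-built graph of this kind makes every flow explicit, but each link has to be authored by an expert, which is the human-resource cost noted above.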

Triage result
The end task of the first research question is to support triage via text-based conversation and by explaining the assessment result: what the main complaint from the user was and which habits were associated with the detected complaints. We plan to take an approach similar to that of Chen et al. (2020), who implemented a BN on top of neural networks to provide interpretability. However, compared to their task, ours is more challenging because not all information is given from the beginning. Therefore, we will model uncertainty to deal with unknown information during triage. For evaluation, we will follow a recent study by Razzaki et al. (2018) and evaluate our triage system both quantitatively and qualitatively. Quantitatively, we will calculate the precision and recall of detecting sleep issues and underlying causes. Qualitatively, we will rate triage flows with the help of an expert in the sleep domain.
RQ2: How to monitor the progress during the sleep coaching programme?
One of the fundamentals of sleep treatment is to monitor the progress of a participant. To monitor the progress and analyse sleep patterns, we will use a sleep diary, which is a widely used method to assess people's quality of sleep (Monk et al., 1994; Carney et al., 2012). The traditional sleep diary consists of a log of sleep-related activities, including bedtime, wake-up time, and sleep time, and relies heavily on objective values, such as total time in bed (TIB), total sleep time (TST), and sleep efficiency (SE). However, to monitor the progress and understand the user-specific condition, subjective values and context should also be considered. To achieve this, we will use a narrative sleep diary written in free-text that contains not only the time information of sleep-related events but also rich contextual information. For example, users can describe their sleep in free-text to explain not only how long they slept and how many times they woke up during the night but also the quality of sleep, how they felt after sleep, and what disturbed their sleep. Recent work by Rick et al. (2019) takes a similar approach to obtain qualifiable insights about the subjective experience of sleep by incorporating free-text user inputs. During this project, we will investigate how to combine different modalities, namely the objective and subjective values extracted from the narrative sleep diary, to assess the progress.
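
As an illustration of the objective diary values mentioned above, the following sketch computes TIB, TST, and SE for one night, with SE taken as the percentage of time in bed spent asleep (SE = 100 × TST / TIB). The function names, the input format, and the simplification that TST equals TIB minus the time taken to fall asleep and the time awake during the night are assumptions made for this example, not part of the proposed system.

```python
from datetime import datetime, timedelta


def minutes_between(start_hhmm, end_hhmm):
    """Minutes from start to end (HH:MM); the end is assumed to fall on the next day if not later."""
    fmt = "%H:%M"
    start = datetime.strptime(start_hhmm, fmt)
    end = datetime.strptime(end_hhmm, fmt)
    if end <= start:
        end += timedelta(days=1)  # the night crosses midnight
    return int((end - start).total_seconds() // 60)


def sleep_metrics(bedtime, rise_time, onset_latency_min, awake_during_night_min):
    """Return TIB and TST in minutes and SE in percent for one diary entry.

    Assumption for this sketch: TST = TIB - time to fall asleep - time awake at night.
    """
    tib = minutes_between(bedtime, rise_time)
    tst = tib - onset_latency_min - awake_during_night_min
    if tst < 0:
        raise ValueError("time awake exceeds time in bed")
    return {"TIB": tib, "TST": tst, "SE": 100.0 * tst / tib}


# In bed 23:00 to 07:00, 30 minutes to fall asleep, 30 minutes awake at night.
night = sleep_metrics("23:00", "07:00", 30, 30)
print(night)  # {'TIB': 480, 'TST': 420, 'SE': 87.5}
```

These values capture only the timing of sleep; the subjective quality and context that the narrative diary targets would have to be extracted from the free-text separately.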
RQ3: How to understand a user-specific situation for personalised coaching programme?
During the sleep coaching programme, which helps users change their behaviour to improve their sleep, it is critical to provide personalised coaching tailored to each user. To achieve this, understanding the experience of a user and identifying the user-specific issues and difficulties is essential. In this study, we will use free-text input from users that describes their experiences, thoughts, and feelings during the coaching programme. Specifically, we will consider a behaviour change programme and aim to build a model that performs aspect-based sentiment analysis on review comments from a user. Sentiment analysis (SA) is a widely used natural language processing (NLP) technique for assessing user experiences (Liu, 2012). Aspect-based sentiment analysis (ABSA) is a type of SA that aims to detect the sentiment expressed toward fine-grained aspects (Pontiki et al., 2014), rather than performing classification at the sentence level. Even though ABSA is widely studied, the majority of works are limited to reviews of consumer products (Do et al., 2019). Recently, Barahona et al. (2018) conducted research on detecting mental health concepts for cognitive behaviour therapy from user inputs by reformulating the task as sentiment analysis that detects negative sentiment. Similarly, we will investigate using ABSA to detect concepts related to sleep health by analysing user inputs during the sleep coaching programme, in order to provide personalised support and a behaviour change programme.

Pilot study
We ran a pilot experiment to examine our assumptions for RQ1. The main goals of this pilot study are as follows: 1) to build a model that classifies free-text user inputs; and 2) to implement a BN to select a follow-up question. The following subsections describe the details of the experiments and results.

Dataset
Motivated by Shim et al. (2020), we collected free-text data via a crowdsourcing platform. We also adapted their approach, which asks participants to imagine that they are sitting in the doctor's office and being asked to describe three different topics: sleep issues, the impact of their issues, and factors that might contribute to the issues. We cleaned the data by dropping invalid input texts and annotated it to create three datasets named issues, causes, and impact, respectively.

Experimental settings
For classification, we used a pre-trained language model (Devlin et al., 2019) and fine-tuned it on our datasets. Implementation details are given in Appendix B.
To evaluate the classifiers, we calculated macro-averaged precision, recall, and F1-score for issues, impacts, and causes separately.
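
For reference, macro-averaging computes each metric per class and then takes the unweighted mean over classes, so that rare classes count as much as frequent ones. The sketch below shows this for a multi-label setting; the label names and toy predictions are hypothetical, and the macro F1 here is the mean of per-class F1 scores, which is one common convention and an assumption about how the scores in this paper were computed.

```python
def macro_prf(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F1 for multi-label data.

    y_true and y_pred are lists of label sets, one set per example.
    """
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with two hypothetical issue labels.
labels = ["insomnia", "snoring"]
y_true = [{"insomnia"}, {"snoring"}, {"insomnia", "snoring"}, {"insomnia"}]
y_pred = [{"insomnia"}, {"insomnia"}, {"snoring"}, {"insomnia"}]
precision, recall, f1 = macro_prf(y_true, y_pred, labels)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.833 0.583 0.667
```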
For selecting a follow-up question, we created a simple BN with three layers: sleep causes, issues, and impact. Details of the model are described in Appendix C. Since this is a pilot study, we only considered a few entities and created conditional probability tables (CPT) based on our limited knowledge. Note that the structure and CPT of the BN are not clinically validated; the goal of this pilot study is to demonstrate the concept. At each iteration of question and answer, the BN updates the probability distribution at each node given the available information and selects the entity node with the highest probability of being true. We qualitatively evaluated this approach.
Table 2 summarises the classification result. The result shows that the model performs better on the causes dataset than on both the issues and impact datasets, even though there are more class categories in the causes dataset. We conducted an error analysis and observed that the trained models tend to misclassify similar classes. This implies that further assessment is needed to differentiate semantically close texts.
Figure 2 shows demonstrations of triage flow with the BN. Each sub-figure shows a sequence of follow-up questions and answers under a given condition: normal BMI (2a) and high BMI (2b). It is worth noting that each flow selected a different follow-up question after the classification model predicted the same results. This shows the possibility of using a BN to select the most appropriate follow-up question given the available information.
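
The selection step can be sketched with a toy network and exact inference by enumeration. Everything in the sketch is an assumption made for illustration: the node names, the structure, and the probabilities are invented, are not the network of Appendix C, and have no clinical validity. Choosing among all unobserved nodes is also an assumption about how the selection rule is applied.

```python
from itertools import product

# Hypothetical network: node -> (parents, CPT giving P(node=True | parent values)).
# Causes: high_bmi, late_caffeine; issues: snoring, trouble_falling_asleep;
# impact: daytime_fatigue. All numbers are invented for illustration.
NETWORK = {
    "high_bmi": ((), {(): 0.3}),
    "late_caffeine": ((), {(): 0.4}),
    "snoring": (("high_bmi",), {(True,): 0.6, (False,): 0.1}),
    "trouble_falling_asleep": (("late_caffeine",), {(True,): 0.7, (False,): 0.2}),
    "daytime_fatigue": (
        ("snoring", "trouble_falling_asleep"),
        {(True, True): 0.9, (True, False): 0.7, (False, True): 0.6, (False, False): 0.1},
    ),
}


def joint(assignment):
    """Joint probability of one full True/False assignment to all nodes."""
    prob = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[parent] for parent in parents)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


def posterior(evidence):
    """P(node=True | evidence) for every node, by enumerating all assignments."""
    names = list(NETWORK)
    totals = {name: 0.0 for name in names}
    norm = 0.0
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # inconsistent with what has been observed so far
        prob = joint(assignment)
        norm += prob
        for name in names:
            if assignment[name]:
                totals[name] += prob
    return {name: totals[name] / norm for name in names}


def next_question(evidence):
    """Pick the unobserved node that is most likely to be true."""
    candidates = {n: p for n, p in posterior(evidence).items() if n not in evidence}
    return max(candidates, key=candidates.get) if candidates else None


# Same reported impact, different BMI condition, different follow-up question.
print(next_question({"daytime_fatigue": True, "high_bmi": False}))  # trouble_falling_asleep
print(next_question({"daytime_fatigue": True, "high_bmi": True}))   # snoring
```

Enumeration is exponential in the number of nodes, so it only suits a network of this size; a larger network would need a dedicated inference algorithm or library.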

Preliminary result and next steps
We have not yet evaluated the system on the final triage result or on the appropriateness of the follow-up questions, because our dataset contains only free-text describing sleep issues, causes, and impacts separately. In our future study, we plan to follow a data collection protocol similar to that of Razzaki et al. (2018). They asked doctors to play patients based on given vignettes containing simple demographics, the complaint, and other information that can be obtained by either open-ended or closed-ended questions.

Conclusion
In this paper, we propose a PhD project exploring the potential of developing conversational AI for a sleep coaching programme, which is motivated by CBT-I treatment and targets healthy people who would like to optimise their sleep. The main goal of this PhD project is to develop NLP algorithms for conversational AI that allow the system to interact naturally with a user and provide automated analytics. To this end, we identified three research questions lying under three phases of the sleep coaching programme: triaging, monitoring the progress, and providing coaching. We expect this research project's outcomes to contribute not only to the research domains of NLP and AI but also to the healthcare field, by providing a more accessible and affordable sleep treatment solution and an automated analytic system to lessen the burden on human experts.

Acknowledgments
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 766139. This article reflects only the author's view and the REA is not responsible for any use that may be made of the information it contains.
Figure 2: Demonstrations of triage flow with the BN. (a) When the condition 'high BMI' is given as false. (b) When the condition 'high BMI' is given as true.

A Dataset for pilot study
We collected three datasets for the experiment: the sleep causes, issues, and impact datasets. Class labels and label descriptions of the causes, issues, and impact datasets are summarised in Tables 3 to 5, respectively. Each data point is annotated with one or more class labels (max. 3 classes).

B Implementation and training settings
For the experiment, we used the PyTorch implementation (Wolf et al., 2019) of the Bidirectional Encoder Representations from Transformers (BERT) model (Devlin et al., 2019). We initialised the model with pre-trained weights (bert-base-uncased) obtained from language modelling on general corpora (e.g., Wikicorpus, etc.). For the classification task, we added a final dense layer with a sigmoid activation function and used binary cross-entropy loss to perform multi-label classification. Details of fine-tuning are summarised in Table 6.
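
To make the multi-label output layer concrete, the following framework-free sketch shows how a sigmoid turns one logit per class into an independent probability and how binary cross-entropy is averaged over classes. It is not the training code used in the pilot study; the function names and the 0.5 decision threshold are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over classes for one example.

    logits: one raw score per class; targets: 0/1 indicator per class.
    """
    loss = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / len(logits)


def predict_labels(logits, threshold=0.5):
    """Each class is switched on independently (threshold is an assumed value)."""
    return [int(sigmoid(z) >= threshold) for z in logits]


# With both logits at 0, every class has probability 0.5 and the loss is ln 2.
print(binary_cross_entropy([0.0, 0.0], [1, 0]))
print(predict_labels([2.0, -1.0, 0.3]))  # [1, 0, 1]
```

Because each class has its own sigmoid, an input can receive several labels at once, which matches the annotation scheme of up to three classes per data point described in Appendix A.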

C Bayesian Network
The Bayesian Network used in the pilot study (Section 4) is illustrated in Figure 3.