An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives

Mental health conversational agents (a.k.a. chatbots) are widely studied for their potential to offer accessible support to those experiencing mental health challenges. Previous surveys on the topic primarily consider papers published in either computer science or medicine, leading to a divide in understanding and hindering the sharing of beneficial knowledge between both domains. To bridge this gap, we conduct a comprehensive literature review using the PRISMA framework, reviewing 534 papers published in both computer science and medicine. Our systematic review reveals 136 key papers on building mental health-related conversational agents with diverse characteristics of modeling and experimental design techniques. We find that computer science papers focus on LLM techniques and evaluate response quality using automated metrics, with little attention to the application, while medical papers use rule-based conversational agents and outcome metrics to measure the health outcomes of participants. Based on our findings on transparency, ethics, and cultural heterogeneity, we provide recommendations to help bridge the disciplinary divide and enable cross-disciplinary development of mental health conversational agents.


Introduction
The proliferation of conversational agents (CAs), also known as chatbots or dialog systems, has been spurred by advancements in Natural Language Processing (NLP) technologies. Their application spans diverse sectors, from education (Okonkwo and Ade-Ibijola, 2021; Durall and Kapros, 2020) to e-commerce (Shenoy et al., 2021), demonstrating their increasing ubiquity and potency.
The utility of CAs within the mental health domain has been gaining recognition. Over 30% of the world's population suffers from one or more mental health conditions; about 75% of individuals in low- and middle-income countries and about 50% of individuals in high-income countries do not receive care and treatment (Kohn et al., 2004; Arias et al., 2022). The sensitive (and often stigmatized) nature of mental health discussions further exacerbates this problem, as many individuals find it difficult to disclose their struggles openly (Corrigan and Matthews, 2003).
Conversational agents like Woebot (Fitzpatrick et al., 2017) and Wysa (Inkster et al., 2018) were some of the first mobile applications to address this issue. They provide an accessible and considerably less intimidating platform for mental health support, thereby assisting a substantial number of individuals. Their effectiveness highlights the potential of mental health-focused CAs as one viable way to ease the mental health disclosure and treatment gap.
Despite the successful implementation of certain CAs in mental health, a significant disconnect persists between research in computer science (CS) and medicine. This disconnect is particularly evident in the limited adoption of advanced NLP models (e.g., large language models) in research published in medicine. While CS researchers have made substantial strides in NLP, there is a lack of focus on human evaluation and the direct impact these developments have on patients. Furthermore, we observe that mental health CAs are drawing significant attention in medicine, yet remain underrepresented in health-application-focused research in NLP. This imbalance calls for a more integrated approach in future studies to realize the potential of these evolving technologies for mental health applications.
In this paper, we present a comprehensive analysis of academic research related to mental health conversational agents, conducted within the domains of CS and medicine. Employing the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework (Moher et al., 2010), we systematically reviewed 136 pertinent papers to discern the trends and research directions in the domain of mental health conversational agents over the past five years. We find a disparity in research focus and technology across communities, which is also reflected in differences in evaluation. Furthermore, we point out issues that apply across domains, including transparency and language/cultural heterogeneity.
The primary objective of our study is to conduct a systematic and transparent review of mental health CA research papers across the domains of CS and medicine. This process aims not only to bridge the existing gap between these two broad disciplines but also to facilitate reciprocal learning and the sharing of strengths. In this paper, we aim to address the following key questions: 1. What are the prevailing focus and direction of research in each of these domains?
2. What key differences can be identified between the research approaches taken by each domain?
3. How can we augment and improve mental health CA research methods?
Prior Survey Papers

Mental health conversational agents are discussed in several non-CS survey papers, with an emphasis on their usability in psychiatry (Vaidyam et al., 2019; Montenegro et al., 2019; Laranjo et al., 2018) and users' acceptability (Koulouri et al., 2022; Gaffney et al., 2019). These survey papers focus on underpinning theory (Martinengo et al., 2022) and standardized psychological outcomes for evaluation (Vaidyam et al., 2019; Gaffney et al., 2019), in addition to accessibility (Su et al., 2020), safety (Parmar et al., 2022), and validity (Pacheco-Lorenzo et al., 2021; Wilson and Marasoiu, 2022) of CAs. Contrary to surveys for medical audiences, NLP studies mostly focus on the quality of the generated response from the standpoint of text generation. Valizadeh and Parde (2022), in their latest survey, reviewed 70 articles and investigated task-oriented healthcare dialogue systems from a technical perspective. Their discussion focuses on the system architecture and design of CAs; the majority of healthcare CAs were found to have pipeline architectures. Surveys from the rest of CS cover HCI (de Souza et al., 2022) and the system design of CAs (Dev et al., 2022; Narynov et al., 2021a). de Souza et al. (2022) analyzed 6 mental health mobile applications from an HCI perspective and suggested 24 design considerations, including empathetic conversation style, probing, and session duration for effective dialogue. Damij and Bhattacharya (2022) proposed three key dimensions, namely people (citizen-centric goals), process (regulations and governance), and AI technology, to consider when designing public care CAs.
These survey papers independently provide an in-depth understanding of advancements and challenges in the CS and medical domains. However, there is a lack of studies that provide a joint appraisal of developments to enable cross-learning across these domains. With this goal, we consider research papers from medicine (PubMed), NLP (the ACL Anthology), and the rest of CS (ACM, AAAI, IEEE) to examine the disparities in goals, methods, and evaluations of research related to mental health conversational agents.

Paper Databases
We source papers from eminent databases in the fields of NLP, the rest of CS, and medicine, as these are integral knowledge areas in the study of mental health CAs. These databases include the ACL Anthology (referred to as ACL throughout this paper), AAAI, IEEE, ACM, and PubMed. ACL is recognized as a leading repository that highlights pioneering research in NLP. AAAI features cutting-edge studies in AI. IEEE, a leading community, embodies the forefront of engineering and technology research. ACM represents the latest trends in Human Computer Interaction (HCI) along with several other domains of CS. PubMed, the largest search engine for science and biomedical topics including psychology, psychiatry, and informatics, provides extensive coverage of the medical spectrum.
Drawing on insights from prior literature reviews (Valizadeh and Parde, 2022; Montenegro et al., 2019; Laranjo et al., 2018) and discussions with experts from both the CS and medical domains, we opt for a combination of specific keywords. These search terms represent both of our areas of focus: conversational agents ("conversational agent", "chatbot") and mental health ("mental health", "depression"). Furthermore, we limit our search to papers published between 2017 and 2022 to cover the most recent articles. We also apply the "research article" filter on the ACM search, and "Free Full Text or Full Text" for the PubMed search. Moreover, we manually add 3 papers recommended by the domain experts (Fitzpatrick et al., 2017; Laranjo et al., 2018; Montenegro et al., 2019). This results in 534 papers.
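The keyword combinations above amount to a small query grid. A minimal sketch of its construction follows; the exact boolean syntax accepted by each database differs, so these strings are illustrative assumptions rather than the verbatim queries used.

```python
from itertools import product

# The two axes of search terms reported above.
CA_TERMS = ["conversational agent", "chatbot"]
MH_TERMS = ["mental health", "depression"]

# Cross every CA term with every mental health term; the date range
# (2017-2022) and per-database filters would be applied via each
# database's own search interface.
queries = [f'"{ca}" AND "{mh}"' for ca, mh in product(CA_TERMS, MH_TERMS)]
```

Enumerating the full grid this way makes the search strategy easy to report and to re-run, which matters for PRISMA-style reproducibility.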

Screening Process
For subsequent steps in the screening process, we adhere to a set of defined inclusion criteria. Specifically, we include a paper if it meets the following conditions, for a focused and relevant review of the literature that aligns with the objectives of our study:
• Primarily focused on CAs, irrespective of modality, such as text, speech, or embodied.
• Related to mental health and well-being. These could be related to depression, PTSD, or other conditions defined in the DSM-IV (Bell, 1994), or other emotion-related intervention targets such as stress.
• Contributes toward directly improving mental health CAs, for instance by proposing novel models or conducting user studies.
The initial step in our screening process is title screening, in which we examine all titles, retaining those that are related to either CAs or mental health. Our approach is deliberately inclusive during this phase to maximize recall. As a result, out of 534 papers, we keep 302 for the next step.
Following this, we proceed with abstract screening. In this stage, we evaluate whether each paper meets our inclusion criteria. To enhance the accuracy and efficiency of our decision-making process, we extract the ten most frequent words from the full text of each paper to serve as keywords. These keywords provide an additional layer of verification, assisting our decision-making process. Following this step, we are left with a selection of 157 papers.
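The keyword-extraction step described above can be sketched as follows; the tokenizer and stop-word list here are simplifications of our own choosing, not the exact procedure used in the review.

```python
import re
from collections import Counter

# A minimal stop-word list; a real screening pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is",
              "are", "on", "with", "that", "this", "we", "our", "as", "by"}

def top_keywords(full_text: str, k: int = 10) -> list[str]:
    """Return the k most frequent content words in a paper's full text."""
    tokens = re.findall(r"[a-z]+", full_text.lower())
    counted = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return [word for word, _ in counted.most_common(k)]
```

For a paper about, say, a CBT chatbot, such a list would typically surface terms like "chatbot" or "therapy", giving the screener a quick topical fingerprint to check against the inclusion criteria.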
The final step is full-text screening. As we verify whether a paper meets the inclusion criteria, we extract key features (such as model techniques and evaluations) from the paper and summarize them in tables (see appendix). Simultaneously, we highlight and annotate the papers' PDF files to provide evidence supporting our claims about each feature, similar to the methodology used in Howcroft et al. (2020). This process is independently conducted by two co-authors on a subset of 25 papers, and their annotations agree with each other. Furthermore, the two co-authors also agree upon the definitions of the features, following which all the remaining papers receive one annotation. The final corpus contains 136 papers: 9 from ACL, 4 from AAAI, 20 from IEEE, 40 from ACM, and 63 from PubMed. We categorize these papers into distinct groups: 102 model/experiment papers, 20 survey papers, and 14 papers classified as 'other'. Model papers are articles whose primary focus is on the construction and explanation of a theoretical model, while experimental papers are research studies that conduct specific experiments on the models to answer pertinent research questions. We combine experiment and model papers because experimental papers often involve testing on models, while model papers frequently incorporate evaluations through experiments. The 'other' papers include dataset papers, summary papers describing the proceedings of a workshop, perspective/viewpoint papers, and design science research papers. In this paper, we focus on analyzing the experiment/model and survey papers, which have a more uniform set of features.
In this section, we briefly summarize the observations from the different features we extracted.

Language
We identify whether there is a predominant language associated with the data used for the models, or whether a certain language proficiency was part of the inclusion criteria for participants. Our findings, summarized in Table 2, reveal that English dominates these studies, with over 71% of the papers utilizing data and/or participants proficient in English. Although a few (17%) papers emerge from East Asia and Europe, studies in low-resource languages are relatively rare.

Mental Health Category
Most of the papers (43%) we reviewed do not deal with a specific mental health condition but work towards general mental health well-being (Saha et al., 2022a). The methods proposed in such papers are applicable to the symptoms associated with a broad range of mental health issues (e.g., emotional dysregulation). Some papers, on the other hand, are more tailored to address the characteristics of targeted mental health conditions. As shown in Table 3, depression and anxiety are the two major mental health categories being dealt with, reflecting the prevalence of these conditions (Eagle et al., 2022). Other categories include stress management (Park et al., 2019; Gabrielli et al., 2021); sexual abuse, to help survivors of sexual abuse (Maeng and Lee, 2022; Park and Lee, 2021); and social isolation, mainly targeted toward older adults (Sidner et al., 2018; Razavi et al., 2022). Less-studied categories include affective disorders (Maharjan et al., 2022a,b), COVID-19-related mental health issues (Kim et al., 2022; Ludin et al., 2022), eating disorders (Beilharz et al., 2021), and PTSD (Han et al., 2021).

Target Demographic
Most of the papers (>65%) do not specify the target demographic of users for their CAs. The target demographic distribution is shown in Table 4. An advantage of the models proposed in these papers is that they could potentially offer support to a broad group of users irrespective of the underlying mental health condition. Papers without a target demographic and a target mental health category focus on proposing methods, such as using generative language models for psychotherapy (Das et al., 2022a), or on addressing specific modules of the CAs, such as leveraging reinforcement learning for response generation (Saha et al., 2022b).
On the other hand, 31% of papers focus on one specific user group, such as young individuals, students, women, or older adults, to give advanced assistance. Young individuals, including adolescents and teenagers, received the most attention (Rahman et al., 2021). Several papers also  2021). Papers targeting older adults are mainly designed for companionship and supporting isolated elders (Sidner et al., 2018; Razavi et al., 2022).

Model Technique
The development of large language models (LLMs) such as the GPT series (Radford et al., 2019; Brown et al., 2020) greatly enhanced the performance of generative models, which in turn made a significant impact on the development of CAs (Das et al., 2022b; Nie et al., 2022). However, as shown in Table 5, LLMs are yet to be widely utilized in the development of mental health CAs (as of the papers reviewed in this study), especially in medicine. No paper from PubMed in our final list dealt with generative models; the primary focus there is on rule-based and retrieval-based CAs.
Rule-based models operate on predefined rules and patterns, such as if-then statements or decision trees, to match user inputs with predefined responses. Rule-based CAs can be straightforward and inexpensive to run, but developing and maintaining a comprehensive set of rules can be challenging. Retrieval-based models rely on a predefined database of responses to generate replies. They use techniques like keyword matching (Daley et al., 2020), similarity measures (Collins et al., 2022), or information retrieval (Morris et al., 2018) to find the most appropriate response from the database based on the user's input. Generative model-based CAs are mostly developed using deep learning techniques such as recurrent neural networks (RNNs) or transformers, which learn from large amounts of text data and generate responses based on the learned patterns and structures. While they can often generate more diverse and contextually relevant responses compared to rule-based or retrieval-based models, they could suffer from hallucination and inaccuracies (Azaria and Mitchell, 2023).
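As a rough illustration of the first two model families, the sketch below pairs a rule-based matcher with a retrieval-based one. The rules, stored prompts, and responses are entirely hypothetical, and difflib's string similarity merely stands in for the similarity measures cited in the surveyed papers.

```python
import difflib

# Rule-based: hypothetical if-then keyword rules mapped to canned responses.
RULES = {
    "anxious": "It sounds like you are feeling anxious. Would a breathing exercise help?",
    "sleep": "Sleep troubles are common. A regular bedtime routine can help.",
}

def rule_based_reply(user_input: str) -> str:
    """Return the first rule whose keyword appears in the input."""
    for keyword, response in RULES.items():
        if keyword in user_input.lower():
            return response
    return "I'm here to listen. Can you tell me more?"  # out-of-domain fallback

# Retrieval-based: return the stored response whose prompt best matches the input.
RESPONSE_DB = {
    "i feel anxious all the time": "Constant worry is exhausting; grounding techniques may help.",
    "i cannot sleep at night": "Try noting what keeps you awake and limiting screens before bed.",
}

def retrieval_reply(user_input: str) -> str:
    """Pick the database entry with the highest string similarity to the input."""
    best = max(RESPONSE_DB, key=lambda prompt: difflib.SequenceMatcher(
        None, user_input.lower(), prompt).ratio())
    return RESPONSE_DB[best]
```

The contrast is visible even at this scale: the rule-based agent is fully controllable but falls back to a generic reply on anything outside its rules, while the retrieval-based agent always answers from its database, for better or worse.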

Outsourced Models
Building a CA model from scratch could be challenging for several reasons, such as a lack of sufficient data, compute resources, or generalizability. Publicly available models and architectures have made building CAs accessible. Google Dialogflow (Google, 2021) and Rasa (Bocklisch et al., 2017) are the two most used outsourced platforms and frameworks. Alexa, DialoGPT (Zhang et al., 2019), GPT-2 and GPT-3 (Radford et al., 2019; Brown et al., 2020), and X2AI (now called Cass) (Cass, 2023) are also frequently used for building CA models. A summary can be found in Table 6. Google Dialogflow is a conversational AI platform developed by Google that enables developers to build and deploy chatbots and virtual assistants across various platforms. Rasa is an open-source conversational AI framework that empowers developers to create and deploy contextual chatbots and virtual assistants with advanced natural language understanding capabilities. Alexa is a voice-controlled virtual assistant developed by Amazon. It enables users to interact with a wide range of devices and services using voice commands, offering capabilities such as playing music, answering questions, and providing personalized recommendations. DialoGPT is a large, pre-trained neural conversational response generation model trained on top of GPT-2 with 147M conversation-like exchanges from Reddit. X2AI is a leading mental health AI assistant that supports over 30M individuals with easy access.

Evaluation
Automatic: Mental health CAs are evaluated with various methods and metrics. Multiple factors are measured and tested, including user activity (total sessions, total time, days used, total word count), user utterances (sentiment analysis, LIWC (Pennebaker et al., 2015)), CA response quality (BLEU (Papineni et al., 2002), ROUGE-L (Lin, 2004), lexical diversity, perplexity), and the performance of the CA's sub-modules (classification F1 score, negative log-likelihood). We find that papers published in the CS domain focus more on technical evaluation, while papers published in medicine are more interested in user data.
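ROUGE-L, one of the response-quality metrics listed above, is computed from the longest common subsequence (LCS) of the reference and candidate responses. A minimal whitespace-tokenized sketch follows; it is not a drop-in for the official scorer, which adds stemming and bootstrap resampling.

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over whitespace tokens: harmonic mean of LCS precision and recall."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because the LCS allows gaps but preserves word order, ROUGE-L rewards responses that follow the reference's structure without requiring exact n-gram overlap, which is one reason it appears alongside BLEU in dialogue evaluation.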
Human outcomes: Human evaluation using survey assessment is the most prevalent method to gauge mental health CAs' performance. Some survey instruments measure the pre- and post-study status of participants and evaluate the impact of the CA by comparing mental health (e.g., PHQ-9 (Kroenke et al., 2001), GAD-7 (Spitzer et al., 2006), BFI-10 (Rammstedt et al., 2013)) and mood scores (e.g., WHO-5 (Topp et al., 2015)); others collect user feedback on CA models (usability, difficulty, appropriateness) or ask a group of individuals to annotate user logs or utterances to collect passive feedback (self-disclosure level, competence, motivation).
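A pre/post comparison of the kind described above reduces to simple paired statistics. The sketch below uses invented PHQ-9 scores for hypothetical participants; lower PHQ-9 scores indicate fewer depressive symptoms, so a negative mean change corresponds to improvement.

```python
from statistics import mean, stdev

def pre_post_change(pre: list[int], post: list[int]) -> dict[str, float]:
    """Mean change and paired-samples effect size (Cohen's d_z) for pre/post scores."""
    diffs = [after - before for before, after in zip(pre, post)]
    m = mean(diffs)
    sd = stdev(diffs)  # requires at least two participants
    return {"mean_change": m, "cohens_dz": m / sd if sd else float("inf")}

# Invented PHQ-9 scores for four hypothetical participants.
result = pre_post_change(pre=[15, 12, 18, 10], post=[10, 9, 12, 8])
```

Reporting an effect size alongside the raw change, as the medical papers in our review typically do, makes results comparable across studies with different sample sizes and scales.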

Ethical Considerations
Mental health CAs inevitably work with sensitive data, including demographics, Personally Identifiable Information (PII), and Personal Health Information (PHI). Thus, careful ethical consideration and a high standard of data privacy must be applied in these studies. Out of the 89 papers that include human evaluations, approximately 70% (62 papers) indicate that they either were granted approval by Institutional Review Boards (IRBs) or ethics review committees, or specified that ethical approval was not a requirement based on local policy. On the other hand, 24 papers do not mention seeking ethical approval or related considerations. Of these 24 papers that lack a statement on ethical concerns, 21 are published in the field of CS.

Disparity in Research Focus
Mental health conversational agents require expert knowledge from different domains. However, the papers we reviewed treat this task quite differently, as evidenced by the base rates of papers matching our inclusion criteria. For instance, there are over 28,000 articles published in the ACL Anthology with the keywords "chatbot" or "conversational agent", which reveals the popularity of this topic in the NLP domain. However, only 9 papers relate to both mental health and CAs, which shows that the focus of NLP researchers is concentrated primarily on the technical development of CA models and less on applications, including mental health. AAAI shares a similar trend with ACL. In contrast, there are many papers related to mental health CAs in IEEE and ACM, which shows great interest from the engineering and HCI communities. PubMed represents the latest trends of research in the medical domain, and it has the largest number of publications that fit our inclusion criteria. While CS papers mostly do not have a specific focus on the mental health category for which CAs are being built, papers published in the medical domain often tackle specific mental health categories.

Technology Gap
The CS and medical domains also differ in the technical aspects of the CA models. In the CS domain (ACL, AAAI, IEEE, ACM), 41 (of 73) papers developed CA models, while 14 (of 63) from the medical domain (PubMed) developed models. Among these papers, 8 from the CS domain are based on generative methods, but no paper in PubMed uses this technology. The NLP community is actively exploring the role of generative LLMs (e.g., GPT-4) in designing CAs, including mental healthcare-related CAs (Das et al., 2022a; Saha et al., 2022b; Yan and Nakashole, 2021). With the advent of more sophisticated LLMs, fluency issues, repetition, and ungrammatical constructions are no longer major concerns for dialogue generation. However, stochastic text generation coupled with black-box architectures prevents wider adoption of these models in the health sector (Vaidyam et al., 2019). Unlike task-oriented dialogues, mental health CAs predominantly involve an unconstrained conversation style for talk therapy that can benefit from the advancements in LLMs (Abd-Alrazaq et al., 2021).
PubMed papers instead focus on retrieval-based and rule-based methods, which are, arguably, previous-generation CA models as far as technical complexity is concerned. This could be due to a variety of factors, such as explainability, accuracy, and reliability, which are crucial when dealing with patients.

Response Quality vs Health Outcome
The difference in evaluation also reveals the varying focus across the CS and medicine domains. From the CS domains, 30 (of 59) papers applied automatic evaluation, which covers both the model's performance (e.g., BLEU, ROUGE-L, perplexity) and participants' CA usage (total sessions, word count, interaction time). In contrast, only 13 of 43 papers from PubMed used automatic evaluation, and none of them investigated the models' performance.
The difference is also evident in human evaluation. 40 (of 43) papers from PubMed include human outcome evaluation, covering a wide range of questionnaires to determine participants' status (e.g., PHQ-9, GAD-7, WHO-5). The focus is on users' psychological well-being and evaluating the chatbot's suitability in the clinical setup (Martinengo et al., 2022). Although these papers do not test the CA model's performance through automatic evaluation, they ask for participants' ratings to oversee their model's quality (e.g., helpfulness, the System Usability Scale (Brooke et al., 1996), WAI-SR (Munder et al., 2010)).
All 6 ACL papers that satisfied our search criteria focus solely on dialogue quality (e.g., fluency, friendliness), with no discussion of the CA's effect on users' well-being through clinical measures such as PHQ-9. CAs that aim to be the first point of contact for users seeking mental health support should have clinically validated mechanisms to monitor the well-being of their users (Pacheco-Lorenzo et al., 2021; Wilson and Marasoiu, 2022). Moreover, the mental health CAs we review are designed without any underlying theory of psychotherapy or behavior change, which casts doubt on the utility of such CAs in providing emotional support to those suffering from mental health challenges.

Transparency
None of the ACL papers that we reviewed released their model or an API. Additionally, a baseline or comparison with the existing state of the art is often missing. There is no standardized outcome-reporting procedure in either the medicine or CS domains (Vaidyam et al., 2019). For instance, Valizadeh and Parde (2022) raised concerns about the replicability of evaluation results and transparency for healthcare CAs. We acknowledge the restrictions on making models public due to the sensitive nature of the data. However, providing APIs could be a possible alternative to enable comparison in future studies. To gauge the true advantage of mental health CAs in a clinical setup, randomized controlled trials are an important consideration that is not observed in NLP papers. Further, standardized benchmark datasets for evaluating mental health CAs could be useful in increasing transparency.

Language and Cultural Heterogeneity
Over 75% of the research papers in our review cater to English-speaking participants struggling with depression and anxiety. Chinese and Korean are the two languages with the highest number of research papers after English, even though Chinese has the largest number of speakers in the world. Future work could consider tapping into a diverse set of languages that also have substantial data available, for instance Hindi, Arabic, French, Russian, and Japanese, which are among the top 10 most spoken languages in the world. The growing prowess of multilingual LLMs could be an incredible opportunity to transfer universally applicable developments in mental health CAs to low-resource languages, while being mindful of the racial and cultural heterogeneity that several multilingual models might miss due to being trained on largely English data (Bang et al., 2023).

Conclusion
In this paper, we used the PRISMA framework to systematically review recent studies on mental health CAs across both the CS and medical domains. From well-represented databases in both domains, we begin with 865 papers based on a keyword search to identify mental health-related conversational agent papers and use title, abstract, and full-text screening to retain 136 papers that fit our inclusion criteria. Furthermore, we extract a wide range of features from model and experiment papers, summarizing attributes in the fields of general features, techniques, appearance, and experiments. Based on this information, we find that there is a gap between CS and medicine in mental health CA studies: the domains vary in research focus, technology, and evaluation purposes. We also identify common issues that span domains, including transparency and language/cultural heterogeneity.

Potential Recommendations
We systematically study the differences between domains and show that learning from each other is highly beneficial. Since interdisciplinary works constitute only a small portion of our final list (20 of 102 based on author affiliations: 7 from ACM, 2 from IEEE, and 11 from PubMed), we suggest more collaborations to help bridge the gap between the two communities. For instance, NLP (and broadly CS) papers on mental health CAs would benefit from adding pre-post analysis of human feedback and considering ethical challenges by requesting a review from an ethics committee. Further, studies in medicine could benefit by tapping into the latest developments in generative methods in addition to the commonly used rule-based methods. In terms of evaluation, both the quality of responses by the CAs (in terms of automatic metrics such as BLEU, ROUGE-L, and perplexity, and measures of dialogue quality) and the effect of the CA on users' mental states (in terms of mental health-specific survey inventories) could be used to assess the performance of mental health CAs. Moreover, increasing language coverage to include non-English data/participants, accounting for cultural heterogeneity, and providing APIs to compare against current mental health CAs would help address the challenge of mental health care support with a cross-disciplinary effort.

Limitations
This survey paper has several limitations. Our search window spans January 2017 to December 2022, so it likely does not reflect the development of more advanced CAs and large language models like ChatGPT and GPT-4 (Sanderson, 2023). We could not include more recent publications due to the EMNLP submission date. Nonetheless, we have included relevant comments across the different sections on the applicability of more sophisticated models.
Further, search engines (e.g., Google Scholar) are not deterministic. Our search keywords, filters, and chosen databases do not guarantee the exact same search results. However, we have run the database searches multiple times and they returned consistent results. We have downloaded PDFs of all the papers and saved the annotated versions to reflect the different steps used in this review. These annotations will be made public.
For some databases, the number of papers in the final list may be (surprisingly!) small relative to the general research trends in the respective domains. However, this also indicates a lack of focus on mental health CAs in these domains, suggesting that further attention is required in the field.

Ethics Statement
Mental health CAs, despite their accessibility, potential, and anonymity, cannot replace human therapists in providing mental health care. There are many ongoing discussions about the range of availability of mental health CAs, and many raise challenges and suspicions about automated conversations. Rule-based and retrieval-based models can be controlled for content generation but cannot answer out-of-domain questions. Generative models are still a developing field, and their non-deterministic nature raises concerns about the safety and reliability of the content. Thus, at the current stage, CAs could play a valuable complementary role in mental healthcare by identifying individuals who potentially need more immediate care in an already burdened healthcare system.
Since patients' personal information and medical status are extremely sensitive, we highly encourage researchers and developers to pay extra attention to data security and ethics (Arias et al., 2022). The development, validation, and deployment of mental health CAs should involve multiple diverse stakeholders to determine how, when, and which data is used to train models and infer participants' mental health. This requires a multidisciplinary effort to address the complex challenges of mental health care (Chancellor et al., 2019).

Table 1 :
Steps in the screening process and the number of papers retained in each database.

Table 2 :
Distribution of the predominant language of the data and/or participants recruited in mental health CA papers. Other languages include Bangla, Danish, Dutch, Japanese, Kazakh, Norwegian, Spanish, and Swedish.

Table 3 :
Distribution of mental health categories in mental health CA papers. A paper could have multiple focused targets. Other categories include affective disorders, COVID-19, eating disorders, PTSD, substance use disorder, etc.

Table 4 :
Distribution of demographics focused on by mental health CA papers. A paper could have multiple target demographic groups. Other includes Black Americans, the military community, and employees.

Table 6 :
Distribution of outsourced models used for building mental health CAs. Other includes Manychat, Woebot (Fitzpatrick et al., 2017), and Eliza (Weizenbaum, 1966).

Table 8 :
All method/experiment papers in the final list of this survey. This table only shows general and appearance features.


Table 9 :
All method/experiment papers in the final list of this survey. This table only shows technique features. Long values are truncated due to limited space.

Table 10 :
All method/experiment papers in the final list of this survey. This table only shows experiment features. Long values are truncated due to limited space.