We present MozoLM, an open-source language model microservice package intended for use in AAC text-entry applications, with a particular focus on the design principles of the library. The intent of the library is to allow the ensembling of multiple diverse language models without requiring the clients (user interface designers, system users or speech-language pathologists) to attend to the formats of the models. Issues around privacy, security, dynamic versus static models, and methods of model combination are explored and specific design choices motivated. Some simulation experiments demonstrating the benefits of personalized language model ensembling via the library are presented.
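As a rough illustration of what ensembling language models can mean in a text-entry setting, the sketch below linearly interpolates the next-character distributions of a background model and a personalized model. It is not MozoLM's API: the function names, the unigram character models, the toy training strings, and the 0.7/0.3 mixture weights are all assumptions made for this example.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def unigram_char_model(text, alphabet=ALPHABET, smoothing=1.0):
    """Estimate an add-k smoothed unigram distribution over characters."""
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values()) + smoothing * len(alphabet)
    return {ch: (counts[ch] + smoothing) / total for ch in alphabet}


def interpolate(models, weights):
    """Linearly interpolate distributions that share the same symbol set."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    symbols = models[0].keys()
    return {s: sum(w * m[s] for m, w in zip(models, weights)) for s in symbols}


# Hypothetical background and personal corpora (toy data).
background = unigram_char_model("the quick brown fox jumps over the lazy dog")
personal = unigram_char_model("zzz pizza jazz")

# Assumed mixture weights; in practice these would be tuned on held-out text.
mixed = interpolate([background, personal], [0.7, 0.3])
print(round(sum(mixed.values()), 6))
```

Because each component is a proper distribution and the weights sum to one, the mixture is also a proper distribution, so a client can consume it without knowing how each component model is stored.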
Many people with severely limited muscle control can only communicate through augmentative and alternative communication (AAC) systems with a small number of buttons. In this paper, we present the design for ColorCode, which is an AAC system with two buttons that uses Bayesian inference to determine what the user wishes to communicate. Our information-theoretic analysis of ColorCode simulations shows that it is efficient in extracting information from the user, even in the presence of errors, achieving nearly optimal error correction. ColorCode is provided as open source software.
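The idea of Bayesian inference over a two-button interface can be sketched as follows. This is a minimal illustration under stated assumptions, not ColorCode's actual algorithm: each candidate symbol is assigned one of two colors, the user presses the button matching the target's color, and a fixed, assumed error rate models accidental presses. The function name, the four-symbol alphabet, the colorings, and the 0.1 error rate are all hypothetical.

```python
def bayes_update(prior, colors, pressed, error_rate=0.1):
    """Update beliefs over symbols after one possibly erroneous button press.

    prior:   dict mapping symbol -> probability
    colors:  dict mapping symbol -> 0 or 1 (the button assigned to that symbol)
    pressed: the button the user actually pressed (0 or 1)
    """
    unnormalized = {}
    for symbol, p in prior.items():
        # A press matching the symbol's color is likely; a mismatch is an error.
        likelihood = (1.0 - error_rate) if colors[symbol] == pressed else error_rate
        unnormalized[symbol] = p * likelihood
    z = sum(unnormalized.values())
    return {symbol: v / z for symbol, v in unnormalized.items()}


# Uniform prior over a toy four-symbol alphabet; the target is "a".
prior = {s: 0.25 for s in "abcd"}

# Round 1: "a" and "b" share button 0; the user presses 0.
post1 = bayes_update(prior, {"a": 0, "b": 0, "c": 1, "d": 1}, pressed=0)

# Round 2: a different coloring separates "a" from "b"; the user presses 0 again.
post2 = bayes_update(post1, {"a": 0, "b": 1, "c": 0, "d": 1}, pressed=0)

print(max(post2, key=post2.get))
```

Because an inconsistent press only down-weights a symbol rather than eliminating it, a single erroneous press can be outweighed by later evidence, which is the intuition behind error correction in this kind of scheme.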
Robitaille (2010) wrote ‘if all technology companies have accessibility in their mind then people with disabilities won’t be left behind.’ Current technology has come a long way from where it stood decades ago; however, researchers and manufacturers often do not include people with disabilities in the design process and tend to accommodate them after the fact. In this paper we share feedback from four assistive technology users who rely on one or more assistive technology devices in their everyday lives. We believe end users should be part of the design process and that by bringing together experts and users, we can bridge the research/practice gap.
Conversations between a clinician and a patient, in natural conditions, are valuable sources of information for medical follow-up. The automatic analysis of these dialogues could help extract new language markers and speed up the clinicians’ reports. Yet, it is not clear which model is the most efficient at detecting and identifying speaker turns, especially for individuals with speech disorders. Here, we propose a split of the data that allows a comparative evaluation of different diarization methods. We designed and trained end-to-end neural network architectures to tackle this task directly from the raw signal, and evaluated each approach under the same metric. We also studied the effect of fine-tuning models to find the best performance. Experimental results are reported on naturalistic clinical conversations between psychologists and interviewees at different stages of Huntington’s disease, displaying a wide range of speech disorders. Our best end-to-end model achieved 19.5% IER on the test set, compared to 23.6% achieved by fine-tuning the X-vector architecture. Finally, we observed that we could extract clinical markers directly from the automatic systems, highlighting the clinical relevance of our methods.
In this study we compare two approaches (neural machine translation and edit-based) and the use of synthetic data for the task of translating normalised Swiss German ASR output into correct written Standard German for subtitles, with a special focus on syntactic differences. Results suggest that NMT is better suited to this task and that relatively simple rule-based generation of training data could be a valuable approach for cases where little training data is available and transformations are simple.
Communication between physicians and patients can lead to misunderstandings, especially for disabled people. An automatic system that translates natural language into a pictographic language is one solution that could help to overcome this issue. In this preliminary study, we present the French version of a translation system using the Arasaac pictographs and we investigate the strategies used by speech therapists to translate into pictographs. We also evaluate the medical coverage of this tool for translating physician questions and patient instructions.
This paper outlines the ethical implications of text simplification within the framework of assistive systems. We argue that a distinction should be made between the technologies that perform text simplification and the realisation of these in assistive technologies. When using the latter as a motivation for research, it is important that the subsequent ethical implications be carefully considered. We provide guidelines for the framing of text simplification independently of assistive systems, and suggest directions for future research and discussion based on the concerns raised.
Stereotypes are positive or negative, generalized, and often widely shared beliefs about the attributes of certain groups of people, such as people with sensory disabilities. If stereotypes manifest in assistive technologies used by deaf or blind people, they can harm the user in a number of ways, especially considering the vulnerable nature of the target population. AI models underlying assistive technologies have been shown to contain biased stereotypes, including racial, gender, and disability biases. We build on this work to present a psychology-based stereotype assessment of the representation of disability, deafness, and blindness in BERT using the Stereotype Content Model. We show that BERT contains disability bias, and that this bias differs along established stereotype dimensions.
Conversational assistants are ubiquitous among the general population; however, these systems have not had an impact on people with disabilities, or speech and language disorders, for whom basic day-to-day communication and social interaction are a huge struggle. Language model technology can play a major role in empowering these users and helping them interact with others with less effort via interaction support. To support this population, we build a system that can represent them in a social conversation and generate responses that users can control with cues/keywords. We build models that can speed up this communication by suggesting relevant cues in the dialog response context. We also introduce a keyword-loss to lexically constrain the model response output. We present automatic and human evaluation of our cue/keyword predictor and the controllable dialog system to show that our models perform significantly better than models without control. Our evaluation and user study show that keyword-control on end-to-end response generation models is powerful and can enable and empower users with degenerative disorders to carry out their day-to-day communication.
This paper describes three areas of assistive technology development which deploy the resources and speech technology for Irish (Gaelic), newly emerging from the ABAIR initiative. These include (i) a screenreading facility for visually impaired people, (ii) an application to help develop phonological awareness and early literacy for dyslexic people, and (iii) a speech-enabled AAC system for non-speaking people. Each of these is at a different stage of development and poses unique challenges: these are discussed along with the approaches adopted to address them. Three guiding principles underlie development. Firstly, the sociolinguistic context and the needs of the community are essential considerations in setting priorities. Secondly, development needs to be language sensitive. The need for skilled researchers with a deep knowledge of Irish structure is illustrated in the case of (ii) and (iii), where aspects of Irish linguistic structure (phonological, morphological and grammatical) and the striking differences from English pose challenges for systems aimed at bilingual Irish-English users. Thirdly, and most importantly, the users and their support networks are central, not as passive recipients of ready-made technologies, but as active partners at every stage of development, from design to implementation, evaluation and dissemination.