Named Entity Recognition (NER) is an essential component of many Natural Language Processing pipelines. However, building these language-dependent models requires large amounts of annotated data. Crowdsourcing has emerged as a scalable solution to collect and enrich data in a more time-efficient manner. To manage these annotations at scale, it is important to predict completion timelines and compute fair pricing for workers in advance. To achieve these goals, we need to know how much effort each task will take to complete. In this paper, we investigate which variables influence the time a human spends on a named entity annotation task. Our results are two-fold: first, an understanding of the effort-impacting factors, which we divided into cognitive load and input length; and second, the performance of the prediction itself. On the latter, through model adaptation and feature engineering, we attained a Root Mean Squared Error (RMSE) of 25.68 words per minute with a Nearest Neighbors model.
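To make the prediction setup concrete, the sketch below shows a minimal Nearest Neighbors regressor evaluated with RMSE, as the abstract describes. The feature choices (input length and a cognitive-load score) and all numbers are hypothetical assumptions for illustration, not the paper's actual features, data, or implementation.

```python
import math

def knn_predict(train_X, train_y, x, k=3):
    """Predict a target (e.g. annotation speed in words per minute) for
    feature vector x by averaging the targets of its k nearest training
    examples under Euclidean distance."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    neighbors = [y for _, y in dists[:k]]
    return sum(neighbors) / k

def rmse(y_true, y_pred):
    """Root Mean Squared Error between true and predicted values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

# Hypothetical features: [input length in tokens, cognitive-load score].
train_X = [[10, 1], [20, 2], [30, 3], [40, 4]]
train_y = [100, 80, 60, 40]  # made-up annotation speeds (words/minute)

pred = knn_predict(train_X, train_y, [12, 1], k=2)  # averages 100 and 80
error = rmse([85.0], [pred])
```

A real pipeline would also scale the features before computing distances, since raw input length and a bounded cognitive-load score live on very different ranges and would otherwise dominate the distance metric unevenly.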
A Silent Speech Interface (SSI) allows speech communication to take place in the absence of an acoustic signal. This type of interface is an alternative to conventional Automatic Speech Recognition, which is inadequate for users with certain speech impairments or in the presence of environmental noise. The work presented here creates the conditions to explore and analyze complex combinations of input modalities applicable to SSI research. Focusing on non-invasive and promising modalities, we selected the following sensing technologies used in human-computer interaction: Video and Depth input, Ultrasonic Doppler sensing, and Surface Electromyography. This paper describes a novel data collection methodology in which these independent streams of information are synchronously acquired with the aim of supporting research and development of a multimodal SSI. The recordings were divided into two rounds: a first in which the scripted prompts were silently uttered, and a second in which speakers pronounced them in an audible, normal tone. In the first round, a total of 53.94 minutes were captured, of which 30.25% was estimated to be silent speech. In the second round, a total of 30.45 minutes were obtained, of which 30.05% was audible speech.