Evaluation of a simultaneous interpretation system and analysis of speech log for user experience assessment
Akiko Sakamoto | Kazuhiko Abe | Kazuo Sumita | Satoshi Kamatani
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers
This paper focuses on the user experience (UX) of a simultaneous interpretation system for face-to-face conversation between two users. To assess the UX of the system, we first transcribed the user speech recorded during a task-based evaluation experiment and then analyzed that speech from the viewpoint of UX. In the experiment, 44 of 45 tasks were solved, a solved-task ratio of 97.8%. This indicates that the system interprets effectively enough for users to solve tasks. However, we found that users repeated their speech because of errors in automatic speech recognition (ASR) or machine translation (MT). Users repeated clauses 1.8 times on average, and they seemed to repeat themselves until they received a response from their partner. In addition, we found that after approximately 3.6 repetitions, users would change their wording to avoid ASR or MT errors and to evoke a response from their partner.
The NICT ASR system for IWSLT2011
Kazuhiko Abe | Youzheng Wu | Chien-lin Huang | Paul R. Dixon | Shigeki Matsuda | Chiori Hori | Hideki Kashioka
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we describe NICT’s participation in the IWSLT 2011 evaluation campaign for the ASR Track. To recognize spontaneous speech, we prepared an acoustic model trained on more spontaneous speech corpora and a language model constructed from text corpora distributed by the organizer. We built a multi-pass ASR system that adapts the acoustic and language models using previous ASR results. The target speech was selected from talks on the TED (Technology, Entertainment, Design) program. A large reduction in word error rate (WER) was obtained through speaker adaptation of the acoustic model with MLLR. Additional improvement was achieved not only by adapting the language model but also by using the baseline and speaker-dependent acoustic models in parallel. As a result, the final WER was reduced by 30% from the baseline ASR on the distributed test set.