Challenges of Applying Automatic Speech Recognition for Transcribing EU Parliament Committee Meetings: A Pilot Study

Hugo de Vos, Suzan Verberne


Institute of Public Administration and Leiden Institute of Advanced Computer Science, Leiden University
h.p.de.vos@fgga.leidenuniv.nl, s.verberne@liacs.leidenuniv.nl

Abstract
We tested the feasibility of automatically transcribing committee meetings of the European Union parliament using Automatic Speech Recognition (ASR) techniques. These committee meetings contain more valuable information for political science scholars than the plenary meetings, since they showcase actual debates as opposed to the more formal plenary meetings. However, since there are no transcriptions of those meetings, they are a lot less accessible for research than the plenary meetings, of which multiple corpora exist. We explored a freely available ASR application and analysed the output in order to identify the weaknesses of an out-of-the-box system. We followed up on those weaknesses by proposing directions for optimizing the ASR for our goals. We found that, despite showing acceptable results in terms of Word Error Rate, the model did not yet suffice for the purpose of generating a data set for use in Political Science. The application was unable to successfully recognize domain-specific terms and names. To overcome this issue, future research will be directed at using domain-specific language models in combination with off-the-shelf acoustic models.
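The Word Error Rate mentioned in the abstract is commonly defined as the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in the paper; the function name `word_error_rate` and the example sentences are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are lowercased and split on whitespace; the paper
    does not describe its normalisation, so this is illustrative only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one misrecognised domain term in six words.
reference = "the committee on budgets met today"
hypothesis = "the committee on budget met today"
print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

Because every word counts equally in this metric, a transcript can reach a low Word Error Rate while still getting the few domain-specific terms or names wrong, which is the kind of weakness the abstract describes.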
Anthology ID:
2020.parlaclarin-1.8
Volume:
Proceedings of the Second ParlaCLARIN Workshop
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Darja Fišer, Maria Eskevich, Franciska de Jong
Venue:
ParlaCLARIN
Publisher:
European Language Resources Association
Pages:
40–43
Language:
English
URL:
https://aclanthology.org/2020.parlaclarin-1.8
Cite (ACL):
Hugo de Vos and Suzan Verberne. 2020. Challenges of Applying Automatic Speech Recognition for Transcribing EU Parliament Committee Meetings: A Pilot Study. In Proceedings of the Second ParlaCLARIN Workshop, pages 40–43, Marseille, France. European Language Resources Association.
Cite (Informal):
Challenges of Applying Automatic Speech Recognition for Transcribing EU Parliament Committee Meetings: A Pilot Study (de Vos & Verberne, ParlaCLARIN 2020)
PDF:
https://aclanthology.org/2020.parlaclarin-1.8.pdf