MoNoise: A Multi-lingual and Easy-to-use Lexical Normalization Tool

Rob van der Goot


Abstract
In this paper, we introduce and demonstrate the online demo and the command-line interface of MoNoise, a lexical normalization system for a variety of languages. We further improve this model by using features from the original word for every normalization candidate. For comparison with future work, we propose bundling seven datasets in six languages into a new benchmark, together with a novel evaluation metric that is particularly suitable for cross-dataset comparisons. MoNoise reaches new state-of-the-art performance on six of these seven datasets. Furthermore, we allow the user to tune the ‘aggressiveness’ of the normalization, and show how the model can be made more efficient with only a small loss in performance. The online demo is available at http://www.robvandergoot.com/monoise and the corresponding code at https://bitbucket.org/robvanderg/monoise/
Anthology ID:
P19-3032
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
Month:
July
Year:
2019
Address:
Florence, Italy
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
201–206
URL:
https://aclanthology.org/P19-3032
DOI:
10.18653/v1/P19-3032
PDF:
https://aclanthology.org/P19-3032.pdf