Title: SoMaJo: State-of-the-art tokenization for German web and social media texts
Authors: Thomas Proisl, Peter Uhrig
Date: 2016-08
Resource type: text (conference publication)
In: Proceedings of the 10th Web as Corpus Workshop
Editors: Paul Cook, Stefan Evert, Roland Schäfer, Egon Stemle
Publisher: Association for Computational Linguistics
Place: Berlin
Conference date: 2016-08
Pages: 57–62
Identifier: proisl-uhrig-2016-somajo
DOI: 10.18653/v1/W16-2607
URL: https://aclanthology.org/W16-2607/