The Kairntech Sherpa – An ML Platform and API for the Enrichment of (not only) Scientific Content

Stefan Geißler


Abstract
We present a software platform and API that combines various ML and NLP approaches for the analysis and enrichment of textual content. The platform’s design and implementation are guided by the goal of allowing non-technical users to conduct their own experiments and training runs on their own data, so that they can test, tune and deploy analysis models for production. Dedicated packages for subtasks such as document structure processing, document categorization, annotation with existing thesauri, disambiguation and linking, annotation with newly created entity recognizers, and summarization (each available in isolation as an open source component) are combined into an end-user-facing, collaborative, scalable platform that supports large-scale industrial document analysis. We see the Sherpa’s setup as an answer to the observation that ML has reached a level of maturity that makes useful results attainable in many analysis scenarios today, but that in-depth technical competence in the required fields of NLP and AI is often scarce; a setup that focusses on non-technical domain-expert end users can help bring the required analysis functionality closer to day-to-day reality in business contexts.
Anthology ID:
2020.iwltp-1.9
Volume:
Proceedings of the 1st International Workshop on Language Technology Platforms
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
IWLTP
Publisher:
European Language Resources Association
Pages:
54–58
Language:
English
URL:
https://aclanthology.org/2020.iwltp-1.9
Cite (ACL):
Stefan Geißler. 2020. The Kairntech Sherpa – An ML Platform and API for the Enrichment of (not only) Scientific Content. In Proceedings of the 1st International Workshop on Language Technology Platforms, pages 54–58, Marseille, France. European Language Resources Association.
Cite (Informal):
The Kairntech Sherpa – An ML Platform and API for the Enrichment of (not only) Scientific Content (Geißler, IWLTP 2020)
PDF:
https://aclanthology.org/2020.iwltp-1.9.pdf