RKorAPClient: An R Package for Accessing the German Reference Corpus DeReKo via KorAP

Marc Kupietz, Nils Diewald, Eliza Margaretha


Abstract
Making corpora accessible and usable for linguistic research is a huge challenge in view of (too) big data, legal issues and a rapidly evolving methodology. This affects not only the design of user-friendly graphical interfaces to corpus analysis tools, but also the availability of programming interfaces that support access to the functionality of these tools from various analysis and development environments. RKorAPClient is a new research tool in the form of an R package that interacts with the Web API of the corpus analysis platform KorAP, which provides access to large annotated corpora, including the German reference corpus DeReKo with 45 billion tokens. In addition to optionally authenticated KorAP API access, RKorAPClient provides further processing and visualization features to simplify common corpus analysis tasks. This paper introduces the basic functionality of RKorAPClient and exemplifies various analysis tasks based on DeReKo, which are bundled within the R package and can serve as a basic framework for advanced analysis and visualization approaches.
Anthology ID:
2020.lrec-1.867
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
7015–7021
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.867
Cite (ACL):
Marc Kupietz, Nils Diewald, and Eliza Margaretha. 2020. RKorAPClient: An R Package for Accessing the German Reference Corpus DeReKo via KorAP. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 7015–7021, Marseille, France. European Language Resources Association.
Cite (Informal):
RKorAPClient: An R Package for Accessing the German Reference Corpus DeReKo via KorAP (Kupietz et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.867.pdf