Natalie Hervieux
2024
Language Resources From Prominent Born-Digital Humanities Texts are Still Needed in the Age of LLMs
Natalie Hervieux | Peiran Yao | Susan Brown | Denilson Barbosa
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
The digital humanities (DH) community fundamentally embraces the use of computerized tools for the study and creation of knowledge related to language, history, culture, and human values, in which natural language plays a prominent role. Many successful DH tools rely heavily on Natural Language Processing methods, and several efforts exist within the DH community to promote the use of newer and better tools. Nevertheless, most NLP research is driven by web corpora that are noticeably different from texts commonly found in DH artifacts, which tend to use richer language and refer to rarer entities. Thus, the near-human performance achieved by state-of-the-art NLP tools on web texts might not be achievable on DH texts. We introduce a dataset carefully created by computer scientists and digital humanists intended to serve as a reference point for the development and evaluation of NLP tools. The dataset is a subset of a born-digital textbase resulting from a prominent and ongoing experiment in digital literary history, containing thousands of multi-sentence excerpts that are suited for information extraction tasks. We fully describe the dataset and show that its language is demonstrably different than the corpora normally used in training language resources in the NLP community.
2023
NLP Workbench: Efficient and Extensible Integration of State-of-the-art Text Mining Tools
Peiran Yao | Matej Kosmajac | Abeer Waheed | Kostyantyn Guzhva | Natalie Hervieux | Denilson Barbosa
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
NLP Workbench is a web-based platform for text mining that allows non-expert users to obtain semantic understanding of large-scale corpora using state-of-the-art text mining models. The platform is built upon the latest pre-trained models and open source systems from academia that provide semantic analysis functionalities, including but not limited to entity linking, sentiment analysis, semantic parsing, and relation extraction. Its extensible design enables researchers and developers to smoothly replace an existing model or integrate a new one. To improve efficiency, we employ a microservice architecture that facilitates allocation of acceleration hardware and parallelization of computation. This paper presents the architecture of NLP Workbench and discusses the challenges we faced in designing it. We also discuss diverse use cases of NLP Workbench and the benefits of using it over other approaches. The platform is under active development, with its source code released under the MIT license. A website and a short video demonstrating our platform are also available.