Greta Tuckute
2023
WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words
Lukas Wolf | Klemen Kotar | Greta Tuckute | Eghbal Hosseini | Tamar I. Regev | Ethan Gotlieb Wilcox | Alexander Scott Warstadt
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning
2022
SentSpace: Large-Scale Benchmarking and Evaluation of Text using Cognitively Motivated Lexical, Syntactic, and Semantic Features
Greta Tuckute | Aalok Sathe | Mingye Wang | Harley Yoder | Cory Shain | Evelina Fedorenko
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations
SentSpace is a modular framework for streamlined evaluation of text. SentSpace characterizes textual input using diverse lexical, syntactic, and semantic features derived from corpora and psycholinguistic experiments. Core sentence features fall into three primary feature spaces: 1) Lexical, 2) Contextual, and 3) Embeddings. To aid in the analysis of computed features, SentSpace provides a web interface for interactive visualization and comparison with text from large corpora. The modular design of SentSpace allows researchers to easily integrate their own feature computation into the pipeline while benefiting from a common framework for evaluation and visualization. In this manuscript we will describe the design of SentSpace, its core feature spaces, and demonstrate an example use case by comparing human-written and machine-generated (GPT2-XL) sentences to each other. We find that while GPT2-XL-generated text appears fluent at the surface level, psycholinguistic norms and measures of syntactic processing reveal key differences between text produced by humans and machines. Thus, SentSpace provides a broad set of cognitively motivated linguistic features for evaluation of text within natural language processing, cognitive science, as well as the social sciences.