Anna Palatkina
2025
KoWit-24: A Richly Annotated Dataset of Wordplay in News Headlines
Alexander Baranov | Anna Palatkina | Yulia Makovka | Pavel Braslavski
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
We present KoWit-24, a dataset with fine-grained annotation of wordplay in 2,700 Russian news headlines. KoWit-24 annotations include the presence of wordplay, its type, wordplay anchors, and the words/phrases the wordplay refers to. Unlike most existing humor collections, which consist of canned jokes, KoWit-24 provides wordplay contexts: each headline is accompanied by the news lead and summary. The most common type of wordplay in the dataset is the transformation of collocations, idioms, and named entities, a mechanism that has been underrepresented in previous humor datasets. Our experiments with five LLMs show that there is ample room for improvement in wordplay detection and interpretation tasks. The dataset and evaluation scripts are available at https://github.com/Humor-Research/KoWit-24
2023
NorBench – A Benchmark for Norwegian Language Models
David Samuel | Andrey Kutuzov | Samia Touileb | Erik Velldal | Lilja Øvrelid | Egil Rønningstad | Elina Sigdel | Anna Palatkina
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics. We also introduce a range of new Norwegian language models (both encoder and encoder-decoder based). Finally, we compare and analyze their performance, along with other existing LMs, across the different benchmark tests of NorBench.