ImplicaTR: A Granular Dataset for Natural Language Inference and Pragmatic Reasoning in Turkish

Mustafa Halat; Ümit Atlamaz


Abstract

We introduce ImplicaTR, a linguistically informed diagnostic dataset designed to evaluate semantic and pragmatic reasoning capabilities of Natural Language Inference (NLI) models in Turkish. Existing Turkish NLI datasets treat NLI as determining whether a sentence pair represents entailment, contradiction, or a neutral relation. Such datasets do not distinguish between semantic entailment and pragmatic implicature, which linguists have long recognized as separate inference types. ImplicaTR addresses this by testing NLI models’ ability to differentiate between entailment and implicature, thus assessing their pragmatic reasoning skills. The dataset consists of 19,350 semi-automatically generated sentence pairs covering implicature, entailment, contradiction, and neutral relations. We evaluated various models (BERT, Gemma, Llama-2, and Mistral) on ImplicaTR and found that these models can reach up to 98% accuracy on semantic and pragmatic reasoning. We also fine-tuned various models on subsets of ImplicaTR to test the abilities of NLI models to generalize across unseen implicature contexts. Our results indicate that model performance is highly dependent on the diversity of linguistic expressions within each subset, highlighting a weakness in the abstract generalization capabilities of large language models regarding pragmatic reasoning. We share all the code, models, and the dataset.

Anthology ID: 2024.sigturk-1.3
Volume: Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)
Month: August
Year: 2024
Address: Bangkok, Thailand and Online
Editors: Duygu Ataman, Mehmet Oguz Derin, Sardana Ivanova, Abdullatif Köksal, Jonne Sälevä, Deniz Zeyrek
Venues: SIGTURK | WS
Publisher: Association for Computational Linguistics
Pages: 29–41
URL: https://aclanthology.org/2024.sigturk-1.3/
Cite (ACL): Mustafa Halat and Ümit Atlamaz. 2024. ImplicaTR: A Granular Dataset for Natural Language Inference and Pragmatic Reasoning in Turkish. In Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024), pages 29–41, Bangkok, Thailand and Online. Association for Computational Linguistics.
Cite (Informal): ImplicaTR: A Granular Dataset for Natural Language Inference and Pragmatic Reasoning in Turkish (Halat & Atlamaz, SIGTURK 2024)
PDF: https://aclanthology.org/2024.sigturk-1.3.pdf
