Looks can be Deceptive: Distinguishing Repetition Disfluency from Reduplication

Arif A. Ahmad, Khyathi Gayathri Mothika, Pushpak Bhattacharyya


Abstract
Reduplication and repetition, though similar in form, serve distinct linguistic purposes. Reduplication is a deliberate morphological process used to express grammatical, semantic, or pragmatic nuances, while repetition is often unintentional and indicative of disfluency. This paper presents the first large-scale study of reduplication and repetition in speech using computational linguistics. We introduce IndicRedRep, a new publicly available dataset containing Hindi, Telugu, and Marathi text annotated with reduplication and repetition at the word level. We evaluate transformer-based models for multi-class reduplication and repetition token classification, utilizing the Reparandum-Interregnum-Repair structure to distinguish between the two phenomena. Our models achieve macro F1 scores of up to 85.62% in Hindi, 83.95% in Telugu, and 84.82% in Marathi for reduplication-repetition classification.
Anthology ID:
2025.coling-main.15
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
214–229
URL:
https://aclanthology.org/2025.coling-main.15/
Cite (ACL):
Arif A. Ahmad, Khyathi Gayathri Mothika, and Pushpak Bhattacharyya. 2025. Looks can be Deceptive: Distinguishing Repetition Disfluency from Reduplication. In Proceedings of the 31st International Conference on Computational Linguistics, pages 214–229, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Looks can be Deceptive: Distinguishing Repetition Disfluency from Reduplication (Ahmad et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.15.pdf