Detecting Text Formality: A Study of Text Classification Approaches

Daryna Dementieva, Nikolay Babakov, Alexander Panchenko


Abstract
Formality is one of the important characteristics of text documents. Automatic detection of a text's formality level is potentially beneficial for various natural language processing tasks. Previously, two large-scale datasets with formality annotation, GYAFC and X-FORMAL, were introduced for multiple languages. However, they were primarily used for training style transfer models. At the same time, detecting text formality on its own may also be a useful application. To our knowledge, this work presents the first systematic study of formality detection methods based on statistical, neural-based, and Transformer-based machine learning methods, and delivers the best-performing models for public usage. We conducted three types of experiments: monolingual, multilingual, and cross-lingual. The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks, while Transformer-based classifiers are more robust in cross-lingual knowledge transfer.
Anthology ID:
2023.ranlp-1.31
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
274–284
URL:
https://aclanthology.org/2023.ranlp-1.31
Cite (ACL):
Daryna Dementieva, Nikolay Babakov, and Alexander Panchenko. 2023. Detecting Text Formality: A Study of Text Classification Approaches. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 274–284, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Detecting Text Formality: A Study of Text Classification Approaches (Dementieva et al., RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.31.pdf