Same Author or Just Same Topic? Towards Content-Independent Style Representations

Anna Wegmann; Marijn Schraagen; Dong Nguyen

doi:10.18653/v1/2022.repl4nlp-1.26

Abstract

Linguistic style is an integral component of language. Recent advances in the development of style representations have increasingly used training objectives from authorship verification (AV): Do two texts have the same author? The assumption underlying the AV training task (same author approximates same writing style) enables self-supervised and, thus, extensive training. However, good performance on the AV task does not ensure good “general-purpose” style representations. For example, as the same author might typically write about certain topics, representations trained on AV might also encode content information instead of style alone. We introduce a variation of the AV training task that controls for content using conversation or domain labels. We evaluate whether known style dimensions are represented and preferred over content information through an original variation of the recently proposed STEL framework. We find that representations trained by controlling for conversation are better than representations trained with domain or no content control at representing style independent of content.
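
The abstract does not spell out how content-controlled training examples are built. The following is a minimal sketch, assuming a triplet formulation (an anchor, a same-author utterance, and a different-author utterance) in which conversation control means drawing the different-author utterance from the anchor's own conversation, so that shared topic cannot separate the positive from the negative. The `Utterance` class, the `sample_triplet` function, and the toy data are hypothetical illustrations, not the authors' code.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    text: str
    author: str
    conversation: str


def sample_triplet(utterances, rng, control_conversation=True):
    """Return (anchor, same_author, different_author), or None if impossible.

    Hypothetical sketch: with conversation control, the different-author
    utterance must come from the same conversation as the anchor.
    """
    by_author = defaultdict(list)
    for u in utterances:
        by_author[u.author].append(u)

    # Only authors with at least two utterances can supply a same-author positive.
    candidates = [u for u in utterances if len(by_author[u.author]) >= 2]
    rng.shuffle(candidates)

    for anchor in candidates:
        positives = [u for u in by_author[anchor.author] if u is not anchor]
        negatives = [u for u in utterances if u.author != anchor.author]
        if control_conversation:
            # Content control (assumed): negative shares the anchor's conversation.
            negatives = [u for u in negatives if u.conversation == anchor.conversation]
        if positives and negatives:
            return anchor, rng.choice(positives), rng.choice(negatives)
    return None


# Toy data (invented for illustration only).
data = [
    Utterance("lol same here!!", "u1", "c1"),
    Utterance("lol that's wild!!", "u1", "c2"),
    Utterance("I concur with the previous comment.", "u2", "c1"),
    Utterance("Indeed, an interesting point.", "u3", "c2"),
]
anchor, same_author, different_author = sample_triplet(data, random.Random(0))
```

In this sketch, setting `control_conversation=False` corresponds to an uncontrolled AV setup; a domain-controlled variant would filter the negatives by a domain label instead of a conversation label.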

Anthology ID: 2022.repl4nlp-1.26
Volume: Proceedings of the 7th Workshop on Representation Learning for NLP
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Spandana Gella, He He, Bodhisattwa Prasad Majumder, Burcu Can, Eleonora Giunchiglia, Samuel Cahyawijaya, Sewon Min, Maximilian Mozes, Xiang Lorraine Li, Isabelle Augenstein, Anna Rogers, Kyunghyun Cho, Edward Grefenstette, Laura Rimell, Chris Dyer
Venue: RepL4NLP
Publisher: Association for Computational Linguistics
Pages: 249–268
URL: https://aclanthology.org/2022.repl4nlp-1.26
DOI: 10.18653/v1/2022.repl4nlp-1.26
Cite (ACL): Anna Wegmann, Marijn Schraagen, and Dong Nguyen. 2022. Same Author or Just Same Topic? Towards Content-Independent Style Representations. In Proceedings of the 7th Workshop on Representation Learning for NLP, pages 249–268, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Same Author or Just Same Topic? Towards Content-Independent Style Representations (Wegmann et al., RepL4NLP 2022)
PDF: https://aclanthology.org/2022.repl4nlp-1.26.pdf
Video: https://aclanthology.org/2022.repl4nlp-1.26.mp4