An Investigation into the Effect of Control Tokens on Text Simplification

Zihao Li, Matthew Shardlow, Saeed Hassan


Abstract
Recent work on text simplification has focused on the use of control tokens to further the state of the art. However, further improvement is difficult without an in-depth understanding of the mechanisms underlying control tokens. One previously unexplored factor is the tokenisation strategy, which we also examine. In this paper, we (1) reimplement ACCESS, (2) explore the effects of varying control tokens, (3) test the influence of different tokenisation strategies, and (4) demonstrate how separate control tokens affect performance. We report how performance varies for each of the four control tokens separately. We also show how the design of control tokens can influence performance and offer suggestions for designing control tokens, which also apply to other controllable text generation tasks.
Anthology ID:
2022.tsar-1.14
Volume:
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Virtual)
Editors:
Sanja Štajner, Horacio Saggion, Daniel Ferrés, Matthew Shardlow, Kim Cheng Sheang, Kai North, Marcos Zampieri, Wei Xu
Venue:
TSAR
Publisher:
Association for Computational Linguistics
Pages:
154–165
URL:
https://aclanthology.org/2022.tsar-1.14
DOI:
10.18653/v1/2022.tsar-1.14
Cite (ACL):
Zihao Li, Matthew Shardlow, and Saeed Hassan. 2022. An Investigation into the Effect of Control Tokens on Text Simplification. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), pages 154–165, Abu Dhabi, United Arab Emirates (Virtual). Association for Computational Linguistics.
Cite (Informal):
An Investigation into the Effect of Control Tokens on Text Simplification (Li et al., TSAR 2022)
PDF:
https://aclanthology.org/2022.tsar-1.14.pdf
Video:
https://aclanthology.org/2022.tsar-1.14.mp4