Zihao Li


2023

pdf bib
Comparing Generic and Expert Models for Genre-Specific Text Simplification
Zihao Li | Matthew Shardlow | Fernando Alva-Manchego
Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability

We investigate how text genre influences the performance of models for controlled text simplification. Regarding datasets from Wikipedia and PubMed as two different genres, we compare the performance of genre-specific models trained by transfer learning and prompt-only GPT-like large language models. Our experiments showed that: (1) the performance loss of genre-specific models on general tasks can be limited to 2%, (2) transfer learning can improve performance on genre-specific datasets up to 10% in SARI score from the base model without transfer learning, (3) simplifications generated by the smaller but more customized models show similar performance in simplicity and a better meaning reservation capability to the larger generic models in both automatic and human evaluations.

2022

pdf bib
An Investigation into the Effect of Control Tokens on Text Simplification
Zihao Li | Matthew Shardlow | Saeed Hassan
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

Recent work on text simplification has focused on the use of control tokens to further the state of the art. However, it is not easy to further improve without an in-depth comprehension of the mechanisms underlying control tokens. One unexplored factor is the tokenisation strategy, which we also explore. In this paper, we (1) reimplemented ACCESS, (2) explored the effects of varying control tokens, (3) tested the influences of different tokenisation strategies, and (4) demonstrated how separate control tokens affect performance. We show variations of performance in the four control tokens separately. We also uncover how the design of control tokens could influence the performance and propose some suggestions for designing control tokens, which also reaches into other controllable text generation tasks.