Entity Embellishment Mitigation in LLMs Output with Noisy Synthetic Dataset for Alignment

Svitlana Galeshchuk


Abstract
This work focuses on entity embellishment, in which named entities are accompanied by additional information that is not supported by the context or the source material. The paper contributes to mitigating this problem in texts generated by large language models, summaries in particular, by proposing an approach that injects synthetic noise into generated samples, which are then used to align a fine-tuned LLM. We also address the scarcity of solutions for low-resource languages and test our approach on corpora in Ukrainian.
Anthology ID:
2024.unlp-1.15
Volume:
Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Mariana Romanyshyn, Nataliia Romanyshyn, Andrii Hlybovets, Oleksii Ignatenko
Venue:
UNLP
Publisher:
ELRA and ICCL
Pages:
129–134
URL:
https://aclanthology.org/2024.unlp-1.15
Cite (ACL):
Svitlana Galeshchuk. 2024. Entity Embellishment Mitigation in LLMs Output with Noisy Synthetic Dataset for Alignment. In Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pages 129–134, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Entity Embellishment Mitigation in LLMs Output with Noisy Synthetic Dataset for Alignment (Galeshchuk, UNLP 2024)
PDF:
https://aclanthology.org/2024.unlp-1.15.pdf