Cross-Lingual Summarization with Pseudo-Label Regularization

Thang Le


Abstract
Cross-Lingual Summarization (XLS) aims to summarize a document in the source language into a condensed version in the target language, effectively removing language barriers for non-native readers. Previous approaches, however, share a common limitation: only a single reference (gold summary) is exploited during model training, exposing the base model to an underrepresented hypothesis space, since the actual number of plausible hypotheses is exponentially large. To alleviate this problem, we present a study adopting pseudo-labels to regularize standard cross-lingual summarization training. We investigate several components that lead to gains under regularization training, with experiments covering 8 diverse languages from different families. Conclusively, we show that pseudo-labeling is a simple and effective approach that significantly improves over standard gold-reference training in XLS.
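The abstract does not spell out the training objective, but one plausible reading of "pseudo-labels regularizing standard training" is an interpolation between the gold-reference loss and a pseudo-label loss. Below is a minimal sketch under that assumption; the Hugging Face-style model interface, the batch keys, the helper name, and the weight alpha are all hypothetical, not taken from the paper.

def pseudo_label_regularized_loss(model, batch, alpha=0.5):
    # Sketch only: assumes a Hugging Face-style seq2seq model whose forward
    # pass returns an object with a .loss attribute when labels are given.
    # Standard XLS term: cross-entropy against the gold target-language summary.
    gold = model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"],
                 labels=batch["gold_labels"]).loss
    # Regularization term: the same cross-entropy, but against a pseudo-label
    # (e.g. a translated or model-generated summary of the same document).
    pseudo = model(input_ids=batch["input_ids"],
                   attention_mask=batch["attention_mask"],
                   labels=batch["pseudo_labels"]).loss
    # Interpolate the two terms; the paper's exact weighting is not given
    # in the abstract, so alpha here is purely illustrative.
    return (1.0 - alpha) * gold + alpha * pseudo

The design intuition: the pseudo-label term exposes the model to additional valid hypotheses beyond the single gold reference, broadening the covered hypothesis space during training.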
Anthology ID:
2024.findings-naacl.289
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4644–4677
URL:
https://aclanthology.org/2024.findings-naacl.289
DOI:
10.18653/v1/2024.findings-naacl.289
Cite (ACL):
Thang Le. 2024. Cross-Lingual Summarization with Pseudo-Label Regularization. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4644–4677, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Cross-Lingual Summarization with Pseudo-Label Regularization (Le, Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.289.pdf