Automatic Data Retrieval for Cross Lingual Summarization

Bhatnagar Nikhilesh, Urlana Ashok, Mishra Pruthwik, Mujadia Vandan, M. Sharma Dipti


Abstract
Cross-lingual summarization involves summarizing text written in one language into a different language. There is a body of research addressing cross-lingual summarization from English to other European languages. In this work, we aim to perform cross-lingual summarization from English to Hindi. We propose that pairing up the coverage of newsworthy events in textual and video formats can be helpful for data acquisition for cross-lingual summarization. We analyze the data and propose methods to match articles to video descriptions, which serve as document and summary pairs. We also outline filtering methods over reasonable thresholds to ensure the correctness of the summaries. Further, we make available 28,583 monolingual and cross-lingual article-summary pairs. We also build and analyze multiple baselines on the collected data and report an error analysis.
Anthology ID:
2023.icon-1.85
Volume:
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2023
Address:
Goa University, Goa, India
Editors:
D. Pawar Jyoti, Lalitha Devi Sobha
Venue:
ICON
SIG:
SIGLEX
Publisher:
NLP Association of India (NLPAI)
Pages:
822–827
URL:
https://aclanthology.org/2023.icon-1.85
Cite (ACL):
Bhatnagar Nikhilesh, Urlana Ashok, Mishra Pruthwik, Mujadia Vandan, and M. Sharma Dipti. 2023. Automatic Data Retrieval for Cross Lingual Summarization. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 822–827, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal):
Automatic Data Retrieval for Cross Lingual Summarization (Nikhilesh et al., ICON 2023)
PDF:
https://aclanthology.org/2023.icon-1.85.pdf