Text-2-Wiki: Summarization and Template-driven Article Generation

Panwar Jayant, Mamidi Radhika


Abstract
Users on Wikipedia collaborate in a structured and organized manner to publish and update articles on numerous topics, which makes Wikipedia a very rich source of knowledge. English Wikipedia has the most information available (more than 6.7 million articles); however, there are few good, informative articles on Wikipedia in Indian languages. Hindi Wikipedia has only about 160k articles. The Hindi version of an article can differ vastly from its English version and generally contains less information. This poses a problem for native speakers of Indian languages who are not proficient in English. Making the same amount of information available in Indian languages would therefore help spread knowledge among those who are not well-versed in English. Publishing articles manually, as is usual on the global English Wikipedia, is a time-consuming process. To bring the amount of information in native Indian languages up to speed with the amount available in English, automating the whole article generation process is the best option. In this study, we present a stage-wise approach that ranges from Data Collection to Summarization and Translation and ends with Template Creation. This approach enables the efficient generation of a large amount of content for Hindi Wikipedia in less time. Using this approach, we successfully generated more than a thousand articles in Hindi Wikipedia.
Anthology ID:
2023.icon-1.51
Volume:
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2023
Address:
Goa University, Goa, India
Editors:
D. Pawar Jyoti, Lalitha Devi Sobha
Venue:
ICON
SIG:
SIGLEX
Publisher:
NLP Association of India (NLPAI)
Pages:
551–556
URL:
https://aclanthology.org/2023.icon-1.51
Cite (ACL):
Panwar Jayant and Mamidi Radhika. 2023. Text-2-Wiki: Summarization and Template-driven Article Generation. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 551–556, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal):
Text-2-Wiki: Summarization and Template-driven Article Generation (Jayant & Radhika, ICON 2023)
PDF:
https://aclanthology.org/2023.icon-1.51.pdf