Multi-Document Summarization of Persian Text using Paragraph Vectors

Morteza Rohanian

doi:10.26615/issn.1314-9156.2017_005

Abstract

A multi-document summarizer finds the key topics in multiple textual sources and organizes information around them. In this paper we propose a summarization method for Persian text using paragraph vectors, which can represent textual units of arbitrary length. We use these vectors to calculate the semantic relatedness between documents, cluster them into a predetermined number of groups, weight them based on their distance to the cluster centroids and on intra-cluster homogeneity, and extract the key paragraphs. We compare the final summaries with the gold-standard summaries of 21 digital topics using the ROUGE evaluation metric. Experimental results show the advantages of using paragraph vectors over earlier attempts at developing similar methods for a low-resource language like Persian.
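The abstract describes the clustering and weighting pipeline only at a high level. Below is a minimal sketch, not the author's implementation. It assumes the paragraph vectors have already been computed (for example with a Doc2Vec-style model) and uses cosine-based k-means with centroids initialized deterministically from the first k vectors. The weighting is a hypothetical choice: it multiplies a paragraph's cosine similarity to its cluster centroid by the cluster's homogeneity, taken here as the mean member-to-centroid similarity. The paper's exact formula is not given here. The function names `cosine`, `kmeans_cosine` and `summarize` are illustrative only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans_cosine(vectors: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Assign each vector to its most cosine-similar centroid, then recompute centroids.
    Assumption: centroids start as the first k vectors (deterministic init)."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = mean(members)
    return labels


def summarize(paragraphs: List[str], vectors: List[Vector], k: int) -> List[str]:
    """Return one key paragraph per non-empty cluster, highest weight first.
    Hypothetical weight = similarity to centroid * cluster homogeneity."""
    labels = kmeans_cosine(vectors, k)
    picks = []  # (weight, paragraph index)
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        centroid = mean([vectors[i] for i in idx])
        sims = {i: cosine(vectors[i], centroid) for i in idx}
        homogeneity = sum(sims.values()) / len(idx)  # mean similarity to centroid
        best = max(idx, key=lambda i: sims[i])
        picks.append((sims[best] * homogeneity, best))
    picks.sort(key=lambda t: t[0], reverse=True)
    return [paragraphs[i] for _, i in picks]


if __name__ == "__main__":
    paras = ["a1", "b1", "a2", "b2"]
    vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print(summarize(paras, vecs, k=2))
```

In a real system the toy 2-D vectors would be replaced by learned paragraph embeddings, and k would be the predetermined number of groups mentioned in the abstract. Picking one paragraph per cluster is one simple way to reduce redundancy across topics.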

Anthology ID:: R17-2005
Volume:: Proceedings of the Student Research Workshop Associated with RANLP 2017
Month:: September
Year:: 2017
Address:: Varna
Editors:: Venelin Kovatchev, Irina Temnikova, Pepa Gencheva, Yasen Kiprov, Ivelina Nikolova
Venue:: RANLP
Publisher:: INCOMA Ltd.
Pages:: 35–40
URL:: https://doi.org/10.26615/issn.1314-9156.2017_005
DOI:: 10.26615/issn.1314-9156.2017_005
Cite (ACL):: Morteza Rohanian. 2017. Multi-Document Summarization of Persian Text using Paragraph Vectors. In Proceedings of the Student Research Workshop Associated with RANLP 2017, pages 35–40, Varna. INCOMA Ltd.
Cite (Informal):: Multi-Document Summarization of Persian Text using Paragraph Vectors (Rohanian, RANLP 2017)
PDF:: https://doi.org/10.26615/issn.1314-9156.2017_005
