Clause-based Discourse Segmentation of Arabic Texts

Iskandar Keskes, Farah Benamara, Lamia Hadrich Belguith


Abstract
This paper describes a rule-based approach to segmenting Arabic texts into clauses. Our method relies on an extensive analysis of a large set of lexical cues and punctuation marks. The analysis was carried out on two corpus genres: news articles and elementary school textbooks. We propose a three-step segmentation algorithm: first using only punctuation marks, then relying only on lexical cues, and finally using both typology and lexical cues. The results were compared with manual segmentations produced by experts.
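The abstract outlines a three-step procedure (punctuation only, lexical cues only, then both). The sketch below illustrates that idea under stated assumptions and is not the authors' system: the punctuation set, the cue list `LEXICAL_CUES`, and the functions `tokenize`, `split_on_punctuation`, `split_on_cues` and `segment` are all hypothetical, and the combined step is assumed here to apply cue-based splitting inside each punctuation-delimited piece.

```python
import re

# Hypothetical punctuation set: Latin marks plus the Arabic comma,
# semicolon and question mark. Not taken from the paper.
PUNCTUATION = {".", "!", "?", ";", ":", ",", "،", "؛", "؟"}

# Hypothetical, illustrative Arabic discourse cues
# ("but", "then", "because", "where", "while"). Not the paper's cue list.
LEXICAL_CUES = {"لكن", "ثم", "لأن", "حيث", "بينما"}

_TOKEN_RE = re.compile(r"[^\s.!?;:,،؛؟]+|[.!?;:,،؛؟]")


def tokenize(text):
    """Split text into word tokens and standalone punctuation tokens."""
    return _TOKEN_RE.findall(text)


def split_on_punctuation(tokens):
    """Step 1 (assumed): close a segment after every punctuation mark."""
    segments, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in PUNCTUATION:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def split_on_cues(tokens):
    """Step 2 (assumed): open a new segment before each lexical cue,
    unless the cue already starts the current segment."""
    segments, current = [], []
    for tok in tokens:
        if tok in LEXICAL_CUES and current:
            segments.append(current)
            current = []
        current.append(tok)
    if current:
        segments.append(current)
    return segments


def segment(text, mode="both"):
    """Segment text with 'punctuation', 'cues', or 'both' (step 3, assumed)."""
    tokens = tokenize(text)
    if mode == "punctuation":
        return split_on_punctuation(tokens)
    if mode == "cues":
        return split_on_cues(tokens)
    result = []
    for piece in split_on_punctuation(tokens):
        result.extend(split_on_cues(piece))
    return result
```

For example, `segment("ذهب الولد إلى المدرسة لكن المطر كان غزيرا.", "punctuation")` yields a single segment because the only mark is the final period, while the `"cues"` and `"both"` modes split before `لكن` and yield two. A real system would also need to handle cues attached as clitics (such as the conjunction و) and ambiguous cues, which this token-level sketch ignores.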
Anthology ID:
L12-1559
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2826–2832
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/939_Paper.pdf
Cite (ACL):
Iskandar Keskes, Farah Benamara, and Lamia Hadrich Belguith. 2012. Clause-based Discourse Segmentation of Arabic Texts. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2826–2832, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Clause-based Discourse Segmentation of Arabic Texts (Keskes et al., LREC 2012)