2024
pdf
bib
abs
Lexicons Gain the Upper Hand in Arabic MWE Identification
Najet Hadj Mohamed
|
Agata Savary
|
Cherifa Ben Khelil
|
Jean-Yves Antoine
|
Iskandar Keskes
|
Lamia Hadrich-Belguith
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
This paper highlights the importance of integrating MWE identification with the development of syntactic MWE lexicons. It suggests that lexicons with minimal morphosyntactic information can amplify current MWE-annotated datasets and refine identification strategies. To our knowledge, this work represents the first attempt to focus on both seen and unseen of VMWEs for Arabic. It also deals with the challenge of differentiating between literal and figurative interpretations of idiomatic expressions. The approach involves a dual-phase procedure: first projecting a VMWE lexicon onto a corpus to identify candidate occurrences, then disambiguating these occurrences to distinguish idiomatic from literal instances. Experiments outlined in the paper aim to assess the efficacy of this technique, utilizing a lexicon known as LEXAR and the “parseme-ar” corpus. The findings suggest that lexicon-driven strategies have the potential to refine MWE identification, particularly for unseen occurrences.
2022
pdf
bib
abs
Annotating Verbal Multiword Expressions in Arabic: Assessing the Validity of a Multilingual Annotation Procedure
Najet Hadj Mohamed
|
Cherifa Ben Khelil
|
Agata Savary
|
Iskandar Keskes
|
Jean-Yves Antoine
|
Lamia Hadrich-Belguith
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This paper describes our efforts to extend the PARSEME framework to Modern Standard Arabic. Theapplicability of the PARSEME guidelines was tested by measuring the inter-annotator agreement in theearly annotation stage. A subset of 1,062 sentences from the Prague Arabic Dependency Treebank PADTwas selected and annotated by two Arabic native speakers independently. Following their annotations, anew Arabic corpus with over 1,250 annotated VMWEs has been built. This corpus already exceeds thesmallest corpora of the PARSEME suite, and enables first observations. We discuss our annotation guide-line schema that shows full MWE annotation is realizable in Arabic where we get good inter-annotator agreement.
2013
pdf
bib
Segmenting Arabic Texts into Elementary Discourse Units (Segmentation de textes arabes en unités discursives minimales) [in French]
Iskandar Keskes
|
Farah Beanamara
|
Lamia Hadrich Belguith
Proceedings of TALN 2013 (Volume 1: Long Papers)
2012
pdf
bib
Étude comparative entre trois approches de résumé automatique de documents arabes (Comparative Study of Three Approaches to Automatic Summarization of Arabic Documents) [in French]
Iskandar Keskes
|
Mohamed Mahdi Boudabous
|
Mohamed Hédi Maaloul
|
Lamia Hadrich Belguith
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN
pdf
bib
abs
Clause-based Discourse Segmentation of Arabic Texts
Iskandar Keskes
|
Farah Benamara
|
Lamia Hadrich Belguith
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper describes a rule-based approach to segment Arabic texts into clauses. Our method relies on an extensive analysis of a large set of lexical cues as well as punctuation marks. Our analysis was carried out on two different corpus genres: news articles and elementary school textbooks. We propose a three steps segmentation algorithm: first by using only punctuation marks, then by relying only on lexical cues and finally by using both typology and lexical cues. The results were compared with manual segmentations elaborated by experts.
2010
pdf
bib
abs
Résumé automatique de documents arabes basé sur la technique RST
Mohamed Hédi Maâloul
|
Iskandar Keskes
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues
Dans cet article, nous nous intéressons au résumé automatique de textes arabes. Nous commençons par présenter une étude analytique réalisée sur un corpus de travail qui nous a permis de déduire, suite à des observations empiriques, un ensemble de relations et de frames (règles ou patrons) rhétoriques; ensuite nous présentons notre méthode de production de résumés pour les textes arabes. La méthode que nous proposons se base sur la Théorie de la Structure Rhétorique (RST) (Mann et al., 1988) et utilise des connaissances purement linguistiques. Le principe de notre proposition s’appuie sur trois piliers. Le premier pilier est le repérage des relations rhétoriques entres les différentes unités minimales du texte dont l’une possède le statut de noyau – segment de texte primordial pour la cohérence – et l’autre a le statut noyau ou satellite – segment optionnel. Le deuxième pilier est le dressage et la simplification de l’arbre RST. Le troisième pilier est la sélection des phrases noyaux formant le résumé final, qui tiennent en compte le type de relation rhétoriques choisi pour l’extrait.