2014
Unsupervised Construction of a Lexicon and a Repository of Variation Patterns for Arabic Modal Multiword Expressions
Rania Al-Sabbagh, Roxana Girju, Jana Diesner
Proceedings of the 10th Workshop on Multiword Expressions (MWE)
Interactive Annotation for Event Modality in Modern Standard and Egyptian Arabic Tweets
Rania Al-Sabbagh, Roxana Girju, Jana Diesner
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop
3arif: A Corpus of Modern Standard and Egyptian Arabic Tweets Annotated for Epistemic Modality Using Interactive Crowdsourcing
Rania Al-Sabbagh, Roxana Girju, Jana Diesner
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
2013
Using the Semantic-Syntactic Interface for Reliable Arabic Modality Annotation
Rania Al-Sabbagh, Jana Diesner, Roxana Girju
Proceedings of the Sixth International Joint Conference on Natural Language Processing
2012
YADAC: Yet another Dialectal Arabic Corpus
Rania Al-Sabbagh, Roxana Girju
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper presents the first phase of building YADAC, a multi-genre Dialectal Arabic (DA) corpus compiled from Web data: microblogs (i.e., Twitter), blogs/forums, and online knowledge market services in which both questions and answers are user-generated. In addition to introducing two new genres to current efforts at building DA corpora (microblogs and question-answer pairs extracted from online knowledge market services), the paper highlights and tackles several issues in building DA corpora that previous studies have not handled: function-based Web harvesting and dialect identification, vowel-based spelling variation, linguistic hypercorrection and its effect on spelling variation, and unsupervised Part-of-Speech (POS) tagging and base-phrase chunking for DA. Although the algorithms for both POS tagging and base-phrase chunking are still under development, the results are promising.
2010
Mining the Web for the Induction of a Dialectical Arabic Lexicon
Rania Al-Sabbagh, Roxana Girju
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper describes the first phase of building a lexicon of Egyptian Cairene Arabic (ECA), one of the most widely understood dialects in the Arab World, and Modern Standard Arabic (MSA). Each ECA entry is mapped to its MSA synonym, its Part-of-Speech (POS) tag, and its top-ranked contexts based on Web queries; each entry is thus provided with basic syntactic and semantic information for a generic lexicon compatible with multiple NLP applications. Moreover, through their MSA synonyms, ECA entries gain access to the considerably larger set of NLP tools and resources available for MSA. Using an associationist approach based on the correlations between word co-occurrence patterns in both varieties, we change the direction of the acquisition process from parallel to circular. This overcomes a bottleneck of current research on Arabic dialects, namely the lack of parallel corpora, and mitigates the accuracy loss from relying on unrelated Web documents, which are more readily available. Evaluated manually on 1,000 word entries by two native speakers of the ECA and MSA varieties, the proposed approach achieves a promising F-measure of 70.9%. In discussing the proposed algorithm, the paper highlights several semantic issues for upcoming phases toward a more comprehensive ECA-MSA lexicon.