2020
Multitask Learning for Cross-Lingual Transfer of Broad-coverage Semantic Dependencies
Maryam Aminian | Mohammad Sadegh Rasooli | Mona Diab
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
We describe a method for developing broad-coverage semantic dependency parsers for languages for which no semantically annotated resource is available. We leverage a multitask learning framework coupled with annotation projection, using syntactic parsing as the auxiliary task in our multitask setup. Our annotation projection experiments from English to Czech show that our multitask setup yields a 3.1% (4.2%) improvement in labeled F1-score on the in-domain (out-of-domain) test set compared to a single-task baseline.
2019
Cross-Lingual Transfer of Semantic Roles: From Raw Text to Semantic Roles
Maryam Aminian | Mohammad Sadegh Rasooli | Mona Diab
Proceedings of the 13th International Conference on Computational Semantics - Long Papers
We describe a transfer method based on annotation projection to develop a dependency-based semantic role labeling system for languages for which no supervised linguistic information other than parallel data is available. Unlike previous work that presumes the availability of supervised features such as lemmas, part-of-speech tags, and dependency parse trees, we only make use of word and character features. Our deep model uses character-based representations as well as unsupervised stem embeddings to alleviate the need for supervised features. In our experiments, the model outperforms a state-of-the-art method that uses supervised lexico-syntactic features on 6 out of 7 languages in the Universal Proposition Bank.
2017
Transferring Semantic Roles Using Translation and Syntactic Information
Maryam Aminian | Mohammad Sadegh Rasooli | Mona Diab
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Our paper addresses the problem of annotation projection for semantic role labeling for resource-poor languages using supervised annotations from a resource-rich language through parallel data. We propose a transfer method that employs information from source and target syntactic dependencies as well as word alignment density to improve the quality of an iterative bootstrapping method. Our experiments yield a 3.5 absolute labeled F-score improvement over a standard annotation projection method.
2016
Automatic Verification and Augmentation of Multilingual Lexicons
Maryam Aminian | Mohamed Al-Badrashiny | Mona Diab
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
We present an approach for automatic verification and augmentation of multilingual lexica. We exploit existing parallel and monolingual corpora to extract multilingual correspondents via triangulation. We demonstrate the efficacy of our approach on two publicly available resources: Tharwa, a three-way lexicon comprising Dialectal Arabic, Modern Standard Arabic and English lemmas among other information (Diab et al., 2014); and BabelNet, a multilingual thesaurus comprising over 276 languages including Arabic variant entries (Navigli and Ponzetto, 2012). Our automated approach yields an F1-score of 71.71% in generating correct multilingual correspondents against gold Tharwa, and 54.46% against gold BabelNet without any human intervention.
2015
Unsupervised False Friend Disambiguation Using Contextual Word Clusters and Parallel Word Alignments
Maryam Aminian | Mahmoud Ghoneim | Mona Diab
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation
2014
Tharwa: A Large Scale Dialectal Arabic - Standard Arabic - English Lexicon
Mona Diab | Mohamed Al-Badrashiny | Maryam Aminian | Mohammed Attia | Heba Elfardy | Nizar Habash | Abdelati Hawwari | Wael Salloum | Pradeep Dasigi | Ramy Eskander
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We introduce an electronic three-way lexicon, Tharwa, comprising Dialectal Arabic, Modern Standard Arabic and English correspondents. The paper focuses on Egyptian Arabic as the first pilot dialect for the resource, with plans to expand to other dialects of Arabic in later phases of the project. We describe Tharwa's creation process and report on its current status. The lexical entries are augmented with various elements of linguistic information such as POS, gender, rationality, number, and root and pattern information. The lexicon is based on a compilation of information from both monolingual and bilingual existing resources such as paper dictionaries and electronic, corpus-based dictionaries. Multiple levels of quality checks are performed on the output of each step in the creation process. The importance of this lexicon lies in the fact that it is the first resource of its kind bridging multiple variants of Arabic with English. Furthermore, it is a wide-coverage lexical resource containing over 73,000 Egyptian entries. Tharwa is publicly available. We believe it will have a significant impact on both Theoretical Linguistics and Computational Linguistics research.
Handling OOV Words in Dialectal Arabic to English Machine Translation
Maryam Aminian | Mahmoud Ghoneim | Mona Diab
Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants