Ragheb Al-Ghezi


2023

Automated Assessment of Task Completion in Spontaneous Speech for Finnish and Finland Swedish Language Learners
Ekaterina Voskoboinik | Yaroslav Getman | Ragheb Al-Ghezi | Mikko Kurimo | Tamas Grosz
Proceedings of the 12th Workshop on NLP for Computer Assisted Language Learning

2020

Graph-based Syntactic Word Embeddings
Ragheb Al-Ghezi | Mikko Kurimo
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)

We propose a simple and efficient framework to learn syntactic embeddings based on information derived from constituency parse trees. Using biased random walk methods, our embeddings not only encode syntactic information about words but also capture contextual information. We also propose a method to train the embeddings on multiple constituency parse trees to ensure the encoding of a global syntactic representation. Quantitative evaluation of the embeddings shows competitive performance on the POS tagging task when compared to other types of embeddings, and qualitative evaluation reveals interesting facts about the syntactic typology learned by these embeddings.
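
To make the idea of biased random walks over a constituency parse tree concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the walk strategy, so this assumes a node2vec-style second-order walk with hypothetical return and in-out parameters `p` and `q`, and a hand-written toy tree for "the cat sleeps". The helper names `build_tree_graph` and `biased_walk` are illustrative only.

```python
import random


def build_tree_graph(edges):
    """Turn (parent, child) pairs into an undirected adjacency dict."""
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, []).append(child)
        graph.setdefault(child, []).append(parent)
    return graph


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """node2vec-style walk (an assumption, not the paper's stated method).

    Stepping back to the previous node has weight 1/p, stepping to a node
    adjacent to the previous node has weight 1, and stepping further away
    has weight 1/q. In a tree there are no triangles, so the middle case
    never fires; it is kept so the function also works on general graphs.
    """
    if rng is None:
        rng = random.Random(0)
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = graph[current]
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)
            elif candidate in graph[previous]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy constituency tree: (S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))
edges = [
    ("S", "NP"), ("S", "VP"),
    ("NP", "DT"), ("NP", "NN"), ("VP", "VBZ"),
    ("DT", "the"), ("NN", "cat"), ("VBZ", "sleeps"),
]
graph = build_tree_graph(edges)

# One walk per word; each walk mixes words with the syntactic labels above them.
rng = random.Random(42)
walks = [biased_walk(graph, w, length=6, p=4.0, q=0.5, rng=rng)
         for w in ("the", "cat", "sleeps")]
for walk in walks:
    print(" ".join(walk))
```

Walks like these could then be treated as sentences and passed to a skip-gram trainer, so that words sharing syntactic neighborhoods end up with similar vectors. Running the same procedure over many parse trees is one plausible reading of the "multiple constituency parse trees" training described above.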

2019

Inferring missing metadata from environmental policy texts
Steven Bethard | Egoitz Laparra | Sophia Wang | Yiyun Zhao | Ragheb Al-Ghezi | Aaron Lien | Laura López-Hoffman
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

The National Environmental Policy Act (NEPA) provides a trove of data on how environmental policy decisions have been made in the United States over the last 50 years. Unfortunately, there is no central database for this information, and it is too voluminous to assess manually. We describe our efforts to enable systematic research over US environmental policy by extracting and organizing metadata from the text of NEPA documents. Our contributions include collecting more than 40,000 NEPA-related documents and evaluating rule-based baselines that establish the difficulty of three important tasks: identifying lead agencies, aligning document versions, and detecting reused text.