Dimitar Kazakov


2024

Meta-Evaluation of Sentence Simplification Metrics
Noof Abdullah Alfear | Dimitar Kazakov | Hend Al-Khalifa
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Automatic Text Simplification (ATS) is one of the major Natural Language Processing (NLP) tasks, which aims to help people understand text that is above their reading abilities and comprehension. ATS models reconstruct the text into a simpler format by deletion, substitution, addition or splitting, while preserving the original meaning and maintaining correct grammar. Simplified sentences are usually evaluated either by human experts, based on three main factors (simplicity, adequacy and fluency), or by calculating automatic evaluation metrics. In this paper, we conduct a meta-evaluation of reference-based automatic metrics for English sentence simplification using a high-quality, human-annotated dataset, NEWSELA-LIKERT. We study the behavior of several evaluation metrics at sentence level across four different sentence simplification models. All the models were trained on the NEWSELA-AUTO dataset. The correlation between the metrics’ scores and human judgements was analyzed and the results were used to recommend the most appropriate metrics for this task.

2017

Machine Learning Models of Universal Grammar Parameter Dependencies
Dimitar Kazakov | Guido Cordoni | Andrea Ceolin | Monica-Alexandrina Irimia | Shin-Sook Kim | Dimitris Michelioudakis | Nina Radkevich | Cristina Guardiano | Giuseppe Longobardi
Proceedings of the Workshop Knowledge Resources for the Socio-Economic Sciences and Humanities associated with RANLP 2017

The use of parameters in the description of natural language syntax has to balance between the need to discriminate among (sometimes subtly different) languages, which can be seen as a cross-linguistic version of Chomsky’s (1964) descriptive adequacy, and the complexity of the acquisition task that a large number of parameters would imply, which is a problem for explanatory adequacy. Here we present a novel approach in which a machine learning algorithm is used to find dependencies in a table of parameters. The result is a dependency graph in which some of the parameters can be fully predicted from others. These empirical findings can then be subjected to linguistic analysis, which may either refute them by providing typological counter-examples of languages not included in the original dataset, dismiss them on theoretical grounds, or uphold them as tentative empirical laws worthy of further study.

Building Dialectal Arabic Corpora
Hani Elgabou | Dimitar Kazakov
Proceedings of the Workshop Human-Informed Translation and Interpreting Technology

The aim of this research is to identify local Arabic dialects in texts from social media (Twitter) and link them to specific geographic areas. Dialect identification is studied as a subset of the task of language identification. The proposed method is based on unsupervised learning that uses lexical and geographic distance simultaneously. While this study focusses on Libyan dialects, the approach is general and could produce resources to support human translators and interpreters when dealing with vernaculars rather than standard Arabic.

2013

Using Parallel Corpora for Word Sense Disambiguation
Dimitar Kazakov | Ahmad R. Shahid
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2009

Unsupervised Construction of a Multilingual WordNet from Parallel Corpora
Dimitar Kazakov | Ahmad R. Shahid
Proceedings of the Workshop on Natural Language Processing Methods and Corpora in Translation, Lexicography, and Language Learning

2004

WordNet-based text document clustering
Julian Sedding | Dimitar Kazakov
Proceedings of the 3rd workshop on RObust Methods in Analysis of Natural Language Data (ROMAND 2004)