JCT at SemEval-2021 Task 1: Context-aware Representation for Lexical Complexity Prediction
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
In this paper, we present our contribution to SemEval-2021 Task 1: Lexical Complexity Prediction, where we integrate linguistic, statistical, and semantic properties of the target word and its context as features within a Machine Learning (ML) framework for predicting lexical complexity. In particular, we use BERT contextualized word embeddings to represent the semantic meaning of the target word and its context. We participated in the sub-task of predicting the complexity score of single words.
Automatic Construction of Aramaic-Hebrew Translation Lexicon
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages
Aramaic is an ancient Semitic language with a 3,000-year history. However, since the number of Aramaic speakers in the world has declined, Aramaic is in danger of extinction. In this paper, we suggest a methodology for automatic construction of an Aramaic-Hebrew translation lexicon. First, we generate an initial translation lexicon with a state-of-the-art word alignment translation model. Then, we filter the initial lexicon using string similarity measures of three types: similarity between terms in the target language, similarity between a source and a target term, and similarity between terms in the source language. In our experiments, we use a parallel corpus of Biblical Aramaic-Hebrew sentence pairs and evaluate various string similarity measures for each type of similarity. We illustrate the empirical benefit of our methodology and its effect on precision and F1. In particular, we demonstrate that our filtering method significantly exceeds a filtering approach based on the probability scores given by a state-of-the-art word alignment translation model.
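
The filtering step can be illustrated with a minimal sketch covering only one of the three similarity types, namely similarity between a source and a target term. The abstract does not say which string similarity measures were evaluated, so normalized Levenshtein similarity is used here purely as an assumption. The function names, the 0.5 threshold, and the Latin-letter toy strings are all hypothetical and are not taken from the paper or its data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def filter_lexicon(candidates, threshold=0.5):
    """Keep (source, target) pairs whose similarity reaches the threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    return [(s, t) for s, t in candidates if similarity(s, t) >= threshold]


# Hypothetical candidate pairs, standing in for the output of a word aligner.
candidates = [("mlka", "mlk"), ("mlka", "byt"), ("ara", "arc")]
filtered = filter_lexicon(candidates)
print(filtered)
```

Under these assumptions, a pair is kept only when the two strings are close in spelling, which is a plausible signal for related languages that share many cognates. The other two similarity types described above (target-target and source-source) would need to compare entries of the lexicon with each other rather than the two sides of a single entry.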