Arbi Haza Nasution
2022
A Neural Network Approach to Create Minangkabau-Indonesia Bilingual Dictionary
Kartika Resiandi
|
Yohei Murakami
|
Arbi Haza Nasution
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Indonesia has many varieties of ethnic languages, and most come from the same language family, namely Austronesian languages. Coming from that same language family, the words in Indonesian ethnic languages are very similar. However, there is research stating that Indonesian ethnic languages are endangered. Thus, to prevent that, we proposed to create a bilingual dictionary between ethnic languages using a neural network approach to extract transformation rules using character level embedding and the Bi-LSTM method in a sequence-to-sequence model. The model has an encoder and decoder. The encoder functions read the input sequence, character by character, generate context, then extract a summary of the input. The decoder will produce an output sequence where every character in each time-step and the next character that comes out are affected by the previous character. The current case for experiment translation focuses on Minangkabau and Indonesian languages with 13761-word pairs. For evaluating the model’s performance, 5-Fold Cross-Validation is used.
2018
Designing a Collaborative Process to Create Bilingual Dictionaries of Indonesian Ethnic Languages
Arbi Haza Nasution
|
Yohei Murakami
|
Toru Ishida
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
Constraint-Based Bilingual Lexicon Induction for Closely Related Languages
Arbi Haza Nasution
|
Yohei Murakami
|
Toru Ishida
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction becomes a difficult task for low-resource languages. Pivot language and cognate recognition approach have been proven useful to induce bilingual lexicons for such languages. We analyze the features of closely related languages and define a semantic constraint assumption. Based on the assumption, we propose a constraint-based bilingual lexicon induction for closely related languages by extending constraints and translation pair candidates from recent pivot language approach. We further define three constraint sets based on language characteristics. In this paper, two controlled experiments are conducted. The former involves four closely related language pairs with different language pair similarities, and the latter focuses on sense connectivity between non-pivot words and pivot words. We evaluate our result with F-measure. The result indicates that our method works better on voluminous input dictionaries and high similarity languages. Finally, we introduce a strategy to use proper constraint sets for different goals and language characteristics.