Thanaruk Theeramunkong

2023

This paper proposes a method to develop a machine translation (MT) system from Myanmar Sign Language (MSL) to Myanmar Written Language (MWL) and vice versa for the deaf community. Translating MSL is difficult because only a small parallel corpus between MSL and MWL is available. To address this low-resource MT challenge, transfer learning is applied: an MT model is first trained on a high-resource language pair, American Sign Language (ASL) and English, and then used as the initial model for training an MT model between MSL and MWL. The mT5 model is used as the base MT model in this transfer learning. Additionally, a self-training technique is applied to generate synthetic MSL-MWL translation pairs from a large monolingual MWL corpus. Furthermore, since MT for the Myanmar language requires sentence segmentation as a preprocessing step, several segmentation schemes are empirically compared. Experimental results show that both transfer learning and self-training improve translation performance between MSL and MWL compared with a baseline model fine-tuned only on a small MSL-MWL parallel corpus.

2011

Multi-stage Annotation using Pattern-based and Statistical-based Techniques for Automatic Thai Annotated Corpus Construction
Nattapong Tongtep | Thanaruk Theeramunkong
Proceedings of the 9th Workshop on Asian Language Resources

2009

QAST: Question Answering System for ThaiWikipedia
Wittawat Jitkrittum | Choochart Haruechaiyasak | Thanaruk Theeramunkong
Proceedings of the 2009 Workshop on Knowledge and Reasoning for Answering Questions (KRAQ 2009)

2006

Word Knowledge Acquisition for Computational Lexicon Construction
Thatsanee Charoenporn | Canasai Kruengkrai | Thanaruk Theeramunkong | Virach Sornlertlamvanich | Hitoshi Isahara
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The growth of multilingual information processing technology has created a need for linguistic resources, especially lexical databases. Many attempts have been made to convert the traditional dictionary into a computational dictionary, widely known as a computational lexicon. TCL's Computational Lexicon (TCLLEX) is a recently developed large-scale Thai lexicon that aims to serve as a fundamental linguistic resource for natural language processing research. We design either terminology or ontology for structuring the lexicon, based on the ideas of computability and reusability.

1997

Grammar Acquisition Based on Clustering Analysis and Its Application to Statistical Parsing
Thanaruk Theeramunkong | Manabu Okumura
Fifth Workshop on Very Large Corpora

Exploiting Contextual Information in Hypothesis Selection for Grammar Refinement
Thanaruk Theeramunkong | Yasunobu Kawaguchi | Manabu Okumura
Computational Environments for Grammar Development and Linguistic Engineering

1996

Towards Automatic Grammar Acquisition from a Bracketed Corpus
Thanaruk Theeramunkong | Manabu Okumura
Fourth Workshop on Very Large Corpora