Thepchai Supnithi


2024

myMediCon: End-to-End Burmese Automatic Speech Recognition for Medical Conversations
Hay Man Htun | Ye Kyaw Thu | Hutchatai Chanlekha | Kotaro Funakoshi | Thepchai Supnithi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

End-to-End Automatic Speech Recognition (ASR) models have significantly advanced the field of speech processing by streamlining traditionally complex ASR system pipelines, promising enhanced accuracy and efficiency. Despite these advancements, there is a notable absence of freely available medical conversation speech corpora for Burmese, a low-resource language. Addressing this gap, we present a manually curated Burmese Medical Speech Conversations (myMediCon) corpus, encapsulating conversations among medical doctors, nurses, and patients. Utilizing the ESPnet speech processing toolkit, we explore End-to-End ASR models for the Burmese language, focusing on Transformer and Recurrent Neural Network (RNN) architectures. Our corpus comprises 12 speakers, including three males and nine females, with a total speech duration of nearly 11 hours within the medical domain. To assess the ASR performance, we applied word and syllable segmentation to the text corpus. ASR models were evaluated using Character Error Rate (CER), Word Error Rate (WER), and Translation Error Rate (TER). The experimental results indicate that the RNN-based Burmese speech recognition with syllable-level segmentation achieved the best performance, yielding a CER of 9.7%. Moreover, the RNN approach significantly outperformed the Transformer model.

2023

Enhancing Translation of Myanmar Sign Language by Transfer Learning and Self-Training
Hlaing Myat Nwe | Kiyoaki Shirai | Natthawut Kertkeidkachorn | Thanaruk Theeramunkong | Ye Kyaw Thu | Thepchai Supnithi | Natsuda Kaothanthong
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

This paper proposes a method to develop a machine translation (MT) system from Myanmar Sign Language (MSL) to Myanmar Written Language (MWL) and vice versa for the deaf community. Translation of MSL is a difficult task since only a small parallel corpus between MSL and MWL is available. To address this challenge of MT for a low-resource language, transfer learning is applied. An MT model is first trained for a high-resource language pair, American Sign Language (ASL) and English, and is then used as the initial model to train an MT model between MSL and MWL. The mT5 model is used as the base MT model in this transfer learning. Additionally, a self-training technique is applied to generate synthetic translation pairs of MSL and MWL from a large monolingual MWL corpus. Furthermore, since sentence segmentation is required as preprocessing for MT of the Myanmar language, several segmentation schemes are empirically compared. Experimental results show that both transfer learning and self-training can enhance the performance of translation between MSL and MWL compared with a baseline model fine-tuned from a small MSL-MWL parallel corpus only.

2021

NECTEC’s Participation in WAT-2021
Zar Zar Hlaing | Ye Kyaw Thu | Thazin Myint Oo | Mya Ei San | Sasiporn Usanavasin | Ponrudee Netisopakul | Thepchai Supnithi
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

In this paper, we report the experimental results of the machine translation models developed by a NECTEC team for the translation tasks of WAT-2021. Our models are based on neural methods for both directions of the English-Myanmar language pair. Most existing neural machine translation (NMT) models focus on the conversion of sequential data and do not directly use syntactic information. In contrast, we build multi-source NMT models using multilingual corpora such as a string data corpus, a tree data corpus, or a POS-tagged data corpus. Multi-source translation is an approach that exploits multiple inputs (e.g., in two different formats) to increase translation accuracy. We experimented with an RNN-based encoder-decoder model with an attention mechanism and with transformer architectures. The experimental results showed that the proposed RNN-based models outperform the baseline model for the English-to-Myanmar translation task, and that the multi-source and shared-multi-source transformer models yield better translation results than the baseline.

2019

Statistical Machine Translation between Myanmar (Burmese) and Dawei (Tavoyan)
Thazin Myint Oo | Ye Kyaw Thu | Khin Mar Soe | Thepchai Supnithi
Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) co-located with ICNLSP 2019 - Short Papers

String Similarity Measures for Myanmar Language (Burmese)
Khaing Hsu Wai | Ye Kyaw Thu | Hnin Aye Thant | Swe Zin Moe | Thepchai Supnithi
Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) co-located with ICNLSP 2019 - Short Papers

2017

Proceedings of the IJCNLP 2017, System Demonstrations
Seong-Bae Park | Thepchai Supnithi
Proceedings of the IJCNLP 2017, System Demonstrations

2014

Character-Cluster-Based Segmentation using Monolingual and Bilingual Information for Statistical Machine Translation
Vipas Sutantayawalee | Peerachet Porkaew | Thepchai Supnithi | Prachya Boonkwan | Sitthaa Phaholphinyo
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing

Improvement of Statistical Machine Translation using Character-Based Segmentation with Monolingual and Bilingual Information
Vipas Sutantayawalee | Peerachet Porkaew | Prachya Boonkwan | Sitthaa Phaholphinyo | Thepchai Supnithi
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

2011

Automatic Transformation of the Thai Categorial Grammar Treebank to Dependency Trees
Christian Rishøj | Taneth Ruangrajitpakorn | Prachya Boonkwan | Thepchai Supnithi
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

AutoTagTCG : A Framework for Automatic Thai CG Tagging
Thepchai Supnithi | Taneth Ruangrajitpakorn | Kanokorn Trakultaweekool | Peerachet Porkaew
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper aims to develop a framework for automatic CG tagging. We investigated two main algorithms: CRF and a statistical alignment model based on information theory (SAM). We found that SAM gives the best results at both the word level and the sentence level, with an accuracy of 89.25% at the word level and 82.49% at the sentence level. Combining both methods can be suitable for both known and unknown words.

A Supervised Learning based Chunking in Thai using Categorial Grammar
Thepchai Supnithi | Chanon Onman | Peerachet Porkaew | Taneth Ruangrajitpakorn | Kanokorn Trakultaweekool | Asanee Kawtrakul
Proceedings of the Eighth Workshop on Asian Language Resources

A Current Status of Thai Categorial Grammars and Their Applications
Taneth Ruangrajitpakorn | Thepchai Supnithi
Proceedings of the Eighth Workshop on Asian Language Resources

2009

A Syntactic Resource for Thai: CG Treebank
Taneth Ruangrajitpakorn | Kanokorn Trakultaweekoon | Thepchai Supnithi
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

2008

Memory-Inductive Categorial Grammar: An Approach to Gap Resolution in Analytic-Language Translation
Prachya Boonkwan | Thepchai Supnithi
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Speech-to-Speech Translation Activities in Thailand
Chai Wutiwiwatchai | Thepchai Supnithi | Krit Kosawat
Proceedings of the Workshop on Technologies and Corpora for Asia-Pacific Speech Translation (TCAST)

OpenCCG Workbench and Visualization Tool
Thepchai Supnithi | Suchinder Singh | Taneth Ruangrajitpakorn | Prachya Boonkwan | Monthika Boriboon
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism which is expressed by syntactic categories and a logical form representation. CCG is difficult to represent without visualization tools. This paper presents a design framework for the OpenCCG Workbench and visualization tool, which enables linguists to develop CCG-based lexicons more easily. Our research aims to resolve these difficulties by developing a user-friendly tool. OpenCCG Workbench, an open-source web-based environment, was developed to enable multiple users to visually create and update grammars for use with the OpenCCG library. It was designed to streamline and speed up the lexicon building process, and to free linguists from writing XML files, which is both cumbersome and error-prone. The system consists of three sub-systems: a grammar management system, a grammar validator system, and a concordance retrieval system. In this paper we mainly discuss the most important parts, the grammar management and validation systems, which are directly related to CCG lexicon construction. We support users at three levels: expert linguists who act as lexical entry designers, normal linguists who add or edit lexicons, and guests who need to acquire the lexicon for use in their applications.

2005

A Practical of Memory-based Approach for Improving Accuracy of MT
Sitthaa Phaholphinyo | Teerapong Modhiran | Nattapol Kritsuthikul | Thepchai Supnithi
Proceedings of Machine Translation Summit X: Papers

Rule-Based Machine Translation (RBMT) [1] is a major approach in MT research. It needs linguistic knowledge to create appropriate translation rules. However, we cannot add every linguistic rule to the system, because new rules may conflict with existing ones. We therefore propose a memory-based approach to improve translation quality without modifying the existing linguistic rules. This paper analyses the translation problems and shows how this approach works.

PARSIT-TE: Online Thai-English Machine Translation
Teerapong Modhiran | Krit Kosawat | Supon Klaithin | Monthika Boriboon | Thepchai Supnithi
Proceedings of Machine Translation Summit X: Posters

This paper presents an online Thai-English MT system called PARSIT-TE, which is an extension of the PARSIT English-Thai system. We aim to help foreigners and Thai people exchange information more easily. The system uses a rule-based, Interlingua approach. To improve the system, we concentrate on the pre-processing and rule analysis phases, which are considered necessary because of some problems specific to the Thai language.

2003

Automatic Error Detection in the Japanese Learners’ English Spoken Data
Emi Izumi | Kiyotaka Uchimoto | Toyomi Saiga | Thepchai Supnithi | Hitoshi Isahara
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

2002

A Cross System Machine Translation
Thepchai Supnithi | Virach Sornlertlamvanich | Thatsanee Charoenporn
COLING-02: Machine Translation in Asia