Survey on Thai NLP Language Resources and Tools

Ratchakrit Arreerard, Stephen Mander, Scott Piao


Abstract
Over the past decades, Natural Language Processing (NLP) research has expanded to cover more languages. In recent years in particular, the NLP community has paid increasing attention to under-resourced languages. However, there are still many languages for which NLP research is limited in terms of both language resources and software tools. Thai is one of the under-resourced languages in the NLP domain, although it is spoken by nearly 70 million people globally. In this paper, we report on our survey of the past development of Thai NLP research to help understand its current state and future research directions. Our survey shows that, although the Thai NLP community has made significant progress over the past three decades, particularly on upstream NLP tasks such as tokenisation, research on downstream tasks such as syntactic parsing and semantic analysis is still limited. Nevertheless, we foresee that Thai NLP research will advance rapidly as richer Thai language resources and more robust NLP techniques become available.
Anthology ID:
2022.lrec-1.697
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6495–6505
URL:
https://aclanthology.org/2022.lrec-1.697
Cite (ACL):
Ratchakrit Arreerard, Stephen Mander, and Scott Piao. 2022. Survey on Thai NLP Language Resources and Tools. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6495–6505, Marseille, France. European Language Resources Association.
Cite (Informal):
Survey on Thai NLP Language Resources and Tools (Arreerard et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.697.pdf
Data
Polyglot-NER