A trigram part-of-speech tagger for the Apertium free/open-source machine translation platform

Zaid Md Abdul Wahab Sheikh, Felipe Sánchez-Martínez


Abstract
This paper describes the implementation of a second-order hidden Markov model (HMM) based part-of-speech tagger for the Apertium free/open-source rule-based machine translation platform. We describe the part-of-speech (PoS) tagging approach in Apertium and how it is parametrised through a tagger definition file that defines: (1) the set of tags to be used and (2) constraint rules that can be used to forbid certain PoS tag sequences, thus refining the HMM parameters and increasing its tagging accuracy. The paper also reviews the Baum-Welch algorithm used to estimate the HMM parameters and compares the tagging accuracy achieved with that achieved by the original, first-order HMM-based PoS tagger in Apertium.
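As a rough illustration of how forbidding tag sequences can refine the transition parameters of a second-order HMM, the sketch below zeroes every trigram transition P(t3 | t1, t2) whose final bigram (t2, t3) is forbidden and renormalises each context. The function name `apply_forbid_rules`, the tag names, and the probabilities are hypothetical, and the renormalisation scheme is an assumption; the tagger described in the paper may handle forbidden sequences differently.

```python
from collections import defaultdict


def apply_forbid_rules(trigram_probs, forbidden_bigrams):
    """Zero transitions P(t3 | t1, t2) whose bigram (t2, t3) is forbidden,
    then renormalise the remaining mass within each (t1, t2) context.

    Hypothetical helper for illustration only; not Apertium's API.
    """
    # Group the distribution over t3 by its context (t1, t2).
    by_context = defaultdict(dict)
    for (t1, t2, t3), p in trigram_probs.items():
        forbidden = (t2, t3) in forbidden_bigrams
        by_context[(t1, t2)][t3] = 0.0 if forbidden else p

    refined = {}
    for (t1, t2), dist in by_context.items():
        total = sum(dist.values())
        for t3, p in dist.items():
            # If every continuation was forbidden, leave the context at zero.
            refined[(t1, t2, t3)] = p / total if total > 0 else 0.0
    return refined


# Toy transition table with made-up tags and probabilities.
probs = {
    ("DET", "NOUN", "VERB"): 0.5,
    ("DET", "NOUN", "DET"): 0.25,
    ("DET", "NOUN", "NOUN"): 0.25,
}
refined = apply_forbid_rules(probs, {("NOUN", "DET")})
```

In this toy table the mass removed from the forbidden continuation is redistributed proportionally over the two remaining ones, so the context still sums to one.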
Anthology ID:
2009.freeopmt-1.11
Volume:
Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation
Month:
November 2-3
Year:
2009
Address:
Alacant, Spain
Editors:
Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Francis M. Tyers
Venue:
FreeOpMT
Pages:
67–74
URL:
https://aclanthology.org/2009.freeopmt-1.11
Cite (ACL):
Zaid Md Abdul Wahab Sheikh and Felipe Sánchez-Martínez. 2009. A trigram part-of-speech tagger for the Apertium free/open-source machine translation platform. In Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, pages 67–74, Alacant, Spain.
Cite (Informal):
A trigram part-of-speech tagger for the Apertium free/open-source machine translation platform (Sheikh & Sánchez-Martínez, FreeOpMT 2009)
PDF:
https://aclanthology.org/2009.freeopmt-1.11.pdf