Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging

Sajad Mirzababaei, Amir Hossein Kargaran, Hinrich Schütze, Ehsaneddin Asgari


Abstract
Many NLP main tasks benefit from an accurate understanding of temporal expressions, e.g., text summarization, question answering, and information retrieval. This paper introduces Hengam, an adversarially trained transformer for Persian temporal tagging outperforming state-of-the-art approaches on a diverse and manually created dataset. We create Hengam in the following concrete steps: (1) we develop HengamTagger, an extensible rule-based tool that can extract temporal expressions from a set of diverse language-specific patterns for any language of interest. (2) We apply HengamTagger to annotate temporal tags in a large and diverse Persian text collection (covering both formal and informal contexts) to be used as weakly labeled data. (3) We introduce an adversarially trained transformer model on HengamCorpus that can generalize over the HengamTagger’s rules. We create HengamGold, the first high-quality gold standard for Persian temporal tagging. Our trained adversarial HengamTransformer not only achieves the best performance in terms of the F1-score (a type F1-Score of 95.42 and a partial F1-Score of 91.60) but also successfully deals with language ambiguities and incorrect spellings. Our code, data, and models are publicly available at https://github.com/kargaranamir/Hengam.
Anthology ID:
2022.aacl-main.74
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
November
Year:
2022
Address:
Online only
Editors:
Yulan He, Heng Ji, Sujian Li, Yang Liu, Chua-Hui Chang
Venues:
AACL | IJCNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1013–1024
Language:
URL:
https://aclanthology.org/2022.aacl-main.74
DOI:
Bibkey:
Cite (ACL):
Sajad Mirzababaei, Amir Hossein Kargaran, Hinrich Schütze, and Ehsaneddin Asgari. 2022. Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1013–1024, Online only. Association for Computational Linguistics.
Cite (Informal):
Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging (Mirzababaei et al., AACL-IJCNLP 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.aacl-main.74.pdf