Synchronizing Approach in Designing Annotation Guidelines for Multilingual Datasets: A COVID-19 Case Study Using English and Japanese Tweets

Kiki Ferawati, Wan Jou She, Shoko Wakamiya, Eiji Aramaki


Abstract
Cultural differences between the U.S. and Japan are a popular subject for researchers comparing Western and Eastern cultures. One particular challenge in such comparisons is obtaining and annotating multilingual datasets. In this study, we used COVID-19 tweets from the two countries as a case study, focusing on discussions concerning masks. The annotation task was designed to gain insights into societal attitudes toward the mask policies implemented in both countries. The aim of this study is to provide a practical approach to the annotation task by thoroughly documenting how we aligned the multilingual annotation guidelines to obtain a comparable dataset. We document the practices that proved effective for synchronizing our multilingual guidelines during the annotation process. Furthermore, we discuss difficulties caused by differences in expression style and culture, as well as strategies that helped improve our agreement scores and reduce discrepancies between the annotation results in the two languages. These findings offer an alternative method for synchronizing multilingual annotation guidelines and achieving feasible agreement scores for cross-cultural annotation tasks. This study resulted in a multilingual guideline, in English and Japanese, for annotating topics related to public discourse about COVID-19 masks in the U.S. and Japan.
Anthology ID:
2024.c3nlp-1.3
Volume:
Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Vinodkumar Prabhakaran, Sunipa Dev, Luciana Benotti, Daniel Hershcovich, Laura Cabello, Yong Cao, Ife Adebara, Li Zhou
Venues:
C3NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
32–41
URL:
https://aclanthology.org/2024.c3nlp-1.3
DOI:
10.18653/v1/2024.c3nlp-1.3
Cite (ACL):
Kiki Ferawati, Wan Jou She, Shoko Wakamiya, and Eiji Aramaki. 2024. Synchronizing Approach in Designing Annotation Guidelines for Multilingual Datasets: A COVID-19 Case Study Using English and Japanese Tweets. In Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP, pages 32–41, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Synchronizing Approach in Designing Annotation Guidelines for Multilingual Datasets: A COVID-19 Case Study Using English and Japanese Tweets (Ferawati et al., C3NLP-WS 2024)
PDF:
https://aclanthology.org/2024.c3nlp-1.3.pdf