Guidance-Based Prompt Data Augmentation in Specialized Domains for Named Entity Recognition

Hyeonseok Kang, Hyein Seo, Jeesu Jung, Sangkeun Jung, Du-Seong Chang, Riwoo Chung


Abstract
While the abundance of rich and vast datasets across numerous fields has facilitated the advancement of natural language processing, sectors in need of specialized data types continue to struggle with the challenge of finding quality data. Our study introduces a novel guidance data augmentation technique utilizing abstracted context and sentence structures to produce varied sentences while maintaining context-entity relationships, addressing data scarcity challenges. By fostering a closer relationship between context, sentence structure, and role of entities, our method enhances data augmentation’s effectiveness. Consequently, by showcasing diversification in both entity-related vocabulary and overall sentence structure, and simultaneously improving the training performance of named entity recognition task.
Anthology ID:
2024.acl-short.61
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
665–672
Language:
URL:
https://aclanthology.org/2024.acl-short.61
DOI:
10.18653/v1/2024.acl-short.61
Bibkey:
Cite (ACL):
Hyeonseok Kang, Hyein Seo, Jeesu Jung, Sangkeun Jung, Du-Seong Chang, and Riwoo Chung. 2024. Guidance-Based Prompt Data Augmentation in Specialized Domains for Named Entity Recognition. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 665–672, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Guidance-Based Prompt Data Augmentation in Specialized Domains for Named Entity Recognition (Kang et al., ACL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.acl-short.61.pdf