Hyunju Song
2024
Metadata Enhancement Using Large Language Models
Hyunju Song | Steven Bethard | Andrea Thomer
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
In the natural sciences, a common form of scholarly document is a physical sample record, which provides categorical and textual metadata for specimens collected and analyzed for scientific research. Physical sample archives like museums and repositories publish these records in data repositories to support reproducible science and enable the discovery of physical samples. However, the success of resource discovery in such interfaces depends on the completeness of the sample records. We investigate approaches for automatically completing the scientific metadata fields of sample records. We apply large language models in zero-shot and few-shot settings and incorporate the hierarchical structure of the taxonomy. We show that a combination of record summarization, bottom-up taxonomy traversal, and few-shot prompting yields an F1 score as high as 0.928 on metadata completion in the Earth science domain.
2022
UA-KO at SemEval-2022 Task 11: Data Augmentation and Ensembles for Korean Named Entity Recognition
Hyunju Song | Steven Bethard
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This paper presents the approaches and systems of the UA-KO team for the Korean portion of SemEval-2022 Task 11 on Multilingual Complex Named Entity Recognition. We fine-tuned Korean and multilingual BERT and RoBERTa models and conducted experiments on data augmentation, ensembles, and task-adaptive pretraining. Our final system ranked 8th out of 17 teams with an F1 score of 0.6749.