Advancing Community Directories: Leveraging LLMs for Automated Extraction in MARC Standard Venue Availability Notes

Mostafa Didar Mahdi, Thushari Atapattu, Menasha Thilakaratne


Abstract
This paper addresses the challenge of efficiently managing and accessing community service information, focusing on venue hire details in the SAcommunity directory. Using Large Language Models (LLMs), specifically the RoBERTa transformer model, we developed an automated system that extracts venue availability information and structures it according to MARC (Machine-Readable Cataloging) standards. We fine-tuned RoBERTa on a dataset of community service descriptions so that it identifies and categorizes key elements such as facility names, capacities, equipment availability, and accessibility features. We then applied the model to unstructured text from the SAcommunity database, automatically extracting the relevant information and organizing it into standardized fields. The results demonstrate the effectiveness of this method in transforming free-text summaries into structured, MARC-compliant data. This automation significantly reduces the time and effort required for data entry and categorization, and it also improves the accessibility and usability of community information.
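The abstract does not say how the model's predictions become structured fields. The following is a minimal sketch under stated assumptions. It assumes a BIO token-classification setup with hypothetical labels FACILITY, CAPACITY, EQUIPMENT and ACCESSIBILITY, and word-level tags that would come from a fine-tuned RoBERTa classifier; here they are hard-coded. The sketch groups the tags into per-category spans and renders them as one availability note. The label names, the note layout, and the functions `bio_to_fields` and `to_availability_note` are illustrative assumptions, not the authors' method or an official MARC field definition.

```python
from collections import defaultdict


def bio_to_fields(tokens, tags):
    """Group BIO-tagged tokens into text spans, collected per category.

    Hypothetical scheme: "B-X" begins a span of category X, "I-X" continues it,
    and "O" marks a token outside any span.
    """
    fields = defaultdict(list)
    current_label, current_tokens = None, []

    def flush():
        # Store the span being built (if any) under its category.
        if current_label is not None:
            fields[current_label].append(" ".join(current_tokens))

    for token, tag in zip(tokens, tags):
        # A "B-" tag, or an "I-" tag whose category differs from the open span,
        # closes the current span and starts a new one.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O": close any open span
            flush()
            current_label, current_tokens = None, []
    flush()
    return dict(fields)


def to_availability_note(fields):
    # Hypothetical rendering: one "Category: value; value" segment per category,
    # in a fixed order, joined into a single free-text note.
    order = ["FACILITY", "CAPACITY", "EQUIPMENT", "ACCESSIBILITY"]
    parts = [f"{label.title()}: {'; '.join(fields[label])}" for label in order if label in fields]
    return ". ".join(parts)


if __name__ == "__main__":
    tokens = ["Main", "Hall", "seats", "120", "people", ",", "projector",
              "available", ",", "wheelchair", "accessible"]
    tags = ["B-FACILITY", "I-FACILITY", "O", "B-CAPACITY", "I-CAPACITY", "O",
            "B-EQUIPMENT", "O", "O", "B-ACCESSIBILITY", "I-ACCESSIBILITY"]
    print(to_availability_note(bio_to_fields(tokens, tags)))
```

For the sample sentence, this prints `Facility: Main Hall. Capacity: 120 people. Equipment: projector. Accessibility: wheelchair accessible`. Keeping span grouping separate from rendering means the output layout can change, for example to match a specific MARC note field, without touching the extraction step.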
Anthology ID:
2024.alta-1.9
Volume:
Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association
Month:
December
Year:
2024
Address:
Canberra, Australia
Editors:
Tim Baldwin, Sergio José Rodríguez Méndez, Nicholas Kuo
Venue:
ALTA
Publisher:
Association for Computational Linguistics
Pages:
118–129
URL:
https://aclanthology.org/2024.alta-1.9/
Cite (ACL):
Mostafa Didar Mahdi, Thushari Atapattu, and Menasha Thilakaratne. 2024. Advancing Community Directories: Leveraging LLMs for Automated Extraction in MARC Standard Venue Availability Notes. In Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association, pages 118–129, Canberra, Australia. Association for Computational Linguistics.
Cite (Informal):
Advancing Community Directories: Leveraging LLMs for Automated Extraction in MARC Standard Venue Availability Notes (Mahdi et al., ALTA 2024)
PDF:
https://aclanthology.org/2024.alta-1.9.pdf