Mostafa Didar Mahdi


2024

Advancing Community Directories: Leveraging LLMs for Automated Extraction in MARC Standard Venue Availability Notes
Mostafa Didar Mahdi | Thushari Atapattu | Menasha Thilakaratne
Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association

This paper addresses the challenge of efficiently managing and accessing community service information, specifically focusing on venue hire details within the SAcommunity directory. By leveraging Large Language Models (LLMs), particularly the RoBERTa transformer model, we developed an automated system to extract and structure venue availability information according to MARC (Machine-Readable Cataloging) standards. Our approach involved fine-tuning the RoBERTa model on a dataset of community service descriptions, enabling it to identify and categorize key elements such as facility names, capacities, equipment availability, and accessibility features. The model was then applied to process unstructured text data from the SAcommunity database, automatically extracting relevant information and organizing it into standardized fields. The results demonstrate the effectiveness of this method in transforming free-text summaries into structured, MARC-compliant data. This automation not only significantly reduces the time and effort required for data entry and categorization but also enhances the accessibility and usability of community information.
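
The step from model output to standardized fields can be illustrated with a minimal sketch. It assumes the fine-tuned model is used as a token classifier that emits BIO tags, which the abstract does not confirm. The label names (FACILITY, CAPACITY, EQUIPMENT, ACCESSIBILITY) and the helper functions `group_bio_spans` and `to_record` are hypothetical, chosen only to mirror the categories listed above. The model call is omitted and the tags are written by hand. Mapping the resulting record onto actual MARC tags is not shown, since the abstract does not name them.

```python
from collections import defaultdict


def group_bio_spans(tokens, tags):
    """Collapse token-level BIO tags into a list of (label, text) spans."""
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close any open span first.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" tag, or an I- tag that does not match the open span:
            # close the open span and ignore this token.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


def to_record(spans):
    """Group extracted spans by label into a field-like dictionary."""
    record = defaultdict(list)
    for label, text in spans:
        record[label].append(text)
    return dict(record)


# Hand-written example input standing in for tokenizer and model output.
tokens = ["Main", "hall", "seats", "120", "people", ",", "with",
          "projector", "and", "wheelchair", "access"]
tags = ["B-FACILITY", "I-FACILITY", "O", "B-CAPACITY", "I-CAPACITY", "O", "O",
        "B-EQUIPMENT", "O", "B-ACCESSIBILITY", "I-ACCESSIBILITY"]

record = to_record(group_bio_spans(tokens, tags))
print(record)
```

Keeping each label as a list lets one description carry several facilities or pieces of equipment, which free-text venue summaries often do.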