AI4Culture: Towards Multilingual Access for Cultural Heritage Data

Tom Vanallemeersch, Sara Szoc, Laurens Meeus


Abstract
The AI4Culture project (2023-2025), funded by the European Commission and involving a 12-partner consortium led by the National Technical University of Athens, develops a platform that serves as an online capacity-building hub for AI technologies in the cultural heritage (CH) sector and enables multilingual access to CH data. The platform offers access to AI-related resources, including openly labelled datasets for model training and testing, deployable and reusable tools, and capacity-building materials. The tools target optical character recognition (OCR) for printed and handwritten documents, subtitle generation and validation, machine translation (MT), and metadata enrichment via image information extraction and semantic linking. The project also customises these tools to enhance interface and component usability. We illustrate this with technology that corrects OCR output using language models and adapts it for MT.
Anthology ID:
2024.eamt-2.30
Volume:
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2)
Month:
June
Year:
2024
Address:
Sheffield, UK
Editors:
Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Mikel Forcada, Helena Moniz
Venue:
EAMT
Publisher:
European Association for Machine Translation (EAMT)
Pages:
59–60
URL:
https://aclanthology.org/2024.eamt-2.30
Cite (ACL):
Tom Vanallemeersch, Sara Szoc, and Laurens Meeus. 2024. AI4Culture: Towards Multilingual Access for Cultural Heritage Data. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2), pages 59–60, Sheffield, UK. European Association for Machine Translation (EAMT).
Cite (Informal):
AI4Culture: Towards Multilingual Access for Cultural Heritage Data (Vanallemeersch et al., EAMT 2024)
PDF:
https://aclanthology.org/2024.eamt-2.30.pdf