Multi-Property Multi-Label Documents Metadata Recommendation based on Encoder Embeddings

Nasredine Cheniki, Vidas Daudaravicius, Abdelfettah Feliachi, Didier Hardy, Marc Wilhelm Küster


Abstract
The task of document classification, particularly multi-label classification, presents a significant challenge due to the complexity of assigning multiple relevant labels to each document. This complexity is further amplified in multi-property multi-label classification tasks, where documents must be categorized across various sets of labels. In this research, we introduce an innovative encoder embedding-driven approach to multi-property multi-label document classification that leverages semantic-text similarity and the reuse of pre-existing annotated data to enhance the efficiency and accuracy of the document annotation process. Our method requires only a single model for text similarity, eliminating the need for multiple property-specific classifiers and thereby reducing computational demands and simplifying deployment. We evaluate our approach through a prototype deployed for daily operations, which demonstrates superior performance over existing classification systems. Our contributions include improved accuracy without additional training, increased efficiency, and demonstrated effectiveness in practical applications. The results of our study indicate the potential of our approach to be applied across various domains requiring multi-property multi-label document classification, offering a scalable and adaptable solution for metadata annotation tasks.
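The similarity-driven idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cosine`, `recommend`), the parameters `k` and `min_share`, the property names, and the similarity-weighted voting rule are all assumptions made for illustration. The three-dimensional vectors are hand-made stand-ins for the output of a single text encoder.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(query_vec, annotated, k=2, min_share=0.6):
    """Recommend labels for every property from the k most similar annotated documents.

    annotated: list of (embedding, {property_name: set_of_labels}) pairs.
    A label is kept when its similarity-weighted share of the neighbour
    votes is at least min_share (a hypothetical threshold).
    """
    scored = [(max(cosine(query_vec, vec), 0.0), meta) for vec, meta in annotated]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return {}

    # One similarity model serves all properties: votes are tallied per property.
    votes = defaultdict(lambda: defaultdict(float))
    for sim, meta in neighbours:
        for prop, labels in meta.items():
            for label in labels:
                votes[prop][label] += sim

    return {
        prop: sorted(label for label, score in labels.items() if score / total >= min_share)
        for prop, labels in votes.items()
    }


# Toy annotated collection; in practice the vectors would come from an encoder model.
annotated = [
    ([1.0, 0.0, 0.0], {"subject": {"agriculture"}, "doc_type": {"regulation"}}),
    ([0.9, 0.1, 0.0], {"subject": {"agriculture", "trade"}, "doc_type": {"regulation"}}),
    ([0.0, 0.0, 1.0], {"subject": {"transport"}, "doc_type": {"decision"}}),
]
query = [1.0, 0.05, 0.0]

print(recommend(query, annotated))
```

Because labels for all properties are read off the same set of nearest neighbours, adding a new property in this sketch only requires annotated examples, not a new classifier.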
Anthology ID:
2024.nllp-1.19
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2024
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
233–242
URL:
https://aclanthology.org/2024.nllp-1.19
Cite (ACL):
Nasredine Cheniki, Vidas Daudaravicius, Abdelfettah Feliachi, Didier Hardy, and Marc Wilhelm Küster. 2024. Multi-Property Multi-Label Documents Metadata Recommendation based on Encoder Embeddings. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 233–242, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
Multi-Property Multi-Label Documents Metadata Recommendation based on Encoder Embeddings (Cheniki et al., NLLP 2024)
PDF:
https://aclanthology.org/2024.nllp-1.19.pdf