CoLeM: A framework for semantic interpretation of Russian-language tables based on contrastive learning

Kirill Tobola; Nikita Dorodnykh

doi:10.18653/v1/2025.acl-srw.52

Abstract

Tables are widely used to represent and store data; however, they often lack the explicit semantics necessary for machine interpretation of their contents. Semantic table interpretation is essential for integrating structured data with knowledge graphs, yet existing methods face challenges with Russian-language tables due to limited labeled data and linguistic peculiarities. This paper introduces a contrastive learning approach that reduces reliance on manual labeling and improves the accuracy of column annotation for rare semantic types. The proposed method adapts contrastive learning to tabular data through augmentations and employs a distilled multilingual BERT model trained on the unlabeled RWT corpus (comprising 7.4 million columns). The resulting table representations are incorporated into the RuTaBERT pipeline, reducing computational overhead. Experimental results demonstrate a micro-F1 score of 97% and a macro-F1 score of 92%, surpassing several baseline approaches. These findings emphasize the efficiency of the proposed method in addressing data sparsity and handling unique features of the Russian language. The results further confirm that contrastive learning effectively captures semantic similarities among columns without explicit supervision, which is particularly important for rare data types.
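The abstract reports both micro-F1 and macro-F1, and the gap between them is where rare semantic types show up. The sketch below uses the standard definitions of the two metrics for single-label column annotation; it is not the authors' evaluation code, and the labels `name` and `isbn` and the toy predictions are invented for illustration only.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F1 pools counts over all columns, so frequent types dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 averages per-type F1, so every type weighs the same.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: 8 columns of a frequent type, 2 of a rare type,
# with one rare column misclassified as the frequent type.
y_true = ["name"] * 8 + ["isbn"] * 2
y_pred = ["name"] * 8 + ["name", "isbn"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case a single error on the rare type barely moves micro-F1 but pulls macro-F1 down noticeably, which is why macro-F1 is the more informative figure when the claim concerns rare semantic types.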

Anthology ID: 2025.acl-srw.52
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Jin Zhao, Mingyang Wang, Zhu Liu
Venues: ACL | WS
Publisher: Association for Computational Linguistics
Pages: 784–794
URL: https://aclanthology.org/2025.acl-srw.52/
DOI: 10.18653/v1/2025.acl-srw.52
Cite (ACL): Kirill Tobola and Nikita Dorodnykh. 2025. CoLeM: A framework for semantic interpretation of Russian-language tables based on contrastive learning. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 784–794, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): CoLeM: A framework for semantic interpretation of Russian-language tables based on contrastive learning (Tobola & Dorodnykh, ACL 2025)
PDF: https://aclanthology.org/2025.acl-srw.52.pdf
