@inproceedings{ethiraj-etal-2025-vec,
title = "{T}-{VEC}: A Telecom-Specific Vectorization Model with Enhanced Semantic Understanding via Deep Triplet Loss Fine-Tuning",
author = "Ethiraj, Vignesh and
D, Ashwath and
Menon, Sidhanth and
Vijay, Divya and
Kannan, Vidhyakshaya",
editor = "Potdar, Saloni and
Rojas-Barahona, Lina and
Montella, Sebastien",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track",
month = nov,
year = "2025",
address = "Suzhou (China)",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-industry.168/",
doi = "10.18653/v1/2025.emnlp-industry.168",
pages = "2449--2460",
ISBN = "979-8-89176-333-3",
abstract = "The specialized vocabulary and nuanced concepts of the telecommunications industry pose persistent challenges for standard Natural Language Processing (NLP) models. Generic embedding models often struggle to represent telecom-specific semantics, limiting their utility in retrieval and downstream tasks. We present T-VEC (Telecom Vectorization Model), a domain-adapted embedding model fine-tuned from the gte-Qwen2-1.5B-instruct backbone using a triplet loss objective. Fine-tuning was performed on T-Embed, a high-quality, large-scale dataset covering diverse telecom concepts, standards, and operational scenarios. Although T-Embed contains some proprietary material and cannot be fully released, we open source 75{\%} of the dataset to support continued research in domain-specific representation learning. On a custom benchmark comprising 1500 query-passage pairs from IETF RFCs and vendor manuals, T-VEC surpasses MPNet, BGE, Jina and E5, demonstrating superior domain grounding and semantic precision in telecom-specific retrieval. Embedding visualizations further showcase tight clustering of telecom-relevant concepts. We release T-VEC and its tokenizer to support semantically faithful NLP applications within the telecom domain."
}
Markdown (Informal)
[T-VEC: A Telecom-Specific Vectorization Model with Enhanced Semantic Understanding via Deep Triplet Loss Fine-Tuning](https://aclanthology.org/2025.emnlp-industry.168/) (Ethiraj et al., EMNLP 2025)
ACL
Vignesh Ethiraj, Ashwath D, Sidhanth Menon, Divya Vijay, and Vidhyakshaya Kannan. 2025. [T-VEC: A Telecom-Specific Vectorization Model with Enhanced Semantic Understanding via Deep Triplet Loss Fine-Tuning](https://aclanthology.org/2025.emnlp-industry.168/). In *Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track*, pages 2449–2460, Suzhou (China). Association for Computational Linguistics.
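
For readers who want a concrete picture of the approach the abstract describes, below is a minimal sketch of triplet-loss fine-tuning on the gte-Qwen2-1.5B-instruct backbone using the sentence-transformers library. The telecom triplets, margin, batch size, and output path are illustrative assumptions, not the authors' T-Embed data or actual training configuration.

```python
# Minimal sketch of triplet-loss fine-tuning for a telecom embedding model,
# in the spirit of T-VEC. The triplets and hyperparameters below are
# illustrative placeholders, NOT the authors' T-Embed dataset or setup.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Backbone named in the abstract; gte-Qwen2 checkpoints require
# trust_remote_code=True to load their custom modeling code.
model = SentenceTransformer(
    "Alibaba-NLP/gte-Qwen2-1.5B-instruct", trust_remote_code=True
)

# Each training example is an (anchor, positive, negative) triplet:
# a query, a relevant telecom passage, and an irrelevant or confusable one.
train_examples = [
    InputExample(texts=[
        "What does AMF stand for in the 5G core?",
        "The Access and Mobility Management Function (AMF) handles "
        "registration and mobility management in the 5G core network.",
        "RFC 791 specifies the Internet Protocol (IPv4) header format.",
    ]),
    # ... many more domain triplets would go here ...
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=4)

# TripletLoss pulls anchors toward positives and pushes negatives away
# until negatives are at least `triplet_margin` farther than positives.
train_loss = losses.TripletLoss(model=model, triplet_margin=1.0)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)
model.save("telecom-triplet-embedder")  # hypothetical output path
```

After training, retrieval quality of the kind the abstract benchmarks can be checked by encoding queries and passages with `model.encode(...)` and ranking passages by cosine similarity.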