Homophone2Vec: Embedding Space Analysis for Empirical Evaluation of Phonological and Semantic Similarity

Sophie Wu, Anita Zheng, Joey Chuang


Abstract
This paper introduces a novel method for empirically evaluating the relationship between the phonological and semantic similarity of linguistic units using embedding spaces, with Chinese character homophones as a proof of concept. We use cosine similarity between character embeddings as a proxy for semantic similarity, and compare pairs of phonologically related characters against pairs of baseline characters of similar frequency. We find a highly statistically significant positive semantic relationship among distinct Chinese characters at varying levels of sound-sharing. We also probe the embedding space with t-SNE and UMAP visualizations, and indicate directions for future applications of this method.
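The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the embedding vectors are toy values, the character keys are placeholders, and the one-sided permutation test is an assumed choice, since the abstract does not name the significance test used. All function names here are hypothetical.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_similarities(pairs, embeddings):
    """Cosine similarity for each (key_a, key_b) pair of embedding keys."""
    return [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test (an assumed test, not taken from the paper).

    Estimates how often a random relabelling of the pooled similarities
    gives a mean difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Toy 3-dimensional embeddings with placeholder keys (not real data).
toy_embeddings = {
    "char_a": [0.9, 0.1, 0.0],
    "char_b": [0.8, 0.2, 0.1],
    "char_c": [0.0, 0.9, 0.3],
    "char_d": [0.1, 0.1, 0.9],
}

# Hypothetical pairings: one sound-sharing pair, one frequency-matched baseline pair.
homophone_sims = pair_similarities([("char_a", "char_b")], toy_embeddings)
baseline_sims = pair_similarities([("char_c", "char_d")], toy_embeddings)
print("mean homophone similarity:", mean(homophone_sims))
print("mean baseline similarity:", mean(baseline_sims))
```

With real data, the two lists would hold one similarity per homophone pair and per baseline pair, and permutation_p_value(homophone_sims, baseline_sims) would estimate how likely the observed gap is under random pairing.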
Anthology ID:
2024.acl-srw.34
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Xiyan Fu, Eve Fleisig
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
287–292
URL:
https://aclanthology.org/2024.acl-srw.34/
DOI:
10.18653/v1/2024.acl-srw.34
Cite (ACL):
Sophie Wu, Anita Zheng, and Joey Chuang. 2024. Homophone2Vec: Embedding Space Analysis for Empirical Evaluation of Phonological and Semantic Similarity. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 287–292, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Homophone2Vec: Embedding Space Analysis for Empirical Evaluation of Phonological and Semantic Similarity (Wu et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-srw.34.pdf