Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment

Ryo Nagata, Hiroya Takamura, Naoki Otani, Yoshifumi Kawasaki


Abstract
In this paper, we propose methods for discovering semantic differences in words appearing in two corpora. The key idea is to measure the coverage of meanings of a word in a corpus through the norm of its mean word vector, which is equivalent to examining a kind of variance of the word vector distribution. Unlike previous methods, the proposed methods do not require alignments between words and/or corpora for comparison. All they require is to compute the variance (or the norm of the mean word vector) for each word type. Nevertheless, they rival the best-performing system in SemEval-2020 Task 1. In addition, they are (i) robust to skew in corpus sizes; (ii) capable of detecting semantic differences in infrequent words; and (iii) effective in pinpointing word instances that have a meaning missing from one of the two corpora under comparison. We demonstrate these advantages on historical corpora and on native/non-native English corpora.
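The link between the norm of the mean vector and variance can be made concrete under one assumption: each occurrence of a word is represented by a unit-length vector. For unit vectors x_1, ..., x_n with mean m, the mean squared distance to the mean is (1/n) Σ ||x_i − m||² = 1 − ||m||². A small mean norm therefore signals widely spread usages (many meanings covered), and a mean norm near 1 signals tightly clustered usages. The sketch below checks this identity on toy data. The function names, the unit-normalization step, and `difference_score` (the absolute gap in 1 − ||m||² between two corpora) are hypothetical illustrations, not the authors' scoring function.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> list[float]:
    """Scale a vector to unit L2 norm (assumption: every occurrence vector is normalized)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def mean_vector(vectors: Sequence[Vector]) -> list[float]:
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def mean_norm(vectors: Sequence[Vector]) -> float:
    """L2 norm of the mean of the unit-normalized occurrence vectors."""
    m = mean_vector([normalize(v) for v in vectors])
    return math.sqrt(sum(x * x for x in m))


def total_variance(vectors: Sequence[Vector]) -> float:
    """Mean squared distance of the unit vectors to their mean (trace of the covariance)."""
    units = [normalize(v) for v in vectors]
    m = mean_vector(units)
    return sum(sum((a - b) ** 2 for a, b in zip(u, m)) for u in units) / len(units)


def difference_score(vectors_a: Sequence[Vector], vectors_b: Sequence[Vector]) -> float:
    """Hypothetical score: absolute gap in 1 - ||mean||^2 between two corpora."""
    var_a = 1.0 - mean_norm(vectors_a) ** 2
    var_b = 1.0 - mean_norm(vectors_b) ** 2
    return abs(var_a - var_b)


if __name__ == "__main__":
    # Toy data: in corpus A every occurrence points the same way (one meaning);
    # in corpus B the occurrences point in two orthogonal directions (two meanings).
    corpus_a = [[1.0, 0.0], [2.0, 0.0]]
    corpus_b = [[1.0, 0.0], [0.0, 1.0]]
    for name, vecs in (("A", corpus_a), ("B", corpus_b)):
        print(name, round(mean_norm(vecs), 4), round(total_variance(vecs), 4))
    print("score", round(difference_score(corpus_a, corpus_b), 4))
```

Note that this computation needs only the occurrence vectors of a word within each corpus. Nothing has to be aligned across the two corpora before the per-corpus quantities are compared, which matches the alignment-free property described in the abstract.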
Anthology ID:
2023.emnlp-main.965
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
15609–15622
URL:
https://aclanthology.org/2023.emnlp-main.965
DOI:
10.18653/v1/2023.emnlp-main.965
Cite (ACL):
Ryo Nagata, Hiroya Takamura, Naoki Otani, and Yoshifumi Kawasaki. 2023. Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15609–15622, Singapore. Association for Computational Linguistics.
Cite (Informal):
Variance Matters: Detecting Semantic Differences without Corpus/Word Alignment (Nagata et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.965.pdf
Video:
https://aclanthology.org/2023.emnlp-main.965.mp4