Simplified Chinese Character Distance Based on Ideographic Description Sequences

Yixia Wang, Emmanuel Keuleers


Abstract
Character encoding systems have long overlooked the internal structure of characters. Ideographic Description Sequences, which explicitly represent spatial relations between character components, are a potential solution to this problem. In this paper, we illustrate the utility of Ideographic Description Sequences in computing edit distance and finding orthographic neighbors for Simplified Chinese characters. In addition, we explore the possibility of using Ideographic Description Sequences to encode spatial relations between components in other scripts.
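As a rough illustration of the idea in the abstract, the sketch below treats each Ideographic Description Sequence as a plain string of symbols and computes a standard Levenshtein distance between two sequences. This is only one plausible reading of "edit distance" here, not necessarily the authors' procedure. The small `IDS` table is typed in by hand for illustration and is not taken from the paper's data, and the names `edit_distance`, `ids_distance`, and `neighbors` are hypothetical.

```python
# Hand-entered, illustrative IDS decompositions (an assumption, not the paper's data).
IDS = {
    "好": "⿰女子",
    "妈": "⿰女马",
    "吗": "⿰口马",
    "杏": "⿱木口",
    "呆": "⿱口木",
}


def edit_distance(a, b):
    """Levenshtein distance between two sequences of symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ids_distance(c1, c2):
    """Edit distance between the IDS strings of two characters."""
    return edit_distance(IDS[c1], IDS[c2])


def neighbors(char, max_dist=1):
    """Characters in the toy table whose IDS lies within max_dist edits of char."""
    return sorted(c for c in IDS
                  if c != char and ids_distance(char, c) <= max_dist)
```

Under this toy table, 妈 and 吗 differ in a single component, so they would count as orthographic neighbors at distance 1, whereas 杏 and 呆 share both components and the same layout operator but in a different order, giving distance 2.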
Anthology ID:
2024.cawl-1.8
Volume:
Proceedings of the Second Workshop on Computation and Written Language (CAWL) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Kyle Gorman, Emily Prud'hommeaux, Brian Roark, Richard Sproat
Venues:
CAWL | WS
SIG:
SIGWrit
Publisher:
ELRA and ICCL
Pages:
59–66
URL:
https://aclanthology.org/2024.cawl-1.8
Cite (ACL):
Yixia Wang and Emmanuel Keuleers. 2024. Simplified Chinese Character Distance Based on Ideographic Description Sequences. In Proceedings of the Second Workshop on Computation and Written Language (CAWL) @ LREC-COLING 2024, pages 59–66, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Simplified Chinese Character Distance Based on Ideographic Description Sequences (Wang & Keuleers, CAWL-WS 2024)
PDF:
https://aclanthology.org/2024.cawl-1.8.pdf