Yo Sato


2024

Disambiguating Homographs and Homophones Simultaneously: A Regrouping Method for Japanese
Yo Sato
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present a method that re-groups surface forms into clusters representing synonyms, and helps disambiguate homographs as well as homophones. The method is applied post-hoc to trained contextual word embeddings. It is beneficial to languages where both homographs and homophones abound, which compromise the efficiency of language models and cause the underestimation problem in evaluation. Taking Japanese as an example, we evaluate how accurate such disambiguation can be, and how much the underestimation can be mitigated.

2020

Homonym normalisation by word sense clustering: a case in Japanese
Yo Sato | Kevin Heffernan
Proceedings of the 28th International Conference on Computational Linguistics

This work presents a method of word sense clustering that differentiates homonyms and merges homophones, taking Japanese as an example, where orthographical variation causes problems for language processing. It uses contextualised embeddings (BERT) to cluster tokens into distinct sense groups, and we use these groups to normalise synonymous instances to a single representative form. We see the benefit of this normalisation in language modelling, as well as in transliteration.

Dialect Clustering with Character-Based Metrics: in Search of the Boundary of Language and Dialect
Yo Sato | Kevin Heffernan
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present in this work a universal, character-based method for representing sentences, so that one can calculate the distance between any pair of sentences. With a small alphabet, it can function as a proxy for phonemes, and as one of its main uses, we carry out dialect clustering: we cluster a mixed dialect/sub-language corpus into sub-groups and see if they coincide with the conventional boundaries of dialects and sub-languages. Using data with multiple Japanese dialects and multiple Slavic languages, we report how well each group clusters, as a partial response to the question of what separates languages from dialects.

2018

Creating dialect sub-corpora by clustering: a case in Japanese for an adaptive method
Yo Sato | Kevin Heffernan
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2009

Incrementality, Speaker-Hearer Switching and the Disambiguation Challenge
Ruth Kempson | Eleni Gregoromichelaki | Yo Sato
Proceedings of SRSL 2009, the 2nd Workshop on Semantic Representation of Spoken Language

Dialogue Modelling and the Remit of Core Grammar
Eleni Gregoromichelaki | Yo Sato | Ruth Kempson | Andrew Gargett | Christine Howes
Proceedings of the Eight International Conference on Computational Semantics

2008

Lexicalised Parsing of German V2
Yo Sato
Proceedings of the Workshop on Parsing German

Parser Evaluation Across Frameworks without Format Conversion
Wai Lok Tam | Yo Sato | Yusuke Miyao | Junichi Tsujii
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation

2006

Lexicalising Word Order Constraints for Implemented Linearisation Grammar
Yo Sato
Student Research Workshop