Shohei Higashiyama


2024

pdf bib
Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation
Shohei Higashiyama | Hiroki Ouchi | Hiroki Teranishi | Hiroyuki Otomo | Yusuke Ide | Aitaro Yamamoto | Hiroyuki Shindo | Yuki Matsuda | Shoko Wakamiya | Naoya Inoue | Ikuya Yamada | Taro Watanabe
Findings of the Association for Computational Linguistics: EACL 2024

Geoparsing is a fundamental technique for analyzing geo-entity information in text, which is useful for geographic applications, e.g., tourist spot recommendation. We focus on document-level geoparsing that considers geographic relatedness among geo-entity mentions and present a Japanese travelogue dataset designed for training and evaluating document-level geoparsing systems. Our dataset comprises 200 travelogue documents with rich geo-entity information: 12,171 mentions, 6,339 coreference clusters, and 2,551 geo-entities linked to geo-database entries.

pdf bib
Evaluating Language Models in Location Referring Expression Extraction from Early Modern and Contemporary Japanese Texts
Ayuki Katayama | Yusuke Sakai | Shohei Higashiyama | Hiroki Ouchi | Ayano Takeuchi | Ryo Bando | Yuta Hashimoto | Toshinobu Ogiso | Taro Watanabe
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities

Automatic extraction of geographic information, including Location Referring Expressions (LREs), can aid humanities research in analyzing large collections of historical texts. In this study, to investigate how accurate pretrained Transformer language models (LMs) can extract LREs from historical texts, we evaluate two representative types of LMs, namely, masked language model and causal language model, using early modern and contemporary Japanese datasets. Our experimental results demonstrated the potential of contemporary LMs for historical texts, but also suggest the need for further model enhancement, such as pretraining on historical texts.

pdf bib
Results of the WAT/WMT 2024 Shared Task on Patent Translation
Shohei Higashiyama
Proceedings of the Ninth Conference on Machine Translation

This paper presents the results of the patent translation shared task at the 11th Workshop on Asian Translation and 9th Conference on Machine Translation. Two teams participated in this task, and their submitted translation results for one or more of the six language directions were automatically and manually evaluated. The evaluation results demonstrate the strong performance of large language model-based systems from both participants.

2023

pdf bib
Proceedings of the 10th Workshop on Asian Translation
Toshiaki Nakazawa | Kazutaka Kinugawa | Hideya Mino | Isao Goto | Raj Dabre | Shohei Higashiyama | Shantipriya Parida | Makoto Morishita | Ondrej Bojar | Akiko Eriguchi | Yusuke Oda | Akiko Eriguchi | Chenhui Chu | Sadao Kurohashi
Proceedings of the 10th Workshop on Asian Translation

pdf bib
Overview of the 10th Workshop on Asian Translation
Toshiaki Nakazawa | Kazutaka Kinugawa | Hideya Mino | Isao Goto | Raj Dabre | Shohei Higashiyama | Shantipriya Parida | Makoto Morishita | Ondřej Bojar | Akiko Eriguchi | Yusuke Oda | Chenhui Chu | Sadao Kurohashi
Proceedings of the 10th Workshop on Asian Translation

This paper presents the results of the shared tasks from the 10th workshop on Asian translation (WAT2023). For the WAT2023, 2 teams submitted their translation results for the human evaluation. We also accepted 1 research paper. About 40 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2022

pdf bib
Overview of the 9th Workshop on Asian Translation
Toshiaki Nakazawa | Hideya Mino | Isao Goto | Raj Dabre | Shohei Higashiyama | Shantipriya Parida | Anoop Kunchukuttan | Makoto Morishita | Ondřej Bojar | Chenhui Chu | Akiko Eriguchi | Kaori Abe | Yusuke Oda | Sadao Kurohashi
Proceedings of the 9th Workshop on Asian Translation

This paper presents the results of the shared tasks from the 9th workshop on Asian translation (WAT2022). For the WAT2022, 8 teams submitted their translation results for the human evaluation. We also accepted 4 research papers. About 300 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

pdf bib
A Japanese Corpus of Many Specialized Domains for Word Segmentation and Part-of-Speech Tagging
Shohei Higashiyama | Masao Ideuchi | Masao Utiyama | Yoshiaki Oida | Eiichiro Sumita
Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems

2021

pdf bib
A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization
Shohei Higashiyama | Masao Utiyama | Taro Watanabe | Eiichiro Sumita
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Lexical normalization, in addition to word segmentation and part-of-speech tagging, is a fundamental task for Japanese user-generated text processing. In this paper, we propose a text editing model to solve the three task jointly and methods of pseudo-labeled data generation to overcome the problem of data deficiency. Our experiments showed that the proposed model achieved better normalization performance when trained on more diverse pseudo-labeled data.

pdf bib
User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization
Shohei Higashiyama | Masao Utiyama | Taro Watanabe | Eiichiro Sumita
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Morphological analysis (MA) and lexical normalization (LN) are both important tasks for Japanese user-generated text (UGT). To evaluate and compare different MA/LN systems, we have constructed a publicly available Japanese UGT corpus. Our corpus comprises 929 sentences annotated with morphological and normalization information, along with category information we classified for frequent UGT-specific phenomena. Experiments on the corpus demonstrated the low performance of existing MA/LN methods for non-general words and non-standard forms, indicating that the corpus would be a challenging benchmark for further research on UGT.

pdf bib
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
Toshiaki Nakazawa | Hideki Nakayama | Isao Goto | Hideya Mino | Chenchen Ding | Raj Dabre | Anoop Kunchukuttan | Shohei Higashiyama | Hiroshi Manabe | Win Pa Pa | Shantipriya Parida | Ondřej Bojar | Chenhui Chu | Akiko Eriguchi | Kaori Abe | Yusuke Oda | Katsuhito Sudoh | Sadao Kurohashi | Pushpak Bhattacharyya
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

pdf bib
Overview of the 8th Workshop on Asian Translation
Toshiaki Nakazawa | Hideki Nakayama | Chenchen Ding | Raj Dabre | Shohei Higashiyama | Hideya Mino | Isao Goto | Win Pa Pa | Anoop Kunchukuttan | Shantipriya Parida | Ondřej Bojar | Chenhui Chu | Akiko Eriguchi | Kaori Abe | Yusuke Oda | Sadao Kurohashi
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

This paper presents the results of the shared tasks from the 8th workshop on Asian translation (WAT2021). For the WAT2021, 28 teams participated in the shared tasks and 24 teams submitted their translation results for the human evaluation. We also accepted 5 research papers. About 2,100 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2020

pdf bib
Overview of the 7th Workshop on Asian Translation
Toshiaki Nakazawa | Hideki Nakayama | Chenchen Ding | Raj Dabre | Shohei Higashiyama | Hideya Mino | Isao Goto | Win Pa Pa | Anoop Kunchukuttan | Shantipriya Parida | Ondřej Bojar | Sadao Kurohashi
Proceedings of the 7th Workshop on Asian Translation

This paper presents the results of the shared tasks from the 7th workshop on Asian translation (WAT2020). For the WAT2020, 20 teams participated in the shared tasks and 14 teams submitted their translation results for the human evaluation. We also received 12 research paper submissions out of which 7 were accepted. About 500 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2019

pdf bib
Incorporating Word Attention into Character-Based Word Segmentation
Shohei Higashiyama | Masao Utiyama | Eiichiro Sumita | Masao Ideuchi | Yoshiaki Oida | Yohei Sakamoto | Isaac Okada
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Neural network models have been actively applied to word segmentation, especially Chinese, because of the ability to minimize the effort in feature engineering. Typical segmentation models are categorized as character-based, for conducting exact inference, or word-based, for utilizing word-level information. We propose a character-based model utilizing word information to leverage the advantages of both types of models. Our model learns the importance of multiple candidate words for a character on the basis of an attention mechanism, and makes use of it for segmentation decisions. The experimental results show that our model achieves better performance than the state-of-the-art models on both Japanese and Chinese benchmark datasets.

pdf bib
Overview of the 6th Workshop on Asian Translation
Toshiaki Nakazawa | Nobushige Doi | Shohei Higashiyama | Chenchen Ding | Raj Dabre | Hideya Mino | Isao Goto | Win Pa Pa | Anoop Kunchukuttan | Yusuke Oda | Shantipriya Parida | Ondřej Bojar | Sadao Kurohashi
Proceedings of the 6th Workshop on Asian Translation

This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta↔En mixed domain subtasks and Ru↔Ja news commentary translation task. For the WAT2019, 25 teams participated in the shared tasks. We also received 10 research paper submissions out of which 61 were accepted. About 400 translation results were submitted to the automatic evaluation server, and selected submis- sions were manually evaluated.

2018

pdf bib
Overview of the 5th Workshop on Asian Translation
Toshiaki Nakazawa | Katsuhito Sudoh | Shohei Higashiyama | Chenchen Ding | Raj Dabre | Hideya Mino | Isao Goto | Win Pa Pa | Anoop Kunchukuttan | Sadao Kurohashi
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation

2017

pdf bib
Overview of the 4th Workshop on Asian Translation
Toshiaki Nakazawa | Shohei Higashiyama | Chenchen Ding | Hideya Mino | Isao Goto | Hideto Kazawa | Yusuke Oda | Graham Neubig | Sadao Kurohashi
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

This paper presents the results of the shared tasks from the 4th workshop on Asian translation (WAT2017) including J↔E, J↔C scientific paper translation subtasks, C↔J, K↔J, E↔J patent translation subtasks, H↔E mixed domain subtasks, J↔E newswire subtasks and J↔E recipe subtasks. For the WAT2017, 12 institutions participated in the shared tasks. About 300 translation results have been submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2013

pdf bib
Developing ML-based Systems to Extract Medical Information from Japanese Medical History Summaries
Shohei Higashiyama | Kazuhiro Seki | Kuniaki Uehara
The First Workshop on Natural Language Processing for Medical and Healthcare Fields