Kiyoaki Shirai


2024

Learning Contextualized Box Embeddings with Prototypical Networks
Kohei Oda | Kiyoaki Shirai | Natthawut Kertkeidkachorn
Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024)

This paper proposes ProtoBox, a novel method to learn contextualized box embeddings. Unlike an ordinary word embedding, which represents a word as a single vector, a box embedding represents the meaning of a word as a box in a high-dimensional space, which makes it suitable for representing semantic relations between words. In addition, our method aims to obtain a “contextualized” box embedding, which is an abstract representation of a word in a specific context. ProtoBox is based on Prototypical Networks, a robust method for classification problems, and focuses especially on learning the hypernym–hyponym relation between senses. ProtoBox is evaluated on three tasks: Word Sense Disambiguation (WSD), New Sense Classification (NSC), and Hypernym Identification (HI). Experimental results show that ProtoBox outperforms baselines on the HI task and is comparable to them on the WSD and NSC tasks.
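
To illustrate why a box can encode a hypernym–hyponym relation, here is a minimal sketch that is not the ProtoBox method itself. It assumes hard, axis-aligned boxes and scores containment as the fraction of one box's volume that lies inside another. The names `Box`, `intersect`, and `containment` and the toy coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by its lower and upper corners (illustrative only)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def volume(self) -> float:
        # Product of side lengths; an empty box (upper < lower) has volume 0.
        vol = 1.0
        for lo, hi in zip(self.lower, self.upper):
            vol *= max(hi - lo, 0.0)
        return vol


def intersect(a: Box, b: Box) -> Box:
    # The intersection of two axis-aligned boxes is again an axis-aligned box.
    lower = tuple(max(x, y) for x, y in zip(a.lower, b.lower))
    upper = tuple(min(x, y) for x, y in zip(a.upper, b.upper))
    return Box(lower, upper)


def containment(inner: Box, outer: Box) -> float:
    # Fraction of `inner` that lies inside `outer`; 1.0 means fully contained.
    vol = inner.volume()
    if vol == 0.0:
        return 0.0
    return intersect(inner, outer).volume() / vol


# Toy 2-D boxes with made-up coordinates.
animal = Box((0.0, 0.0), (4.0, 4.0))
dog = Box((1.0, 1.0), (2.0, 2.0))
car = Box((5.0, 5.0), (6.0, 6.0))

print(containment(dog, animal))  # dog lies entirely inside animal
print(containment(animal, dog))  # only a small part of animal lies inside dog
print(containment(dog, car))     # disjoint boxes
```

The asymmetry of the score is the point: the hyponym's box sits inside the hypernym's box, but not the other way around, which a single-vector similarity cannot express.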

2023

Text Generation Model Enhanced with Semantic Information in Aspect Category Sentiment Analysis
Tu Tran | Kiyoaki Shirai | Natthawut Kertkeidkachorn
Findings of the Association for Computational Linguistics: ACL 2023

Aspect Category Sentiment Analysis (ACSA) is one of the main subtasks of sentiment analysis, which aims at predicting polarity over a given aspect category. Recently, generative methods have emerged as an efficient way to utilize a pre-trained language model for solving ACSA. However, those methods fail to model the relations between target words and opinion words in a sentence that includes multiple aspects. To tackle this problem, this paper proposes a method to incorporate Abstract Meaning Representation (AMR), which describes the semantic representation of a sentence as a directed graph, into a text generation model. Furthermore, two regularizers are designed to guide the allocation of cross-attention weights over AMR graphs. One is the identical regularizer, which constrains the attention weights of aligned nodes; the other is the entropy regularizer, which helps the decoder generate tokens by heavily considering only a few related nodes in the AMR graph. Experimental results on three datasets show that the proposed method outperforms state-of-the-art methods, demonstrating the effectiveness of our model.

Sentiment Analysis using the Relationship between Users and Products
Natthawut Kertkeidkachorn | Kiyoaki Shirai
Findings of the Association for Computational Linguistics: ACL 2023

In product reviews, user and product aspects are useful in sentiment analysis. Nevertheless, previous studies mainly focus on modeling user and product aspects without considering the relationship between users and products. The relationship between users and products is typically helpful in estimating the bias of a user toward a product. In this paper, we therefore introduce a Graph Neural Network-based model with a pre-trained Language Model (GNNLM), in which the relationship between users and products is incorporated. We conducted experiments on three well-known benchmarks for sentiment classification with user and product information. The experimental results show that the relationship between users and products improves the performance of sentiment analysis. Furthermore, GNNLM achieves state-of-the-art results on the yelp-2013 and yelp-2014 datasets.

Discovering Highly Influential Shortcut Reasoning: An Automated Template-Free Approach
Daichi Haraguchi | Kiyoaki Shirai | Naoya Inoue | Natthawut Kertkeidkachorn
Findings of the Association for Computational Linguistics: EMNLP 2023

Shortcut reasoning is an irrational process of inference, which degrades the robustness of an NLP model. While a number of previous studies have tackled the identification of shortcut reasoning, there are still two major limitations: (i) a method for quantifying the severity of the discovered shortcut reasoning is not provided; (ii) certain types of shortcut reasoning may be missed. To address these issues, we propose a novel method for identifying shortcut reasoning. The proposed method quantifies the severity of the shortcut reasoning by leveraging out-of-distribution data and does not make any assumptions about the type of tokens triggering the shortcut reasoning. Our experiments on Natural Language Inference and Sentiment Analysis demonstrate that our framework successfully discovers both shortcut reasoning known from previous work and previously unknown shortcut reasoning.

Enhancing Translation of Myanmar Sign Language by Transfer Learning and Self-Training
Hlaing Myat Nwe | Kiyoaki Shirai | Natthawut Kertkeidkachorn | Thanaruk Theeramunkong | Ye Kyaw Thu | Thepchai Supnithi | Natsuda Kaothanthong
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

This paper proposes a method to develop a machine translation (MT) system from Myanmar Sign Language (MSL) to Myanmar Written Language (MWL) and vice versa for the deaf community. Translation of MSL is a difficult task since only a small parallel corpus between MSL and MWL is available. To address this challenge of MT for a low-resource language, transfer learning is applied. An MT model is first trained for a high-resource language pair, American Sign Language (ASL) and English, and is then used as an initial model to train an MT model between MSL and MWL. The mT5 model is used as the base MT model in this transfer learning. Additionally, a self-training technique is applied to generate synthetic translation pairs of MSL and MWL from a large monolingual MWL corpus. Furthermore, since segmentation of a sentence is required as preprocessing for MT of the Myanmar language, several segmentation schemes are empirically compared. Experimental results show that both transfer learning and self-training can enhance the performance of translation between MSL and MWL compared with a baseline model fine-tuned only on a small MSL-MWL parallel corpus.

Coherent Story Generation with Structured Knowledge
Congda Ma | Kotaro Funakoshi | Kiyoaki Shirai | Manabu Okumura
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

The emergence of pre-trained language models has taken story generation, which is the task of automatically generating a comprehensible story from limited information, to a new stage. Although stories generated by language models are fluent and grammatically correct, their lack of coherence affects their quality. We propose a knowledge-based multi-stage model that incorporates the schema, a kind of structured knowledge, to guide coherent story generation. Our framework includes a schema acquisition module, a plot generation module, and a surface realization module. In the schema acquisition module, highly relevant pieces of structured knowledge are selected as a schema. In the plot generation module, a coherent plot plan is generated under the guidance of the schema. In the surface realization module, a story is generated conditioned on the generated plot. Evaluations show that our methods can generate more comprehensible stories than strong baselines, especially with higher global coherence and less repetition.

2022

Automatic Construction of an Annotated Corpus with Implicit Aspects
Aye Aye Mar | Kiyoaki Shirai
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Aspect-based sentiment analysis (ABSA) is a task that involves classifying the polarity of aspects of the products or services described in users’ reviews. Most previous work on ABSA has focused on explicit aspects, which appear as explicit words or phrases in the sentences of the review. However, users often express their opinions toward the aspects indirectly or implicitly, in which case the specific name of an aspect does not appear in the review. The current datasets used for ABSA are mainly annotated with explicit aspects. This paper proposes a novel method for constructing a corpus that is automatically annotated with implicit aspects. The main idea is that sentences containing explicit and implicit aspects share a similar context. First, labeled sentences with explicit aspects and unlabeled sentences that include implicit aspects are collected. Next, clustering is performed on these sentences so that similar sentences are merged into the same cluster. Finally, the explicit aspects are propagated to the unlabeled sentences in the same cluster, in order to construct a labeled dataset containing implicit aspects. The results of our experiments on mobile phone reviews show that our method of identifying the labels of implicit aspects achieves a maximum accuracy of 82%.
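
The propagation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: clustering is assumed to have been done already, each unlabeled sentence receives the majority explicit aspect of its cluster (the majority rule is an assumption), and the function name `propagate_labels` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import List, Optional


def propagate_labels(cluster_ids: List[int],
                     labels: List[Optional[str]]) -> List[Optional[str]]:
    """Give each unlabeled sentence the majority explicit aspect of its cluster.

    `labels[i]` is an explicit aspect, or None when sentence i is unlabeled.
    Sentences in clusters with no labeled member stay None.
    """
    # Collect the explicit aspects observed in each cluster.
    seen = defaultdict(list)
    for cid, label in zip(cluster_ids, labels):
        if label is not None:
            seen[cid].append(label)

    result = []
    for cid, label in zip(cluster_ids, labels):
        if label is not None:
            result.append(label)  # keep existing explicit labels untouched
        elif seen[cid]:
            result.append(Counter(seen[cid]).most_common(1)[0][0])
        else:
            result.append(None)   # nothing to propagate in this cluster
    return result


# Toy example: cluster assignments and explicit aspect labels (made up).
cluster_ids = [0, 0, 0, 1, 1, 2]
labels = ["battery", "battery", None, "screen", None, None]
print(propagate_labels(cluster_ids, labels))
```

Sentences in a cluster with no explicitly labeled member are left unlabeled here; how such cases should be handled is a design choice that the abstract does not specify.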

2018

JAIST Annotated Corpus of Free Conversation
Kiyoaki Shirai | Tomotaka Fukuoka
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Recurrent Neural Network with Word Embedding for Complaint Classification
Panuwat Assawinjaipetch | Kiyoaki Shirai | Virach Sornlertlamvanich | Sanparith Marukata
Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)

Complaint classification aims at using information to deliver greater insights to enhance the user experience after purchasing products or services. Categorized information can help us quickly collect emerging problems in order to provide the support needed. Indeed, responding to a complaint without delay gives users the highest satisfaction. In this paper, we aim to deliver a novel approach that classifies each complaint precisely into one of nine predefined classes, i.e., accessibility, company brand, competitors, facilities, process, product feature, staff quality, timing, and others. Given the idea that a single word is usually ambiguous and has to be interpreted by its context, the word embedding technique is used to provide word features, while deep learning techniques are applied to classify the type of each complaint. The dataset we use contains 8,439 complaints of one company.

2015

PhraseRNN: Phrase Recursive Neural Network for Aspect-based Sentiment Analysis
Thien Hai Nguyen | Kiyoaki Shirai
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Topic Modeling based Sentiment Analysis on Social Media for Stock Market Prediction
Thien Hai Nguyen | Kiyoaki Shirai
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Sentiment Analyzer with Rich Features for Ironic and Sarcastic Tweets
Piyoros Tungthamthiti | Enrico Santus | Hongzhi Xu | Chu-Ren Huang | Kiyoaki Shirai
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

Identification of Sympathy in Free Conversation
Tomotaka Fukuoka | Kiyoaki Shirai
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

2014

Sentiment Lexicon Interpolation and Polarity Estimation of Objective and Out-Of-Vocabulary Words to Improve Sentiment Classification on Microblogging
Yongyos Kaewpitakkun | Kiyoaki Shirai | Masnizah Mohd
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

Recognition of Sarcasms in Tweets Based on Concept Level Sentiment Analysis and Supervised Learning Approaches
Piyoros Tungthamthiti | Kiyoaki Shirai | Masnizah Mohd
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

2010

SemEval-2010 Task: Japanese WSD
Manabu Okumura | Kiyoaki Shirai | Kanako Komiya | Hikaru Yokono
Proceedings of the 5th International Workshop on Semantic Evaluation

JAIST: Clustering and Classification Based Approaches for Japanese WSD
Kiyoaki Shirai | Makoto Nakamura
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

Query Expansion using LMF-Compliant Lexical Resources
Takenobu Tokunaga | Dain Kaplan | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Virach Sornlertlamvanich | Thatsanee Charoenporn | Yingju Xia | Chu-Ren Huang | Shu-Kai Hsieh | Kiyoaki Shirai
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

2008

Constructing Taxonomy of Numerative Classifiers for Asian Languages
Kiyoaki Shirai | Takenobu Tokunaga | Chu-Ren Huang | Shu-Kai Hsieh | Tzu-Yi Kuo | Virach Sornlertlamvanich | Thatsanee Charoenporn
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Adapting International Standard for Asian Language Technologies
Takenobu Tokunaga | Dain Kaplan | Chu-Ren Huang | Shu-Kai Hsieh | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Kiyoaki Shirai | Virach Sornlertlamvanich | Thatsanee Charoenporn | YingJu Xia
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Corpus-based approaches and statistical approaches have been the mainstream of natural language processing research for the past two decades. Language resources play a key role in such approaches, but there is an insufficient amount of language resources for many Asian languages. In this situation, standardisation of language resources would be of great help in developing resources for new languages. This paper presents the latest development efforts of our project, which aims at creating a common standard for Asian language resources that is compatible with an international standard. In particular, the paper focuses on i) lexical specification and data categories relevant for building multilingual lexical resources for Asian languages; ii) a core upper-layer ontology needed for ensuring multilingual interoperability; and iii) the evaluation platform used to test the entire architectural framework.

2006

Compiling a Lexicon of Cooking Actions for Animation Generation
Kiyoaki Shirai | Hiroshi Ookawa
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Infrastructure for Standardization of Asian Language Resources
Takenobu Tokunaga | Virach Sornlertlamvanich | Thatsanee Charoenporn | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Chu-Ren Huang | YingJu Xia | Hao Yu | Laurent Prevot | Kiyoaki Shirai
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2004

Learning a Robust Word Sense Disambiguation Model using Hypernyms in Definition Sentences
Kiyoaki Shirai | Tsunekazu Yagi
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2002

Construction of a Word Sense Tagged Corpus for SENSEVAL-2 Japanese Dictionary Task
Kiyoaki Shirai
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

Decision lists for determining adjective dependency in Japanese
Taiichi Hashimoto | Kosuke Nishidate | Kiyoaki Shirai | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of Machine Translation Summit VIII

In Japanese constructions of the form [N1 no Adj N2], the adjective Adj modifies either N1 or N2. Determining the semantic dependencies of adjectives in such phrases is an important task for machine translation. This paper describes a method for determining the adjective dependency in such constructions using decision lists, and for inducing decision lists from training contexts with and without correct semantic dependencies. Based on our evaluation, the method is able to determine adjective dependency with a precision of about 94%. We further analyze the rules in the induced decision lists and examine which features are effective for determining the semantic dependencies of adjectives.
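
For readers unfamiliar with decision lists, the following is a minimal sketch of how one is applied. It is not the paper's induced list: the rules, the feature strings, and the function name `apply_decision_list` are hypothetical. A decision list is an ordered sequence of (feature, decision) rules, and the first rule whose feature is present in the context determines the output.

```python
from typing import List, Set, Tuple


def apply_decision_list(rules: List[Tuple[str, str]],
                        features: Set[str],
                        default: str) -> str:
    """Return the decision of the first rule whose feature is present."""
    for feature, decision in rules:
        if feature in features:
            return decision
    return default  # no rule fired


# Hypothetical rules, ordered from most to least reliable.
# "N1" / "N2" is the noun that the adjective is judged to modify.
rules = [
    ("adj_class=color", "N2"),
    ("n1_class=person", "N1"),
]

print(apply_decision_list(rules, {"n1_class=person", "adj_class=size"}, "N2"))
print(apply_decision_list(rules, {"adj_class=color", "n1_class=person"}, "N2"))
print(apply_decision_list(rules, set(), "N2"))
```

Because only the first matching rule is used, the ordering of the rules carries the model's preferences; in practice the order would be learned from training data rather than written by hand as it is here.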

SENSEVAL-2 Japanese Dictionary Task
Kiyoaki Shirai
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

2000

Semi-automatic Construction of a Tree-annotated Corpus Using an Iterative Learning Statistical Language Model
Kiyoaki Shirai | Hozumi Tanaka | Takenobu Tokunaga
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

An Empirical Evaluation on Statistical Parsing of Japanese Sentences Using Lexical Association Statistics
Kiyoaki Shirai | Kentaro Inui | Takenobu Tokunaga | Hozumi Tanaka
Proceedings of the Third Conference on Empirical Methods for Natural Language Processing