Yoshihiko Hayashi - ACL Anthology

Yoshihiko Hayashi

Also published as: Y. Hayashi

2025

Evaluating LLMs’ Capability to Identify Lexical Semantic Equivalence: Probing with the Word-in-Context Task
Yoshihiko Hayashi
Proceedings of the 31st International Conference on Computational Linguistics

This study proposes a method to evaluate the capability of large language models (LLMs) in identifying lexical semantic equivalence. The Word-in-Context (WiC) task, a benchmark designed to determine whether the meanings of a target word remain identical across different contexts, is employed as a probing task. Experiments are conducted with several LLMs, including proprietary GPT models and open-source models, using zero-shot prompting with adjectives that represent varying levels of semantic equivalence (e.g., “the same”) or inequivalence (e.g., “different”). The fundamental capability to identify lexical semantic equivalence in context is measured using standard accuracy metrics. Consistency across different levels of semantic equivalence is assessed via rank correlation with the expected canonical ranking of precision and recall, reflecting anticipated trends in performance across prompts. The proposed method demonstrates its effectiveness, highlighting the superior capability of GPT-4o, as it consistently outperforms other explored LLMs. Analysis of the WiC dataset, the discriminative properties of adjectives (i.e., their ability to differentiate between levels of semantic equivalence), and linguistic patterns in erroneous cases offer insights into the LLM’s capability and sensitivity. These findings could inform improvements in WiC task performance, although performance enhancement is not the primary focus of this study.
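The consistency measure mentioned in this abstract, rank correlation against an expected canonical ranking, can be illustrated with a minimal sketch. It assumes Spearman's coefficient as the rank correlation, which the abstract does not specify. The strictness levels and precision scores below are hypothetical placeholders, not values from the paper.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical: four prompt adjectives ordered by assumed strictness (1 = loosest),
# paired with made-up precision scores for each prompt.
expected_strictness = [1, 2, 3, 4]
observed_precision = [0.60, 0.65, 0.63, 0.80]

rho = spearman(expected_strictness, observed_precision)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near 1 would indicate that the observed scores follow the expected ordering of the prompts; the same function could be applied to recall with the expected ordering reversed.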

2024

Reassessing Semantic Knowledge Encoded in Large Language Models through the Word-in-Context Task
Yoshihiko Hayashi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Despite the remarkable recent advancements in large language models (LLMs), a comprehensive understanding of their inner workings and the depth of their knowledge remains elusive. This study aims to reassess the semantic knowledge encoded in LLMs by utilizing the Word-in-Context (WiC) task, which involves predicting the semantic equivalence of a target word across different contexts, as a probing task. To address this challenge, we start by prompting LLMs, specifically GPT-3 and GPT-4, to generate natural language descriptions that contrast the meanings of the target word in two contextual sentences given in the WiC dataset. Subsequently, we conduct a manual analysis to examine their linguistic attributes. In parallel, we train a text classification model that utilizes the generated descriptions as supervision and assesses their practical effectiveness in the WiC task. The linguistic and empirical findings reveal a consistent provision of valid and valuable descriptions by LLMs, with LLM-generated descriptions significantly improving classification accuracy. Notably, the highest classification result achieved with GPT-3-generated descriptions largely surpassed GPT-3’s zero-shot baseline. However, the GPT-4-generated descriptions performed slightly below GPT-4’s zero-shot baseline, suggesting that the full potential of the most advanced large language models, such as GPT-4, is yet to be fully revealed.

2022

Towards the Detection of a Semantic Gap in the Chain of Commonsense Knowledge Triples
Yoshihiko Hayashi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

A commonsense knowledge resource organizes common sense that is not necessarily correct all the time but that most people are expected to know or believe. Such knowledge resources have recently been actively built and utilized in artificial intelligence, particularly natural language processing. In this paper, we discuss an important but under-discussed issue, namely the semantic gaps potentially existing in a commonsense knowledge graph, and propose a machine learning-based approach to detect a semantic gap that may inhibit the proper chaining of knowledge triples. In order to establish this line of research, we created a pilot dataset from ConceptNet, in which chains consisting of two adjacent triples are sampled, and the validity of each chain is human-annotated. We also devised a few baseline methods for detecting the semantic gaps and compared them in small-scale experiments. Although the experimental results suggest that the detection of semantic gaps may not be a trivial task, we gained several insights to further push this research direction, including the potential efficacy of sense embeddings and contextualized word representations enabled by a pre-trained language model.

Evaluating the Effects of Embedding with Speaker Identity Information in Dialogue Summarization
Yuji Naraki | Tetsuya Sakai | Yoshihiko Hayashi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Automatic dialogue summarization is a task used to succinctly summarize a dialogue transcript while correctly linking the speakers and their speech, which distinguishes this task from a conventional document summarization. To address this issue and reduce the “who said what”-related errors in a summary, we propose embedding the speaker identity information in the input embedding into the dialogue transcript encoder. Unlike the speaker embedding proposed by Gu et al. (2020), our proposal takes into account the informativeness of position embedding. By experimentally comparing several embedding methods, we confirmed that the scores of ROUGE and a human evaluation of the generated summaries were substantially increased by embedding speaker information at the less informative part of the fixed position embedding with sinusoidal functions.

Phrase-Level Localization of Inconsistency Errors in Summarization by Weak Supervision
Masato Takatsuka | Tetsunori Kobayashi | Yoshihiko Hayashi
Proceedings of the 29th International Conference on Computational Linguistics

Although the fluency of automatically generated abstractive summaries has improved significantly with advanced methods, the inconsistency that remains in summarization is recognized as an issue to be addressed. In this study, we propose a methodology for localizing inconsistency errors in summarization. A synthetic dataset that contains a variety of factual errors likely to be produced by a common summarizer is created by applying sentence fusion, compression, and paraphrasing operations. In creating the dataset, we automatically label erroneous phrases and the dependency relations between them as “inconsistent,” which can contribute to detecting errors more adequately than existing models that rely only on dependency arc-level labels. Subsequently, this synthetic dataset is employed as weak supervision to train a model called SumPhrase, which jointly localizes errors in a summary and their corresponding sentences in the source document. The empirical results demonstrate that our SumPhrase model can detect factual errors in summarization more effectively than existing weakly supervised methods owing to the phrase-level labeling. Moreover, the joint identification of error-corresponding original sentences is proven to be effective in improving error detection accuracy.

2020

Word Attribute Prediction Enhanced by Lexical Entailment Tasks
Mika Hasegawa | Tetsunori Kobayashi | Yoshihiko Hayashi
Proceedings of the Twelfth Language Resources and Evaluation Conference

Human semantic knowledge about concepts acquired through perceptual inputs and daily experiences can be expressed as a bundle of attributes. Unlike the conventional distributed word representations that are purely induced from a text corpus, a semantic attribute is associated with a designated dimension in attribute-based vector representations. Thus, semantic attribute vectors can effectively capture the commonalities and differences among concepts. However, as semantic attributes have been generally created by psychological experimental settings involving human annotators, an automatic method to create or extend such resources is in high demand for language resource development and maintenance. This study proposes a two-stage neural network architecture, Word2Attr, in which initially acquired attribute representations are then fine-tuned by employing supervised lexical entailment tasks. The quantitative empirical results demonstrated that the fine-tuning was indeed effective in improving performance on semantic/visual similarity/relatedness evaluation tasks. Although the qualitative analysis confirmed that the proposed method could often discover valid but not-yet human-annotated attributes, it also exposed issues for future work: we should refine the inventory of semantic attributes, which currently relies on an existing dataset.

Exploiting Narrative Context and A Priori Knowledge of Categories in Textual Emotion Classification
Hikari Tanabe | Tetsuji Ogawa | Tetsunori Kobayashi | Yoshihiko Hayashi
Proceedings of the 28th International Conference on Computational Linguistics

Recognition of the mental state of a human character in text is a major challenge in natural language processing. In this study, we investigate the efficacy of the narrative context in recognizing the emotional states of human characters in text and discuss an approach to make use of a priori knowledge regarding the employed emotion category system. Specifically, we experimentally show that the accuracy of emotion classification is substantially increased by encoding the preceding context of the target sentence using a BERT-based text encoder. We also compare ways to incorporate a priori knowledge of emotion categories by altering the loss function used in training, in which our proposal of multi-task learning that jointly learns to classify positive/negative polarity of emotions is included. The experimental results suggest that, when using Plutchik’s Wheel of Emotions, it is better to jointly classify the basic emotion categories with positive/negative polarity rather than directly exploiting its characteristic structure in which eight basic categories are arranged in a wheel.

2019

Towards Answer-unaware Conversational Question Generation
Mao Nakanishi | Tetsunori Kobayashi | Yoshihiko Hayashi
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

Conversational question generation is a novel area of NLP research which has a range of potential applications. This paper is the first to present a framework for conversational question generation that is unaware of the corresponding answers. To properly generate a question coherent to the grounding text and the current conversation history, the proposed framework first locates the focus of a question in the text passage, and then identifies the question pattern that leads the sequential generation of the words in a question. The experiments using the CoQA dataset demonstrate that the quality of generated questions greatly improves if the question foci and the question patterns are correctly identified. In addition, it was shown that the question foci, even estimated with a reasonable accuracy, could contribute to the quality improvement. These results established that our research direction may be promising, but at the same time revealed that the identification of question patterns is a challenging issue, and it has to be largely refined to achieve a better quality in the end-to-end automatic question generation.

2018

Social Image Tags as a Source of Word Embeddings: A Task-oriented Evaluation
Mika Hasegawa | Tetsunori Kobayashi | Yoshihiko Hayashi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Answerable or Not: Devising a Dataset for Extending Machine Reading Comprehension
Mao Nakanishi | Tetsunori Kobayashi | Yoshihiko Hayashi
Proceedings of the 27th International Conference on Computational Linguistics

Machine-reading comprehension (MRC) has recently attracted attention in the fields of natural language processing and machine learning. One of the problematic presumptions with current MRC technologies is that each question is assumed to be answerable by looking at a given text passage. However, to realize human-like language comprehension ability, a machine should also be able to distinguish not-answerable questions (NAQs) from answerable questions. To develop this functionality, a dataset incorporating hard-to-detect NAQs is vital; however, its manual construction would be expensive. This paper proposes a dataset creation method that alters an existing MRC dataset, the Stanford Question Answering Dataset, and describes the resulting dataset. The value of this dataset is likely to increase if each NAQ in the dataset is properly classified with the difficulty of identifying it as an NAQ. This difficulty level would allow researchers to evaluate a machine’s NAQ detection performance more precisely. Therefore, we propose a method for automatically assigning difficulty level labels, which measures the similarity between a question and the target text passage. Our NAQ detection experiments demonstrate that the resulting dataset, having difficulty level annotations, is valid and potentially useful in the development of advanced MRC models.
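The idea of assigning difficulty labels from question-passage similarity can be sketched as follows. This is a minimal illustration only: it assumes a simple token-overlap ratio as the similarity measure, assumes that higher overlap makes a not-answerable question harder to detect, and uses hypothetical thresholds and label names. None of these choices are taken from the paper.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def question_passage_overlap(question, passage):
    """Fraction of distinct question tokens that also occur in the passage."""
    question_tokens = tokens(question)
    if not question_tokens:
        return 0.0
    return len(question_tokens & tokens(passage)) / len(question_tokens)


def difficulty_label(question, passage, thresholds=(0.34, 0.67)):
    """Map the overlap score to a coarse label (thresholds are hypothetical)."""
    score = question_passage_overlap(question, passage)
    if score < thresholds[0]:
        return "easy"
    if score < thresholds[1]:
        return "medium"
    return "hard"


passage = "The Eiffel Tower was completed in 1889 in Paris."
print(difficulty_label("Who painted the Mona Lisa?", passage))
print(difficulty_label("When was the Eiffel Tower completed?", passage))
```

The first question shares only one of its five tokens with the passage and falls in the lowest band, while the second shares five of six and falls in the highest band.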

2017

Incorporating visual features into word embeddings: A bimodal autoencoder-based approach
Mika Hasegawa | Tetsunori Kobayashi | Yoshihiko Hayashi
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers

Classifying Lexical-semantic Relationships by Exploiting Sense/Concept Representations
Kentaro Kanada | Tetsunori Kobayashi | Yoshihiko Hayashi
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

This paper proposes a method for classifying the type of lexical-semantic relation between a given pair of words. Given an inventory of target relationships, this task can be seen as a multi-class classification problem. We train a supervised classifier by assuming: (1) a specific type of lexical-semantic relation between a pair of words would be indicated by a carefully designed set of relation-specific similarities associated with the words; and (2) the similarities could be effectively computed by “sense representations” (sense/concept embeddings). The experimental results show that the proposed method clearly outperforms an existing state-of-the-art method that does not utilize sense/concept embeddings, thereby demonstrating the effectiveness of the sense representations.

2016

A Framework for Cross-lingual/Node-wise Alignment of Lexical-Semantic Resources
Yoshihiko Hayashi
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Given lexical-semantic resources in different languages, it is useful to establish cross-lingual correspondences, preferably with semantic relation labels, between the concept nodes in these resources. This paper presents a framework for enabling a cross-lingual/node-wise alignment of lexical-semantic resources, where cross-lingual correspondence candidates are first discovered and ranked, and then classified by a succeeding module. Indeed, we propose that a two-tier classifier configuration is feasible for the second module: the first classifier filters out possibly irrelevant correspondence candidates and the second classifier assigns a relatively fine-grained semantic relation label to each of the surviving candidates. The results of Japanese-to-English alignment experiments using EDR Electronic Dictionary and Princeton WordNet are described to exemplify the validity of the proposal.

Extending Monolingual Semantic Textual Similarity Task to Multiple Cross-lingual Settings
Yoshihiko Hayashi | Wentao Luo
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes our independent effort for extending the monolingual semantic textual similarity (STS) task setting to multiple cross-lingual settings involving English, Japanese, and Chinese. So far, we have adopted a “monolingual similarity after translation” strategy to predict the semantic similarity between a pair of sentences in different languages. With this strategy, a monolingual similarity method is applied after having (one of) the target sentences translated into a pivot language. Therefore, this paper specifically details the required and developed resources to implement this framework, while presenting our current results for English-Japanese-Chinese cross-lingual STS tasks that may exemplify the validity of the framework.

Predicting the Evocation Relation between Lexicalized Concepts
Yoshihiko Hayashi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Evocation is a directed yet weighted semantic relationship between lexicalized concepts. Although evocation relations are considered potentially useful in several semantic NLP tasks, the prediction of the evocation relation between an arbitrary pair of concepts remains difficult, since evocation relationships cover a broader range of semantic relations rooted in human perception and experience. This paper presents a supervised learning approach to predict the strength (by regression) and to determine the directionality (by classification) of the evocation relation that might hold between a pair of lexicalized concepts. Empirical results that were obtained by investigating useful features are shown, indicating that a combination of the proposed features largely outperformed individual baselines, and also suggesting that semantic relational vectors computed from existing semantic vectors for lexicalized concepts were indeed effective for both the prediction of strength and the determination of directionality.

2014

Web-imageability of the Behavioral Features of Basic-level Concepts
Yoshihiko Hayashi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The recent research direction toward multimodal semantic representation would be further advanced if we had a mechanism to collect adequate images from the Web, given a target concept. With this motivation, this paper investigates the Web-imageability of the behavioral features (e.g., beaver builds dams) of a basic-level concept (beaver). The term Web-imageability denotes how adequately the images acquired from the Web deliver the intended meaning of a complex concept. The primary contributions made in this paper are twofold: (1) beaver building dams-type queries can better yield relevant Web images, suggesting that the present participle form (-ing form) of a verb (building), as a query component, is more effective than the base form; (2) the behaviors taken by animate beings are more likely to be depicted on the Web, particularly if the behaviors are, in a sense, inherent to animate beings (e.g., motion, consumption), while the creation-type behaviors of inanimate beings are not. The paper further analyzes linguistic annotations that were independently given to some of the images, and discusses an aspect of the semantic gap between image and language.

2013

Migrating Psycholinguistic Semantic Feature Norms into Linked Data in Linguistics
Yoshihiko Hayashi
Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data

2012

Classifying Standard Linguistic Processing Functionalities based on Fundamental Data Operation Types
Yoshihiko Hayashi | Chiharu Narawa
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

It is often argued that a set of standard linguistic processing functionalities should be identified, with each of them given a formal specification. We would benefit from the formal specifications; for example, the semi-automated composition of a complex language processing workflow could be enabled in due time. This paper extracts a standard set of linguistic processing functionalities and tries to classify them formally. To do this, we first investigated prominent types of language Web services/linguistic processors by surveying a Web-based language service infrastructure and published NLP toolkits. We next induced a set of standard linguistic processing functionalities by carefully investigating each of the linguistic processor types. The standard linguistic processing functionalities were then characterized by the input/output data types, as well as the required data operation types, which were also derived from the investigation. As a result, we came up with an ontological depiction that classifies linguistic processors and linguistic processing functionalities with respect to the fundamental data operation types. We argue that such an ontological depiction can explicitly describe the functional aspects of a linguistic processing functionality.

2011

Prospects for an Ontology-Grounded Language Service Infrastructure
Yoshihiko Hayashi
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm

A Representation Framework for Cross-lingual/Interlingual Lexical Semantic Correspondences
Yoshihiko Hayashi
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

2010

An LMF-based Web Service for Accessing WordNet-type Semantic Lexicons
Bora Savas | Yoshihiko Hayashi | Monica Monachini | Claudia Soria | Nicoletta Calzolari
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes a Web service for accessing WordNet-type semantic lexicons. The central idea behind the service design is: given a query, the primary functionality of lexicon access is to present a partial lexicon by extracting the relevant part of the target lexicon. Based on this idea, we implemented the system as a RESTful Web service whose input query is specified by the access URI and whose output is presented in a standardized XML data format. LMF, an ISO standard for modeling lexicons, plays the most prominent role: the access URI pattern basically reflects the lexicon structure as defined by LMF; the access results are rendered based on Wordnet-LMF, which is a version of LMF XML-serialization. The Web service currently provides accesses to Princeton WordNet, Japanese WordNet, as well as the EDR Electronic Dictionary as a trial. To accommodate the EDR dictionary within the same framework, we modeled it also as a WordNet-type semantic lexicon. This paper thus argues possible alternatives to model innately bilingual/multilingual lexicons like EDR with LMF, and proposes possible revisions to Wordnet-LMF.

LAF/GrAF-grounded Representation of Dependency Structures
Yoshihiko Hayashi | Thierry Declerck | Chiharu Narawa
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper shows that a LAF/GrAF-based annotation schema can be used for the adequate representation of syntactic dependency structures possibly in many languages. We first argue that there are at least two types of textual units that can be annotated with dependency information: words/tokens and chunks/phrases. We especially focus on the importance of the latter dependency unit: it is particularly useful for representing Japanese dependency structures, known as Kakari-Uke structure. Based on this consideration, we then discuss a sub-typing of GrAF to represent the corresponding dependency structures. We derive three node types, two edge types, and the associated constraints for properly representing both the token-based and the chunk-based dependency structures. We finally propose a wrapper program that, as a proof of concept, converts output data from different dependency parsers in proprietary XML formats to the GrAF-compliant XML representation. It partially proves the value of an international standard like LAF/GrAF in the Web service context: an existing dependency parser can be, in a sense, standardized, once wrapped by a data format conversion process.

2008

Ontologizing Lexicon Access Functions based on an LMF-based Lexicon Taxonomy
Yoshihiko Hayashi | Chiharu Narawa | Monica Monachini | Claudia Soria | Nicoletta Calzolari
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper discusses ontologization of lexicon access functions in the context of a service-oriented language infrastructure, such as the Language Grid. In such a language infrastructure, an access function to a lexical resource, embodied as an atomic Web service, plays a crucially important role in composing a composite Web service tailored to a user's specific requirement. To facilitate the composition process involving service discovery, planning and invocation, the language infrastructure should be ontology-based; hence the ontologization of a range of lexicon functions is highly required. In a service-oriented environment, lexical resources, however, can be classified from a service-oriented perspective rather than from a lexicographically motivated standard. Hence, to address the issue of interoperability, the taxonomy for lexical resources should be grounded in a principled and shared lexicon ontology. To do this, we have ontologized the standardized lexicon modeling framework LMF, and utilized it as a foundation to stipulate the service-oriented lexicon taxonomy and the corresponding ontology for lexicon access functions. This paper also examines a possible solution to fill the gap between the ontological descriptions and the actual Web service API by adopting a W3C recommendation, SAWSDL, with which Web service descriptions can be linked with the domain ontology.

SriShell Primo: A Predictive Sinhala Text Input System
Sandeva Goonetilleke | Yoshihiko Hayashi | Yuichi Itoh | Fumio Kishino
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

2007

A Linguistic Service Ontology for Language Infrastructures
Yoshihiko Hayashi
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

A Dictionary Model for Unifying Machine Readable Dictionaries and Computational Concept Lexicons
Yoshihiko Hayashi | Toru Ishida
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The Language Grid, recently proposed by one of the authors, is a language infrastructure available on the Internet. It aims to resolve the problems of accessibility and usability inherent in the currently available language services. The infrastructure will accommodate an operational environment in which a user and/or a software agent can develop a language service that is tailored to specific requirements derived from the various situations of intercultural communication. In order to effectively operate the infrastructure, each atomic language service has to be discovered by the planner of a composite service and incorporated into the composite service scenario. Meta-description of an atomic service is crucial to accomplish the planning process. This paper focuses on dictionary access services and proposes an abstract dictionary model that is vital for the accurate meta-description of such a service. In principle, the proposed model is based on the organization compatible with Princeton WordNet. Computational lexicons, including the EDR dictionary, as well as a range of human monolingual/bilingual dictionaries are uniformly organized into a WordNet-like lexical concept system. A modeling example with a few dictionary instances demonstrates the fundamental validity of the model.

1999

A scalable cross-language metasearch architecture for multilingual information access on the Web
Yoshihiko Hayashi | Genichiro Kikui | Toshiaki Iwadera
Proceedings of Machine Translation Summit VII

This position paper for the special session on "Multilingual Information Access" comprises three parts. The first part reviews possible demands for Multilingual Information Access (hereafter, MLIA) on the Web, and examines required technical elements. Among those, we, in the second part, focus on Cross-Language Information Retrieval (hereafter, CLIR), particularly a scalable architecture which enables CLIR in a number of language combinations. Such a distributed architecture developed around the XIRCH project (an international joint experimental project currently involving NTT, KRDL, and KAIST) is then described in some detail. The final part discusses some NLP/MT related issues associated with such a CLIR architecture.

1992

A Three-level Revision Model for Improving Japanese Bad-styled Expressions
Yoshihiko Hayashi
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics

1982

Japanese Sentence Analysis System Essay - Evaluation of Dictionary Derived From Real Text Data
K. Shirai | J. Kubota | Y. Hayashi
Coling 1982 Abstracts: Proceedings of the Ninth International Conference on Computational Linguistics Abstracts
