Naoki Yoshinaga - ACL Anthology

Naoki Yoshinaga

2026

CLICKER: Cross-Lingual Knowledge Editing via In-Context Learning with Adaptive Stepwise Reasoning
Zehui Jiang | Xin Zhao | Yuta Kumadaki | Naoki Yoshinaga
Findings of the Association for Computational Linguistics: EACL 2026

As large language models (LLMs) are increasingly deployed as multilingual services, keeping their factual knowledge accurate across languages has become both essential and challenging. However, most of the existing knowledge editing (KE) methods are static, in that they update parameters offline for given accumulated edits of knowledge, and are struggling to effectively propagate edits in one language to others, while avoiding side effects. To mitigate this issue, we propose **CLICKER**, a KE method with stepwise reasoning that dynamically retrieves only knowledge relevant to a given query and then edit, while maintaining cross-lingual consistency through: (1) relevance-aware knowledge retrieval, (2) on-demand in-context KE, and (3) language alignment of the outputs. To rigorously evaluate the locality of edits in cross-lingual KE, we develop **Multi-CounterFact** dataset that contain many semantically-similar but irrelevant prompts for the edit. Experiments on Multi-CounterFact and MzsRE with both open- and closed-source LLMs confirmed that CLICKER effectively localizes edits and resolves cross-lingual inconsistencies, outperforming dynamic KE baselines.

Is He Extroverted? Identifying Missing Relevant Personas for Faithful User Simulation
Weiwen Su | Yuhan Zhou | Zihan Wang | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Existing user simulation approaches focus on generating user-like responses in dialogue. They often assume that the provided persona is sufficient for producing such responses, without verifying whether critical personas are supplied. This raises concerns about the validity of simulation results.To address this issue, we study the task of identifying persona dimensions (e.g., ”whether the user is price-sensitive”) that are relevant but missing in simulating a user’s reply for a given dialogue context.We introduce PICQ-drama (constructed from TVShowGuess), a benchmark of context-aware choice questions, annotated with missing persona dimensions whose absence leads to ambiguous user choices. We further design diverse evaluation criteria for missing persona identification.Benchmarking leading LLMs on our PICQ-drama dataset demonstrates the feasibility of this task. Evaluation across diverse criteria, along with further analyses, reveals cognitive differences between LLMs and humans and highlights the distinct roles of different persona categories in shaping responses.

Tracing Multilingual Knowledge Acquisition Dynamics in Domain Adaptation: A Case Study of Biomedical Adaptation
Xin Zhao | Naoki Yoshinaga | Yuma Tsuta | Akiko Aizawa
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Multilingual domain adaptation (ML-DA) enables large language models (LLMs) to acquire domain knowledge across languages. Despite many methods, how domain knowledge is acquired within a language and transferred across languages remains, leading to suboptimal performance, particularly in low-resource settings.This work examines the learning dynamics of LLMs during ML-DA. Because prior ML-DA studies often train and evaluate on datasets with mismatched knowledge coverage, we propose AdaXEval, an adaptive evaluation method that constructs multiple-choice QA datasets from the same bilingual domain corpus used for training, thereby enabling direct analysis of multilingual knowledge acquisition.Through continual training of LLMs with diverse data recipes, we track how LLMs acquire domain facts and pinpoint the loss shielding mechanism behind the knowledge memorization and generalization in domain adaptation. Our experiments on multilingual LLMs reveal that cross-lingual transfer remains challenging.The code is released.

2025

Commentary Generation from Multimodal Game Data for Esports Moments in Multiplayer Strategy Games
Zihan Wang | Naoki Yoshinaga
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Esports is a competitive sport in which highly skilled players face off in fast-paced video games. Matches consist of intense, moment-by-moment plays that require exceptional technique and strategy. These moments often involve complex interactions, including team fights, positioning, or strategic decisions, which are difficult to interpret without expert explanation. In this study, we set up the task of generating commentary for a specific game moment from multimodal game data consisting of a gameplay screenshot and structured JSON data. Specifically, we construct the first large-scale tri-modal dataset for League of Legends, one of the most popular multiplayer strategy esports titles, and then design evaluation criteria for the task. Using this dataset, we evaluate various large vision language models in generating commentary for a specific moment. We will release the scripts to reconstruct our dataset.

A-TASC: Asian TED-Based Automatic Subtitling Corpus
Yuhan Zhou | Naoki Yoshinaga
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Subtitles play a crucial role in improving the accessibility of the vast amount of audiovisual content available on the Internet, allowing audiences worldwide to comprehend and engage with this content in various languages. Automatic subtitling (AS) systems are essential for alleviating the substantial workload of human transcribers and translators. However, existing AS corpora and the primary metric SubER focus on European languages. This paper introduces A-TASC, an Asian TED-based automatic subtitling corpus derived from English TED Talks, comprising nearly 800 hours of audio segments, aligned English transcripts, and subtitles in Chinese, Japanese, Korean, and Vietnamese. We then present SacreSubER, a modification of SubER, to enable the reliable evaluation of subtitle quality for languages without explicit word boundaries. Experimental results, using both end-to-end systems and pipeline approaches built on strong ASR and LLM components, validate the quality of the proposed corpus and reveal differences in AS performance between European and Asian languages. The code to build our corpus is released.

Neuron Empirical Gradient: Discovering and Quantifying Neurons’ Global Linear Controllability
Xin Zhao | Zehui Jiang | Naoki Yoshinaga
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While feed-forward neurons in pre-trained language models (PLMs) can encode knowledge, past research targeted a small subset of neurons that heavily influence outputs.This leaves the broader role of neuron activations unclear, limiting progress in areas like knowledge editing.We uncover a global linear relationship between neuron activations and outputs using neuron interventions on a knowledge probing dataset.The gradient of this linear relationship, which we call the **neuron empirical gradient (NEG)**, captures how changes in activations affect predictions.To compute NEG efficiently, we propose **NeurGrad**, enabling large-scale analysis of neuron behavior in PLMs.We also show that NEG effectively captures language skills across diverse prompts through skill neuron probing. Experiments on **MCEval8k**, a multi-genre multiple-choice knowledge benchmark, support NEG’s ability to represent model knowledge. Further analysis highlights the key properties of NEG-based skill representation: efficiency, robustness, flexibility, and interdependency.Code and data are released.

2024

Commentary Generation from Data Records of Multiplayer Strategy Esports Game
Zihan Wang | Naoki Yoshinaga
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

Esports, a sports competition on video games, has become one of the most important sporting events. Although esports play logs have been accumulated, only a small portion of them accompany text commentaries for the audience to retrieve and understand the plays. In this study, we therefore introduce the task of generating game commentaries from esports’ data records. We first build large-scale esports data-to-text datasets that pair structured data and commentaries from a popular esports game, League of Legends. We then evaluate Transformer-based models to generate game commentaries from structured data records, while examining the impact of the pre-trained language models. Evaluation results on our dataset revealed the challenges of this novel task. We will release our dataset to boost potential research in the data-to-text generation community.

Further Compressing Distilled Language Models via Frequency-aware Partial Sparse Coding of Embeddings
Kohki Tamura | Naoki Yoshinaga | Masato Neishi
Proceedings of the 28th Conference on Computational Natural Language Learning

Although pre-trained language models (PLMs) are effective for natural language understanding (NLU) tasks, they demand a huge computational resource, thus preventing us from deploying them on edge devices. Researchers have therefore applied compression techniques for neural networks, such as pruning, quantization, and knowledge distillation, to the PLMs. Although these generic techniques can reduce the number of internal parameters of hidden layers in the PLMs, the embedding layers tied to the tokenizer arehard to compress, occupying a non-negligible portion of the compressed model. In this study, aiming to further compress PLMs reduced by the generic techniques, we exploit frequency-aware sparse coding to compress the embedding layers of the PLMs fine-tuned to downstream tasks. To minimize the impact of the compression on the accuracy, we retain the embeddings of common tokens as they are and use them to reconstruct embeddings of rare tokens by locally linear mapping. Experimental results on the GLUE and JGLUE benchmarks for language understanding in English and Japanese confirmed that our method can further compress the fine-tuned DistilBERT models models while maintaining accuracy.

What Matters in Memorizing and Recalling Facts? Multifaceted Benchmarks for Knowledge Probing in Language Models
Xin Zhao | Naoki Yoshinaga | Daisuke Oba
Findings of the Association for Computational Linguistics: EMNLP 2024

Language models often struggle with handling factual knowledge, exhibiting factual hallucination issue. This makes it vital to evaluate the models’ ability to recall its parametric knowledge about facts. In this study, we introduce a knowledge probing benchmark, BELIEF(ICL), to evaluate the knowledge recall ability of both encoder- and decoder-based pre-trained language models (PLMs) from diverse perspectives. BELIEFs utilize a multi-prompt dataset to evaluate PLM’s accuracy, consistency, and reliability in factual knowledge recall. To enable a more reliable evaluation with BELIEFs, we semi-automatically create MyriadLAMA, which has massively diverse prompts. We validate the effectiveness of BELIEFs in comprehensively evaluating PLM’s knowledge recall ability on diverse PLMs, including recent large language models (LLMs). We then investigate key factors in memorizing and recalling facts in PLMs, such as model size, pretraining strategy and corpora, instruction-tuning process and in-context learning settings. Finally, we reveal the limitation of the prompt-based knowledge probing. The MyriadLAMA is publicized.

Tracing the Roots of Facts in Multilingual Language Models: Independent, Shared, and Transferred Knowledge
Xin Zhao | Naoki Yoshinaga | Daisuke Oba
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Acquiring factual knowledge for language models (LMs) in low-resource languages poses a serious challenge, thus resorting to cross-lingual transfer in multilingual LMs (ML-LMs). In this study, we ask how ML-LMs acquire and represent factual knowledge. Using the multilingual factual knowledge probing dataset, mLAMA, we first conducted a neuron investigation of ML-LMs (specifically, multilingual BERT). We then traced the roots of facts back to the knowledge source (Wikipedia) to identify the ways in which ML-LMs acquire specific facts. We finally identified three patterns of acquiring and representing facts in ML-LMs: language-independent, cross-lingual shared and transferred, and devised methods for differentiating them. Our findings highlight the challenge of maintaining consistent factual knowledge across languages, underscoring the need for better fact representation learning in ML-LMs.

2023

Sparse Neural Retrieval Model for Efficient Cross-Domain Retrieval-Based Question Answering
Kosuke Nishida | Naoki Yoshinaga | Kyosuke Nishida
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

Does Named Entity Recognition Truly Not Scale Up to Real-world Product Attribute Extraction?
Wei-Te Chen | Keiji Shinzato | Naoki Yoshinaga | Yandi Xia
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

The key challenge in the attribute-value extraction (AVE) task from e-commerce sites is the scalability to diverse attributes for a large number of products in real-world e-commerce sites. To make AVE scalable to diverse attributes, recent researchers adopted a question-answering (QA)-based approach that additionally inputs the target attribute as a query to extract its values, and confirmed its advantage over a classical approach based on named-entity recognition (NER) on real-word e-commerce datasets. In this study, we argue the scalability of the NER-based approach compared to the QA-based approach, since researchers have compared BERT-based QA-based models to only a weak BiLSTM-based NER baseline trained from scratch in terms of only accuracy on datasets designed to evaluate the QA-based approach. Experimental results using a publicly available real-word dataset revealed that, under a fair setting, BERT-based NER models rival BERT-based QA models in terms of the accuracy, and their inference is faster than the QA model that processes the same product text several times to handle multiple target attributes.

Early Discovery of Disappearing Entities in Microblogs
Satoshi Akasaki | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We make decisions by reacting to changes in the real world, particularly the emergence and disappearance of impermanent entities such as restaurants, services, and events. Because we want to avoid missing out on opportunities or making fruitless actions after those entities have disappeared, it is important to know when entities disappear as early as possible. We thus tackle the task of detecting disappearing entities from microblogs where various information is shared timely. The major challenge is detecting uncertain contexts of disappearing entities from noisy microblog posts. To collect such disappearing contexts, we design time-sensitive distant supervision, which utilizes entities from the knowledge base and time-series posts. Using this method, we actually build large-scale Twitter datasets of disappearing entities. To ensure robust detection in noisy environments, we refine pretrained word embeddings for the detection model on microblog streams in a timely manner. Experimental results on the Twitter datasets confirmed the effectiveness of the collected labeled data and refined word embeddings; the proposed method outperformed a baseline in terms of accuracy, and more than 70% of the detected disappearing entities in Wikipedia are discovered earlier than the update on Wikipedia, with the average lead-time is over one month.

Summarization-based Data Augmentation for Document Classification
Yueguan Wang | Naoki Yoshinaga
Proceedings of the 4th New Frontiers in Summarization Workshop

Despite the prevalence of pretrained language models in natural language understanding tasks, understanding lengthy text such as document is still challenging due to the data sparseness problem. Inspired by that humans develop their ability of understanding lengthy text form reading shorter text, we propose a simple yet effective summarization-based data augmentation, SUMMaug, for document classification. We first obtain easy-to-learn examples for the target document classification task by summarizing the input of the original training examples, while optionally merging the original labels to conform to the summarized input. We then use the generated pseudo examples to perform curriculum learning. Experimental results on two datasets confirmed the advantage of our method compared to existing baseline methods in terms of robustness and accuracy. We release our code and data at https://github.com/etsurin/summaug.

Rethinking Response Evaluation from Interlocutor’s Eye for Open-Domain Dialogue Systems
Yuma Tsuta | Naoki Yoshinaga | Shoetsu Sato | Masashi Toyoda
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Student Research Workshop

A Unified Generative Approach to Product Attribute-Value Identification
Keiji Shinzato | Naoki Yoshinaga | Yandi Xia | Wei-Te Chen
Findings of the Association for Computational Linguistics: ACL 2023

Product attribute-value identification (PAVI) has been studied to link products on e-commerce sites with their attribute values (e.g., ⟨Material, Cotton⟩) using product text as clues. Technical demands from real-world e-commerce platforms require PAVI methods to handle unseen values, multi-attribute values, and canonicalized values, which are only partly addressed in existing extraction- and classification-based approaches. Motivated by this, we explore a generative approach to the PAVI task. We finetune a pre-trained generative model, T5, to decode a set of attribute-value pairs as a target sequence from the given product text. Since the attribute value pairs are unordered set elements, how to linearize them will matter; we, thus, explore methods of composing an attribute-value pair and ordering the pairs for the task. Experimental results confirm that our generation-based approach outperforms the existing extraction and classification-based methods on large-scale real-world datasets meant for those methods.

Back to Patterns: Efficient Japanese Morphological Analysis with Feature-Sequence Trie
Naoki Yoshinaga
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Accurate neural models are much less efficient than non-neural models and are useless for processing billions of social media posts or handling user queries in real time with a limited budget. This study revisits the fastest pattern-based NLP methods to make them as accurate as possible, thus yielding a strikingly simple yet surprisingly accurate morphological analyzer for Japanese. The proposed method induces reliable patterns from a morphological dictionary and annotated data. Experimental results on two standard datasets confirm that the method exhibits comparable accuracy to learning-based baselines, while boasting a remarkable throughput of over 1,000,000 sentences per second on a single modern CPU. The source code is available at https://www.tkl.iis.u-tokyo.ac.jp/ynaga/jagger/

Self-Adaptive Named Entity Recognition by Retrieving Unstructured Knowledge
Kosuke Nishida | Naoki Yoshinaga | Kyosuke Nishida
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Although named entity recognition (NER) helps us to extract domain-specific entities from text (e.g., artists in the music domain), it is costly to create a large amount of training data or a structured knowledge base to perform accurate NER in the target domain. Here, we propose self-adaptive NER, which retrieves external knowledge from unstructured text to learn the usages of entities that have not been learned well. To retrieve useful knowledge for NER, we design an effective two-stage model that retrieves unstructured knowledge using uncertain entities as queries. Our model predicts the entities in the input and then finds those of which the prediction is not confident. Then, it retrieves knowledge by using these uncertain entities as queries and concatenates the retrieved text to the original input to revise the prediction. Experiments on CrossNER datasets demonstrated that our model outperforms strong baselines by 2.35 points in F1 metric.

2022

Entity Embedding Completion for Wide-Coverage Entity Disambiguation
Daisuke Oba | Ikuya Yamada | Naoki Yoshinaga | Masashi Toyoda
Findings of the Association for Computational Linguistics: EMNLP 2022

Entity disambiguation (ED) is typically solved by learning to classify a given mention into one of the entities in the model’s entity vocabulary by referring to their embeddings. However, this approach cannot address mentions of entities that are not covered by the entity vocabulary. Aiming to enhance the applicability of ED models, we propose a method of extending a state-of-the-art ED model by dynamically computing embeddings of out-of-vocabulary entities. Specifically, our method computes embeddings from entity descriptions and mention contexts. Experiments with standard benchmark datasets show that the extended model performs comparable to or better than existing models whose entity embeddings are trained for all candidate entities as well as embedding-free models. We release our source code and model checkpoints at https://github.com/studio-ousia/steel.

Building Large-Scale Japanese Pronunciation-Annotated Corpora for Reading Heteronymous Logograms
Fumikazu Sato | Naoki Yoshinaga | Masaru Kitsuregawa
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Although screen readers enable visually impaired people to read written text via speech, the ambiguities in pronunciations of heteronyms cause wrong reading, which has a serious impact on the text understanding. Especially in Japanese, there are many common heteronyms expressed by logograms (Chinese characters or kanji) that have totally different pronunciations (and meanings). In this study, to improve the accuracy of pronunciation prediction, we construct two large-scale Japanese corpora that annotate kanji characters with their pronunciations. Using existing language resources on i) book titles compiled by the National Diet Library and ii) the books in a Japanese digital library called Aozora Bunko and their Braille translations, we develop two large-scale pronunciation-annotated corpora for training pronunciation prediction models. We first extract sentence-level alignments between the Aozora Bunko text and its pronunciation converted from the Braille data. We then perform dictionary-based pattern matching based on morphological dictionaries to find word-level pronunciation alignments. We have ultimately obtained the Book Title corpus with 336M characters (16.4M book titles) and the Aozora Bunko corpus with 52M characters (1.6M sentences). We analyzed pronunciation distributions for 203 common heteronyms, and trained a BERT-based pronunciation prediction model for 93 heteronyms, which achieved an average accuracy of 0.939.

Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction
Keiji Shinzato | Naoki Yoshinaga | Yandi Xia | Wei-Te Chen
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

A key challenge in attribute value extraction (AVE) from e-commerce sites is how to handle a large number of attributes for diverse products. Although this challenge is partially addressed by a question answering (QA) approach which finds a value in product data for a given query (attribute), it does not work effectively for rare and ambiguous queries. We thus propose simple knowledge-driven query expansion based on possible answers (values) of a query (attribute) for QA-based AVE. We retrieve values of a query (attribute) from the training data to expand the query. We train a model with two tricks, knowledge dropout and knowledge token mixing, which mimic the imperfection of the value knowledge in testing. Experimental results on our cleaned version of AliExpress dataset show that our method improves the performance of AVE (+6.08 macro F1), especially for rare and ambiguous attributes (+7.82 and +6.86 macro F1, respectively).

2021

Fine-grained Typing of Emerging Entities in Microblogs
Satoshi Akasaki | Naoki Yoshinaga | Masashi Toyoda
Findings of the Association for Computational Linguistics: EMNLP 2021

Analyzing microblogs where we post what we experience enables us to perform various applications such as social-trend analysis and entity recommendation. To track emerging trends in a variety of areas, we want to categorize information on emerging entities (e.g., Avatar 2) in microblog posts according to their types (e.g., Film). We thus introduce a new entity typing task that assigns a fine-grained type to each emerging entity when a burst of posts containing that entity is first observed in a microblog. The challenge is to perform typing from noisy microblog posts without relying on prior knowledge of the target entity. To tackle this task, we build large-scale Twitter datasets for English and Japanese using time-sensitive distant supervision. We then propose a modular neural typing model that encodes not only the entity and its contexts but also meta information in multiple posts. To type ‘homographic’ emerging entities (e.g., ‘Go’ means an emerging programming language and a classic board game), which contexts are noisy, we devise a context selector that finds related contexts of the target entity. Experiments on the Twitter datasets confirm the effectiveness of our typing model and the context selector.

Exploratory Model Analysis Using Data-Driven Neuron Representations
Daisuke Oba | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Probing classifiers have been extensively used to inspect whether a model component captures specific linguistic phenomena. This top-down approach is, however, costly when we have no probable hypothesis on the association between the target model component and phenomena. In this study, aiming to provide a flexible, exploratory analysis of a neural model at various levels ranging from individual neurons to the model as a whole, we present a bottom-up approach to inspect the target neural model by using neuron representations obtained from a massive corpus of text. We first feed massive amount of text to the target model and collect sentences that strongly activate each neuron. We then abstract the collected sentences to obtain neuron representations that help us interpret the corresponding neurons; we augment the sentences with linguistic annotations (e.g., part-of-speech tags) and various metadata (e.g., topic and sentiment), and apply pattern mining and clustering techniques to the augmented sentences. We demonstrate the utility of our method by inspecting the pre-trained BERT. Our exploratory analysis reveals that i) specific phrases and domains of text are captured by individual neurons in BERT, ii) a group of neurons simultaneously capture the same linguistic phenomena, and iii) deeper-level layers capture more specific linguistic phenomena.

Context-aware Decoder for Neural Machine Translation using a Target-side Document-Level Language Model
Amane Sugiyama | Naoki Yoshinaga
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Although many end-to-end context-aware neural machine translation models have been proposed to incorporate inter-sentential contexts in translation, these models can be trained only in domains where parallel documents with sentential alignments exist. We therefore present a simple method to perform context-aware decoding with any pre-trained sentence-level translation model by using a document-level language model. Our context-aware decoder is built upon sentence-level parallel data and target-side document-level monolingual data. From a theoretical viewpoint, our core contribution is the novel representation of contextual information using point-wise mutual information between context and the current sentence. We demonstrate the effectiveness of our method on English to Russian translation, by evaluating with BLEU and contrastive tests for context-aware translation.

Speculative Sampling in Variational Autoencoders for Dialogue Response Generation
Shoetsu Sato | Naoki Yoshinaga | Masashi Toyoda | Masaru Kitsuregawa
Findings of the Association for Computational Linguistics: EMNLP 2021

Variational autoencoders have been studied as a promising approach to model one-to-many mappings from context to response in chat response generation. However, they often fail to learn proper mappings. One of the reasons for this failure is the discrepancy between a response and a latent variable sampled from an approximated distribution in training. Inappropriately sampled latent variables hinder models from constructing a modulated latent space. As a result, the models stop handling uncertainty in conversations. To resolve that, we propose speculative sampling of latent variables. Our method chooses the most probable one from redundantly sampled latent variables for tying up the variable with a given response. We confirm the efficacy of our method in response generation with massive dialogue data constructed from Twitter posts.

2020

uBLEU: Uncertainty-Aware Automatic Evaluation Method for Open-Domain Dialogue Systems
Yuma Tsuta | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Because open-domain dialogues allow diverse responses, basic reference-based metrics such as BLEU do not work well unless we prepare a massive reference set of high-quality responses for input utterances. To reduce this burden, a human-aided, uncertainty-aware metric, ΔBLEU, has been proposed; it embeds human judgment on the quality of reference outputs into the computation of multiple-reference BLEU. In this study, we instead propose a fully automatic, uncertainty-aware evaluation method for open-domain dialogue systems, υBLEU. This method first collects diverse reference responses from massive dialogue data and then annotates their quality judgments by using a neural network trained on automatically collected training data. Experimental results on massive Twitter data confirmed that υBLEU is comparable to ΔBLEU in terms of its correlation with human judgment and that the state of the art automatic evaluation method, RUBER, is improved by integrating υBLEU.

Vocabulary Adaptation for Domain Adaptation in Neural Machine Translation
Shoetsu Sato | Jin Sakuma | Naoki Yoshinaga | Masashi Toyoda | Masaru Kitsuregawa
Findings of the Association for Computational Linguistics: EMNLP 2020

Neural network methods exhibit strong performance only in a few resource-rich domains. Practitioners therefore employ domain adaptation from resource-rich domains that are, in most cases, distant from the target domain. Domain adaptation between distant domains (e.g., movie subtitles and research papers), however, cannot be performed effectively due to mismatches in vocabulary; it will encounter many domain-specific words (e.g., “angstrom”) and words whose meanings shift across domains (e.g., “conductor”). In this study, aiming to solve these vocabulary mismatches in domain adaptation for neural machine translation (NMT), we propose vocabulary adaptation, a simple method for effective fine-tuning that adapts embedding layers in a given pretrained NMT model to the target domain. Prior to fine-tuning, our method replaces the embedding layers of the NMT model by projecting general word embeddings induced from monolingual data in a target domain onto a source-domain embedding space. Experimental results indicate that our method improves the performance of conventional fine-tuning by 3.86 and 3.28 BLEU points in En-Ja and De-En translation, respectively.

Robust Backed-off Estimation of Out-of-Vocabulary Embeddings
Nobukazu Fukuda | Naoki Yoshinaga | Masaru Kitsuregawa
Findings of the Association for Computational Linguistics: EMNLP 2020

Out-of-vocabulary (oov) words cause serious troubles in solving natural language tasks with a neural network. Existing approaches to this problem resort to using subwords, which are shorter and more ambiguous units than words, in order to represent oov words with a bag of subwords. In this study, inspired by the processes for creating words from known words, we propose a robust method of estimating oov word embeddings by referring to pre-trained word embeddings for known words with similar surfaces to target oov words. We collect known words by segmenting oov words and by approximate string matching, and we then aggregate their pre-trained embeddings. Experimental results show that the obtained oov word embeddings improve not only word similarity tasks but also downstream tasks in Twitter and biomedical domains where oov words often appear, even when the computed oov embeddings are integrated into a bert-based strong baseline.

2019

Data augmentation using back-translation for context-aware neural machine translation
Amane Sugiyama | Naoki Yoshinaga
Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019)

A single sentence does not always convey information that is enough to translate it into other languages. Some target languages need to add or specialize words that are omitted or ambiguous in the source languages (e.g, zero pronouns in translating Japanese to English or epicene pronouns in translating English to French). To translate such ambiguous sentences, we need contexts beyond a single sentence, and have so far explored context-aware neural machine translation (NMT). However, a large amount of parallel corpora is not easily available to train accurate context-aware NMT models. In this study, we first obtain large-scale pseudo parallel corpora by back-translating monolingual data, and then investigate its impact on the translation accuracy of context-aware NMT models. We evaluated context-aware NMT models trained with small parallel corpora and the large-scale pseudo parallel corpora on English-Japanese and English-French datasets to demonstrate the large impact of the data augmentation for context-aware NMT models.

Multilingual Model Using Cross-Task Embedding Projection
Jin Sakuma | Naoki Yoshinaga
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We present a method for applying a neural network trained on one (resource-rich) language for a given task to other (resource-poor) languages. We accomplish this by inducing a mapping from pre-trained cross-lingual word embeddings to the embedding layer of the neural network trained on the resource-rich language. To perform element-wise cross-task embedding projection, we invent locally linear mapping which assumes and preserves the local topology across the semantic spaces before and after the projection. Experimental results on topic classification task and sentiment analysis task showed that the fully task-specific multilingual model obtained using our method outperformed the existing multilingual models with embedding layers fixed to pre-trained cross-lingual word embeddings.

Learning to Describe Unknown Phrases with Local and Global Contexts
Shonosuke Ishiwatari | Hiroaki Hayashi | Naoki Yoshinaga | Graham Neubig | Shoetsu Sato | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

When reading a text, it is common to become stuck on unfamiliar words and phrases, such as polysemous words with novel senses, rarely used idioms, internet slang, or emerging entities. If we humans cannot figure out the meaning of those expressions from the immediate local context, we consult dictionaries for definitions or search documents or the web to find other global context to help in interpretation. Can machines help us do this work? Which type of context is more important for machines to solve the problem? To answer these questions, we undertake a task of describing a given phrase in natural language based on its local and global contexts. To solve this task, we propose a neural description model that consists of two context encoders and a description decoder. In contrast to the existing methods for non-standard English explanation [Ni+ 2017] and definition generation [Noraset+ 2017; Gadetsky+ 2018], our model appropriately takes important clues from both local and global contexts. Experimental results on three existing datasets (including WordNet, Oxford and Urban Dictionaries) and a dataset newly created from Wikipedia demonstrate the effectiveness of our method over previous work.

Modeling Personal Biases in Language Use by Inducing Personalized Word Embeddings
Daisuke Oba | Naoki Yoshinaga | Shoetsu Sato | Satoshi Akasaki | Masashi Toyoda
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

There exist biases in individual’s language use; the same word (e.g., cool) is used for expressing different meanings (e.g., temperature range) or different words (e.g., cloudy, hazy) are used for describing the same meaning. In this study, we propose a method of modeling such personal biases in word meanings (hereafter, semantic variations) with personalized word embeddings obtained by solving a task on subjective text while regarding words used by different individuals as different words. To prevent personalized word embeddings from being contaminated by other irrelevant biases, we solve a task of identifying a review-target (objective output) from a given review. To stabilize the training of this extreme multi-class classification, we perform a multi-task learning with metadata identification. Experimental results with reviews retrieved from RateBeer confirmed that the obtained personalized word embeddings improved the accuracy of sentiment analysis as well as the target task. Analysis of the obtained personalized word embeddings revealed trends in semantic variations related to frequent and adjective words.

On the Relation between Position Information and Sentence Length in Neural Machine Translation
Masato Neishi | Naoki Yoshinaga
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Long sentences have been one of the major challenges in neural machine translation (NMT). Although some approaches such as the attention mechanism have partially remedied the problem, we found that the current standard NMT model, Transformer, has difficulty in translating long sentences compared to the former standard, Recurrent Neural Network (RNN)-based model. One of the key differences of these NMT models is how the model handles position information which is essential to process sequential data. In this study, we focus on the position information type of NMT models, and hypothesize that relative position is better than absolute position. To examine the hypothesis, we propose RNN-Transformer which replaces positional encoding layer of Transformer by RNN, and then compare RNN-based model and four variants of Transformer. Experiments on ASPEC English-to-Japanese and WMT2014 English-to-German translation tasks demonstrate that relative position helps translating sentences longer than those in the training data. Further experiments on length-controlled training data reveal that absolute position actually causes overfitting to the sentence length.

2017

Chunk-based Decoder for Neural Machine Translation
Shonosuke Ishiwatari | Jingtao Yao | Shujie Liu | Mu Li | Ming Zhou | Naoki Yoshinaga | Masaru Kitsuregawa | Weijia Jia
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Chunks (or phrases) once played a pivotal role in machine translation. By using a chunk rather than a word as the basic translation unit, local (intra-chunk) and global (inter-chunk) word orders and dependencies can be easily modeled. The chunk structure, despite its importance, has not been considered in the decoders used for neural machine translation (NMT). In this paper, we propose chunk-based decoders for (NMT), each of which consists of a chunk-level decoder and a word-level decoder. The chunk-level decoder models global dependencies while the word-level decoder decides the local word order in a chunk. To output a target sentence, the chunk-level decoder generates a chunk representation containing global information, which the word-level decoder then uses as a basis to predict the words inside the chunk. Experimental results show that our proposed decoders can significantly improve translation performance in a WAT ‘16 English-to-Japanese translation task.

A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size
Masato Neishi | Jin Sakuma | Satoshi Tohda | Shonosuke Ishiwatari | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

In this paper, we describe the team UT-IIS’s system and results for the WAT 2017 translation tasks. We further investigated several tricks including a novel technique for initializing embedding layers using only the parallel corpus, which increased the BLEU score by 1.28, found a practical large batch size of 256, and gained insights regarding hyperparameter settings. Ultimately, our system obtained a better result than the state-of-the-art system of WAT 2016. Our code is available on https://github.com/nem6ishi/wat17.

Modeling Situations in Neural Chat Bots
Shoetsu Sato | Naoki Yoshinaga | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of ACL 2017, Student Research Workshop

2016

Kotonush: Understanding Concepts Based on Values behind Social Media
Tatsuya Iwanari | Kohei Ohara | Naoki Yoshinaga | Nobuhiro Kaji | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Kotonush, a system that clarifies people’s values on various concepts on the basis of what they write about on social media, is presented. The values are represented by ordering sets of concepts (e.g., London, Berlin, and Rome) in accordance with a common attribute intensity expressed by an adjective (e.g., entertaining). We exploit social media text written by different demographics and at different times in order to induce specific orderings for comparison. The system combines a text-to-ordering module with an interactive querying interface enabled by massive hyponymy relations and provides mechanisms to compare the induced orderings from various viewpoints. We empirically evaluate Kotonush and present some case studies, featuring real-world concept orderings with different domains on Twitter, to demonstrate the usefulness of our system.

2015

Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable Context Pairs
Shonosuke Ishiwatari | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

2014

A Self-adaptive Classifier for Efficient Text-stream Processing
Naoki Yoshinaga | Masaru Kitsuregawa
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Modeling User Leniency and Product Popularity for Sentiment Classification
Wenliang Gao | Naoki Yoshinaga | Nobuhiro Kaji | Masaru Kitsuregawa
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Collective Sentiment Classification Based on User Leniency and Product Popularity
Wenliang Gao | Naoki Yoshinaga | Nobuhiro Kaji | Masaru Kitsuregawa
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)

Predicting and Eliciting Addressee’s Emotion in Online Dialogue
Takayuki Hasegawa | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Identifying Constant and Unique Relations by using Time-Series Text
Yohei Takaku | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Sentiment Classification in Resource-Scarce Languages by using Label Propagation
Yong Ren | Nobuhiro Kaji | Naoki Yoshinaga | Masashi Toyoda | Masaru Kitsuregawa
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2010

Kernel Slicing: Scalable Online Training with Conjunctive Features
Naoki Yoshinaga | Masaru Kitsuregawa
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Efficient Staggered Decoding for Sequence Labeling
Nobuhiro Kaji | Yasuhiro Fujiwara | Naoki Yoshinaga | Masaru Kitsuregawa
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

Polynomial to Linear: Efficient Classification with Conjunctive Features
Naoki Yoshinaga | Masaru Kitsuregawa
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Boosting Precision and Recall of Hyponymy Relation Acquisition from Hierarchical Layouts in Wikipedia
Asuka Sumida | Naoki Yoshinaga | Kentaro Torisawa
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper proposes an extension of Sumida and Torisawas method of acquiring hyponymy relations from hierachical layouts in Wikipedia (Sumida and Torisawa, 2008). We extract hyponymy relation candidates (HRCs) from the hierachical layouts in Wikipedia by regarding all subordinate items of an item x in the hierachical layouts as xs hyponym candidates, while Sumida and Torisawa (2008) extracted only direct subordinate items of an item x as xs hyponym candidates. We then select plausible hyponymy relations from the acquired HRCs by running a filter based on machine learning with novel features, which even improve the precision of the resulting hyponymy relations. Experimental results show that we acquired more than 1.34 million hyponymy relations with a precision of 90.1%.

Towards Accurate Probabilistic Lexicons for Lexicalized Grammars
Naoki Yoshinaga
Proceedings of the Ninth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+9)

2004

Generalizing Subcategorization Frames Acquired from Corpora Using Lexicalized Grammars
Naoki Yoshinaga | Jun’ichi Tsujii
Proceedings of the 7th International Workshop on Tree Adjoining Grammar and Related Formalisms

Context-free Approximation of LTAG towards CFG Filtering
Kenta Oouchida | Naoki Yoshinaga | Jun’ichi Tsujii
Proceedings of the 7th International Workshop on Tree Adjoining Grammar and Related Formalisms

Improving the Accuracy of Subcategorizations Acquired from Corpora
Naoki Yoshinaga
Proceedings of the ACL Student Research Workshop

2003

A Debug Tool for Practical Grammar Development
Akane Yakushiji | Yuka Tateisi | Yusuke Miyao | Naoki Yoshinaga | Jun’ichi Tsujii
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

Comparison between CFG Filtering Techniques for LTAG and HPSG
Naoki Yoshinaga | Kentaro Torisawa | Jun’ichi Tsujii
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

2002

A Formal Proof of Strong Equivalence for a Grammar Conversion from LTAG to HPSG-style
Naoki Yoshinaga | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Sixth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+6)

2001

Resource Sharing Amongst HPSG and LTAG Communities by a Method of Grammar Conversion between FB-LTAG and HPSG
Naoki Yoshinaga | Yusuke Miyao | Kentaro Torisawa | Jun’ichi Tsujii
Proceedings of the ACL 2001 Workshop on Sharing Tools and Resources

Co-authors

Venues