Atsuki Yamaguchi


2024

An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Language Model Inference
Atsuki Yamaguchi | Aline Villavicencio | Nikolaos Aletras
Findings of the Association for Computational Linguistics: EMNLP 2024

The development of state-of-the-art generative large language models (LLMs) disproportionately relies on English-centric tokenizers, vocabulary and pre-training data. Although some LLMs have multilingual capabilities, recent studies have shown that their inference efficiency deteriorates when generating text in languages other than English. This results in increased inference time and costs. Cross-lingual vocabulary adaptation (CVA) methods have been proposed for adapting models to a target language, aiming to improve downstream performance. However, the effectiveness of these methods in increasing the inference efficiency of generative LLMs has yet to be explored. In this paper, we perform an empirical study of five CVA methods on four generative LLMs (including monolingual and multilingual models) across four typologically-diverse languages and four natural language understanding tasks. We find that CVA substantially contributes to LLM inference speedups of up to 271.5%. We also show that adapting LLMs that have been pre-trained on more balanced multilingual data results in downstream performance comparable to the original models.

JFLD: A Japanese Benchmark for Deductive Reasoning Based on Formal Logic
Terufumi Morishita | Atsuki Yamaguchi | Gaku Morio | Hikaru Tomonari | Osamu Imaichi | Yasuhiro Sogawa
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large language models (LLMs) have proficiently solved a broad range of tasks with their rich knowledge but often struggle with logical reasoning. To foster research on logical reasoning, many benchmarks have been proposed so far. However, most of these benchmarks are limited to English, hindering the evaluation of LLMs specialized for each language. To address this, we propose **JFLD** (**J**apanese **F**ormal **L**ogic **D**eduction), a deductive reasoning benchmark for Japanese. JFLD assesses whether LLMs can generate logical steps to (dis-)prove a given hypothesis based on a given set of facts. Its key features are assessing pure logical reasoning abilities isolated from knowledge and assessing various reasoning rules. We evaluate various Japanese LLMs and see that they are still poor at logical reasoning, thus highlighting a substantial need for future research.

2023

Hitachi at SemEval-2023 Task 3: Exploring Cross-lingual Multi-task Strategies for Genre and Framing Detection in Online News
Yuta Koreeda | Ken-ichi Yokote | Hiroaki Ozaki | Atsuki Yamaguchi | Masaya Tsunokake | Yasuhiro Sogawa
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper explains the participation of team Hitachi in SemEval-2023 Task 3 “Detecting the genre, the framing, and the persuasion techniques in online news in a multi-lingual setup.” Based on the multilingual, multi-task nature of the task and the low-resource setting, we investigated different cross-lingual and multi-task strategies for training the pretrained language models. Through extensive experiments, we found that (a) cross-lingual/multi-task training and (b) collecting an external balanced dataset can benefit genre and framing detection. We constructed ensemble models from the results and achieved the highest macro-averaged F1 scores in the Italian and Russian genre categorization subtasks.

Hitachi at SemEval-2023 Task 4: Exploring Various Task Formulations Reveals the Importance of Description Texts on Human Values
Masaya Tsunokake | Atsuki Yamaguchi | Yuta Koreeda | Hiroaki Ozaki | Yasuhiro Sogawa
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our participation in SemEval-2023 Task 4, ValueEval: Identification of Human Values behind Arguments. The aim of this task is to identify whether or not an input text supports each of the 20 pre-defined human values. Previous work on human value detection has shown the effectiveness of a sequence classification approach using BERT. However, little is known about what type of task formulation is suitable for the task. To this end, this paper explores various task formulations, including sequence classification, question answering, and question answering with chain-of-thought prompting, and evaluates their performance on the shared task dataset. Experiments show that a zero-shot approach is not as effective as other methods, and that no single approach is optimal in every scenario. Our analysis also reveals that utilizing the descriptions of human values can help to improve performance.

How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese
Takuro Fujii | Koki Shibata | Atsuki Yamaguchi | Terufumi Morishita | Yasuhiro Sogawa
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

This paper investigates the effect of tokenizers on the downstream performance of pretrained language models (PLMs) in scriptio continua languages where no explicit spaces exist between words, using Japanese as a case study. The tokenizer for such languages often consists of a morphological analyzer and a subword tokenizer, requiring us to conduct a comprehensive study of all possible pairs. However, previous studies lack this comprehensiveness. We therefore train extensive sets of tokenizers, build a PLM using each, and measure the downstream performance on a wide range of tasks. Our results demonstrate that each downstream task has a different optimal morphological analyzer, and that it is better to use Byte-Pair-Encoding or Unigram rather than WordPiece as a subword tokenizer, regardless of the type of task.

How does the task complexity of masked pretraining objectives affect downstream performance?
Atsuki Yamaguchi | Hiroaki Ozaki | Terufumi Morishita | Gaku Morio | Yasuhiro Sogawa
Findings of the Association for Computational Linguistics: ACL 2023

Masked language modeling (MLM) is a widely used self-supervised pretraining objective, where a model needs to predict an original token that is replaced with a mask, given its context. Although simpler and computationally efficient pretraining objectives, e.g., predicting the first character of a masked token, have recently shown comparable results to MLM, no objectives with a masking scheme actually outperform it in downstream tasks. Motivated by the assumption that their lack of complexity plays a vital role in the degradation, we validate whether more complex masked objectives can achieve better results and investigate how much complexity they should have to perform comparably to MLM. Our results using GLUE, SQuAD, and Universal Dependencies benchmarks demonstrate that more complicated objectives tend to show better downstream results, with at least half of the MLM complexity needed to perform comparably to MLM. Finally, we discuss how we should pretrain a model using a masked objective from the task complexity perspective.

2022

Hitachi at SemEval-2022 Task 2: On the Effectiveness of Span-based Classification Approaches for Multilingual Idiomaticity Detection
Atsuki Yamaguchi | Gaku Morio | Hiroaki Ozaki | Yasuhiro Sogawa
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

In this paper, we describe our system for SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding. The task aims at detecting idiomaticity in an input sequence (Subtask A) and modeling representation of sentences that contain potential idiomatic multiword expressions (MWEs) (Subtask B) in three languages. We focus on the zero-shot setting of Subtask A and propose two span-based idiomaticity classification methods: MWE span-based classification and idiomatic MWE span prediction-based classification. We use several cross-lingual pre-trained language models (InfoXLM, XLM-R, and others) as our backbone network. Our best-performing system, fine-tuned with the span-based idiomaticity classification, ranked fifth in the zero-shot setting of Subtask A and exhibited a macro F1 score of 0.7466.

Hitachi at SemEval-2022 Task 10: Comparing Graph- and Seq2Seq-based Models Highlights Difficulty in Structured Sentiment Analysis
Gaku Morio | Hiroaki Ozaki | Atsuki Yamaguchi | Yasuhiro Sogawa
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes our participation in SemEval-2022 Task 10, structured sentiment analysis. In this task, we have to parse opinions considering both structure- and context-dependent subjective aspects, which is different from typical dependency parsing. Some of the major parser types have recently been used for semantic and syntactic parsing, while it is still unknown which type can capture structured sentiments well due to their subjective aspects. To this end, we compared two different types of state-of-the-art parsers, namely graph-based and seq2seq-based. Our in-depth analyses suggest that, even though the graph-based parser generally outperforms the seq2seq-based one, with strong pre-trained language models both parsers can essentially output acceptable and reasonable predictions. The analyses highlight that the difficulty derived from subjective aspects in structured sentiment analysis remains an essential challenge.

2021

Dialogue Act-based Breakdown Detection in Negotiation Dialogues
Atsuki Yamaguchi | Kosui Iwasa | Katsuhide Fujita
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Thanks to the success of goal-oriented negotiation dialogue systems, studies of negotiation dialogue have gained momentum in terms of both human-human negotiation support and dialogue systems. However, the field suffers from a paucity of available negotiation corpora, which hinders further development and makes it difficult to test new methodologies in novel negotiation settings. Here, we share a human-human negotiation dialogue dataset in a job interview scenario that features increased complexities in terms of the number of possible solutions and a utility function. We test the proposed corpus using a breakdown detection task for human-human negotiation support. We also introduce a dialogue act-based breakdown detection method that focuses on dialogue flow and is applicable to various corpora. Our results show that our proposed method achieves detection performance comparable to text-based approaches on existing corpora and better results on the proposed dataset.

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling
Atsuki Yamaguchi | George Chrysostomou | Katerina Margatina | Nikolaos Aletras
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder in a multi-class setting over the entire vocabulary. When pretraining, it is common to use other auxiliary objectives on the token or sequence level alongside MLM to improve downstream performance (e.g. next sentence prediction). However, no previous work has examined whether other, simpler objectives, linguistically intuitive or not, can be used standalone as main pretraining objectives. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve performance comparable to or better than MLM using a BERT-BASE architecture. We further validate our methods using smaller models, showing that pretraining BERT-MEDIUM, a model with 41% of BERT-BASE’s parameters, results in only a 1% drop in GLUE scores with our best objective.