Takuya Matsuzaki - ACL Anthology

2025

Natural Language Translation of Formal Proofs through Informalization of Proof Steps and Recursive Summarization along Proof Structure
Seiji Hattori | Takuya Matsuzaki | Makoto Fujiwara
Proceedings of the 18th International Natural Language Generation Conference

This paper proposes a natural language translation method for machine-verifiable formal proofs that leverages the informalization (verbalization of formal language proof steps) and summarization capabilities of LLMs. For evaluation, it was applied to formal proof data created in accordance with natural language proofs taken from an undergraduate-level textbook, and the quality of the generated natural language proofs was analyzed in comparison with the original natural language proofs. Furthermore, we demonstrate that this method can output highly readable and accurate natural language proofs by applying it to an existing formal proof library of the Lean proof assistant.

ARxHYOKA at TAQEEM2025: Comparative Approaches to Arabic Essay Trait Scoring
Mohamad Alnajjar | Ahmad Almoustafa | Tomohiro Nishiyama | Shoko Wakamiya | Eiji Aramaki | Takuya Matsuzaki
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

Character-Aware English-to-Japanese Translation of Fictional Dialogue Using Speaker Embeddings and Back-Translation
Ayuna Nagato | Takuya Matsuzaki
Proceedings of the Tenth Conference on Machine Translation

In Japanese, the form of utterances often reflects speaker-specific character traits, such as gender and personality, through the choice of linguistic elements including personal pronouns and sentence-final particles. However, such elements are not always available in English and a character’s traits are often not directly expressed in English utterances, which can lead to character-inconsistent translations of English novels into Japanese. To address this, we propose a character-aware translation framework that incorporates speaker embeddings. We first train a speaker embedding model by masking the expressions in Japanese utterances that manifest the speaker’s traits and learning to predict them. The resulting embeddings are then injected into a machine translation model. Experimental results show that our proposed method outperforms conventional fine-tuning in preserving speaker-specific character traits in translations.

Timestep Embeddings Trigger Collapse in Diffusion Text Generation
Ryota Nosaka | Takuya Matsuzaki
Proceedings of the 29th Conference on Computational Natural Language Learning

Diffusion models, which work by iteratively refining random noise into realistic data, have achieved remarkable success in various generative tasks, particularly in image and audio synthesis. Recent studies have highlighted the potential of diffusion models for text generation, but several challenges remain unresolved. One significant issue is that, after a certain timestep in the generation process, the model begins to degrade a previous sample rather than improve it, resulting in broken text. In this paper, we reveal that timestep embeddings are a principal cause of the collapse problem by analyzing their interactions with word embeddings. Further, we propose two key methods: (a) a simple lightweight word embedding technique that enhances model analyzability as well as learning efficiency; (b) a novel regularization on both word and timestep embeddings. Experimental results demonstrate that our approach effectively mitigates the collapse problem and can lead to a considerable improvement in the quality of generated text.

2023

Absolute Position Embedding Learns Sinusoid-like Waves for Attention Based on Relative Position
Yuji Yamamoto | Takuya Matsuzaki
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Attention weights are a clue for interpreting how a Transformer-based model makes an inference. In some attention heads, the attention focuses on the neighbors of each token. This allows the output vector of each token to depend on the surrounding tokens and contributes to making the inference context-dependent. We analyze the mechanism behind the concentration of attention on nearby tokens. We show that the phenomenon emerges as follows: (1) learned position embedding has sinusoid-like components, (2) such components are transmitted to the query and the key in the self-attention, (3) the attention head shifts the phases of the sinusoid-like components so that the attention concentrates on nearby tokens at specific relative positions. In other words, a certain type of Transformer-based model acquires the sinusoidal positional encoding to some extent on its own through Masked Language Modeling.

2017

Automated Historical Fact-Checking by Passage Retrieval, Word Statistics, and Virtual Question-Answering
Mio Kobayashi | Ai Ishii | Chikara Hoshino | Hiroshi Miyashita | Takuya Matsuzaki
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper presents a hybrid approach to the verification of statements about historical facts. The test data was collected from the world history examinations in a standardized achievement test for high school students. The data includes various kinds of false statements that were carefully written so as to deceive the students, yet can be disproven on the basis of the teaching materials. Our system predicts the truth or falsehood of a statement based on text search, word cooccurrence statistics, factoid-style question answering, and temporal relation recognition. These features contribute complementarily to the judgement, and the system achieved state-of-the-art accuracy.

Coreference Resolution on Math Problem Text in Japanese
Takumi Ito | Takuya Matsuzaki | Satoshi Sato
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This paper describes a coreference resolution system for math problem text. Case frame dictionaries and a math taxonomy are utilized for supplying domain knowledge. The system deals with various anaphoric phenomena beyond well-studied entity coreferences.

Semantic Parsing of Pre-university Math Problems
Takuya Matsuzaki | Takumi Ito | Hidenao Iwane | Hirokazu Anai | Noriko H. Arai
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We have been developing an end-to-end math problem solving system that accepts natural language input. The current paper focuses on how we analyze the problem sentences to produce logical forms. We chose a hybrid approach combining a shallow syntactic analyzer and a manually-developed lexicalized grammar. A feature of the grammar is that it is extensively typed on the basis of a formal ontology for pre-university math. These types are helpful in semantic disambiguation inside and across sentences. Experimental results show that the hybrid system produces a well-formed logical form with 88% precision and 56% recall.

2016

Translation Errors and Incomprehensibility: a Case Study using Machine-Translated Second Language Proficiency Tests
Takuya Matsuzaki | Akira Fujita | Naoya Todo | Noriko H. Arai
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper reports on an experiment in which 795 human participants answered questions taken from second language proficiency tests that were translated into their native language. The output of three machine translation systems and two different human translations were used as the test material. We classified the translation errors in the questions according to an error taxonomy and analyzed the participants’ responses on the basis of the type and frequency of the translation errors. Through the analysis, we identified several types of errors that most deteriorated the accuracy of the participants’ answers, their confidence in the answers, and their overall evaluation of the translation quality.

2015

Evaluating Machine Translation Systems with Second Language Proficiency Tests
Takuya Matsuzaki | Akira Fujita | Naoya Todo | Noriko H. Arai
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Logical Inference on Dependency-based Compositional Semantics
Ran Tian | Yusuke Miyao | Takuya Matsuzaki
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Efficient Logical Inference for Semantic Processing
Ran Tian | Yusuke Miyao | Takuya Matsuzaki
Proceedings of the ACL 2014 Workshop on Semantic Parsing

2013

Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources
Sumire Uematsu | Takuya Matsuzaki | Hiroki Hanaoka | Yusuke Miyao | Hideki Mima
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The Complexity of Math Problems – Linguistic, or Computational?
Takuya Matsuzaki | Hidenao Iwane | Hirokazu Anai | Noriko Arai
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Deep Context-Free Grammar for Chinese with Broad-Coverage
Xiangli Wang | Yi Zhang | Yusuke Miyao | Takuya Matsuzaki | Junichi Tsujii
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing

2012

Akamon: An Open Source Toolkit for Tree/Forest-Based Statistical Machine Translation
Xianchao Wu | Takuya Matsuzaki | Jun’ichi Tsujii
Proceedings of the ACL 2012 System Demonstrations

Coordination Structure Analysis using Dual Decomposition
Atsushi Hanamoto | Takuya Matsuzaki | Jun’ichi Tsujii
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
Jun Hatori | Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

Exploring Difficulties in Parsing Imperatives and Questions
Tadayoshi Hara | Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of 5th International Joint Conference on Natural Language Processing

Learning the Optimal Use of Dependency-parsing Information for Finding Translations with Comparable Corpora
Daniel Andrade | Takuya Matsuzaki | Jun’ichi Tsujii
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web

Analysis of the Difficulties in Chinese Deep Parsing
Kun Yu | Yusuke Miyao | Takuya Matsuzaki | Xiangli Wang | Junichi Tsujii
Proceedings of the 12th International Conference on Parsing Technologies

A Collaborative Annotation between Human Annotators and a Statistical Parser
Shun’ya Iwasawa | Hiroki Hanaoka | Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 5th Linguistic Annotation Workshop

Effective Use of Function Words for Rule Generalization in Forest-Based Translation
Xianchao Wu | Takuya Matsuzaki | Jun’ichi Tsujii
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Incremental Joint POS Tagging and Dependency Parsing in Chinese
Jun Hatori | Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

The Deep Re-Annotation in a Chinese Scientific Treebank
Kun Yu | Xiangli Wang | Yusuke Miyao | Takuya Matsuzaki | Junichi Tsujii
Proceedings of the Fourth Linguistic Annotation Workshop

Fine-Grained Tree-to-String Translation Rule Extraction
Xianchao Wu | Takuya Matsuzaki | Jun’ichi Tsujii
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

A Simple Approach for HPSG Supertagging Using Dependency Information
Yao-zhong Zhang | Takuya Matsuzaki | Jun’ichi Tsujii
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Semi-automatically Developing Chinese HPSG Grammar from the Penn Chinese Treebank for Deep Parsing
Kun Yu | Yusuke Miyao | Xiangli Wang | Takuya Matsuzaki | Junichi Tsujii
Coling 2010: Posters

Forest-guided Supertagger Training
Yao-zhong Zhang | Takuya Matsuzaki | Jun’ichi Tsujii
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

HPSG Supertagging: A Sequence Labeling View
Yao-zhong Zhang | Takuya Matsuzaki | Jun’ichi Tsujii
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

The UOT system
Xianchao Wu | Takuya Matsuzaki | Naoaki Okazaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

We present the UOT Machine Translation System that was used in the IWSLT-09 evaluation campaign. This year, we participated in the BTEC track for Chinese-to-English translation. Our system is based on a string-to-tree framework. To integrate deep syntactic information, we propose the use of parse trees and semantic dependencies on English sentences described respectively by Head-driven Phrase Structure Grammar and Predicate-Argument Structures. We report the results of our system on both the development and test sets.

Deterministic Shift-Reduce Parsing for Unification-Based Grammars by Using Default Unification
Takashi Ninomiya | Takuya Matsuzaki | Nobuyuki Shimizu | Hiroshi Nakagawa
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Design of Chinese HPSG Framework for Data-Driven Parsing
Xiangli Wang | Shunya Iwasawa | Yusuke Miyao | Takuya Matsuzaki | Kun Yu | Jun’ichi Tsujii
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

A Discriminative Latent Variable Chinese Segmenter with Hybrid Word/Character Information
Xu Sun | Yaozhong Zhang | Takuya Matsuzaki | Yoshimasa Tsuruoka | Jun’ichi Tsujii
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

Task-oriented Evaluation of Syntactic Parsers and Their Representations
Yusuke Miyao | Rune Sætre | Kenji Sagae | Takuya Matsuzaki | Jun’ichi Tsujii
Proceedings of ACL-08: HLT

Comparative Parser Performance Analysis across Grammar Frameworks through Automatic Tree Conversion using Synchronous Grammars
Takuya Matsuzaki | Jun’ichi Tsujii
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

A log-linear model with an n-gram reference distribution for accurate HPSG parsing
Takashi Ninomiya | Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Tenth International Conference on Parsing Technologies

2006

Extremely Lexicalized Models for Accurate and Fast HPSG Parsing
Takashi Ninomiya | Takuya Matsuzaki | Yoshimasa Tsuruoka | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

Probabilistic CFG with Latent Annotations
Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2003

An efficient clustering algorithm for class-based language models
Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003
