Yusuke Miyao


2024

pdf bib
A Multi-Perspective Analysis of Memorization in Large Language Models
Bowen Chen | Namgi Han | Yusuke Miyao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) can generate the same sequences contained in the pre-training corpora, a phenomenon known as memorization. Previous research studied it at a macro level, leaving micro yet important questions under-explored, e.g., what makes sentences memorized, the dynamics when generating memorized sequences, their connection to unmemorized sequences, and their predictability. We answer the above questions by analyzing the relationship of memorization with outputs from the LLM, namely, embeddings, probability distributions, and generated tokens. A memorization score is calculated as the overlap between generated tokens and actual continuations when the LLM is prompted with a context sequence from the pre-training corpora. Our findings reveal: (1) the inter-correlation between memorized/unmemorized sentences, model size, continuation size, and context size, as well as the transition dynamics between sentences of different memorization scores; (2) a sudden drop and increase in the frequency of input tokens when generating memorized/unmemorized sequences (boundary effect); (3) clustering of sentences with different memorization scores in the embedding space; (4) an inverse boundary effect in the entropy of probability distributions for generated memorized/unmemorized sequences; (5) the predictability of memorization is related to model size and continuation length. In addition, we show that a Transformer model trained on the hidden states of the LLM can predict unmemorized tokens.
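
As a concrete illustration of the scoring scheme described above, here is a minimal sketch of a token-overlap memorization score; the greedy-decoding setup and function name are our own assumptions, not the authors' released code.

    import torch
    from transformers import AutoModelForCausalLM

    # model = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM

    def memorization_score(model, context_ids, continuation_ids):
        """Overlap between greedy generation and the true corpus continuation."""
        with torch.no_grad():
            out = model.generate(
                context_ids.unsqueeze(0),
                max_new_tokens=continuation_ids.size(0),
                do_sample=False,  # greedy decoding
            )
        generated = out[0, context_ids.size(0):]
        n = min(generated.size(0), continuation_ids.size(0))
        return (generated[:n] == continuation_ids[:n]).float().mean().item()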

pdf bib
A Comprehensive Evaluation of Inductive Reasoning Capabilities and Problem Solving in Large Language Models
Chen Bowen | Rune Sætre | Yusuke Miyao
Findings of the Association for Computational Linguistics: EACL 2024

Inductive reasoning is fundamental to both human and artificial intelligence. This research evaluates the inductive reasoning abilities of current Large Language Models (LLMs). We argue that considering only the induction of rules is too narrow and unrealistic, since inductive reasoning is usually mixed with other abilities, such as rule application, results/rules validation, and integration of updated information. We probed the LLMs with a set of designed symbolic tasks and found that even state-of-the-art (SotA) LLMs fail significantly, showing the inability of LLMs to perform these intuitively simple tasks. Furthermore, we found that perfect accuracy on a small-size problem does not guarantee the same accuracy on a larger-size version of the same problem, provoking the question of how we can assess the LLMs’ actual problem-solving capabilities. We also argue that Chain-of-Thought prompts help the LLMs by decomposing the problem-solving process, but the learning they enable remains limited. Finally, we reveal that few-shot examples assist LLM generalization in out-of-domain (OOD) cases, albeit to a limited extent: the LLM starts to fail when the problem deviates from the provided few-shot examples.

pdf bib
Unsupervised Parsing by Searching for Frequent Word Sequences among Sentences with Equivalent Predicate-Argument Structures
Junjie Chen | Xiangheng He | Danushka Bollegala | Yusuke Miyao
Findings of the Association for Computational Linguistics: ACL 2024

Unsupervised constituency parsing focuses on identifying word sequences that form a syntactic unit (i.e., constituents) in target sentences. Linguists identify a constituent by evaluating a set of Predicate-Argument Structure (PAS) equivalent sentences, in which the constituent appears more frequently than non-constituents (i.e., the constituent corresponds to a frequent word sequence within the sentence set). However, such frequency information is unavailable in previous parsing methods, which identify constituents by observing sentences with diverse PAS. In this study, we empirically show that constituents correspond to frequent word sequences in the PAS-equivalent sentence set. We propose a frequency-based parser, span-overlap, that (1) computes the span-overlap score as the word sequence’s frequency in the PAS-equivalent sentence set and (2) identifies the constituent structure by finding the constituent tree with the maximum span-overlap score. The parser achieves state-of-the-art parsing accuracy, outperforming existing unsupervised parsers in eight out of ten languages. Additionally, we discover a multilingual phenomenon: participant-denoting constituents tend to have higher span-overlap scores than equal-length event-denoting constituents, meaning that the former tend to appear more frequently in the PAS-equivalent sentence set than the latter. This phenomenon indicates a statistical difference between the two constituent types, laying a foundation for future labeled unsupervised parsing research.
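
A simplified sketch of the span-overlap idea as we read the abstract: count every word sequence across the PAS-equivalent sentence set, then search (CKY-style) for the binary tree whose constituent spans maximize total frequency. This is our reconstruction, not the authors' implementation.

    from collections import Counter
    from functools import lru_cache

    def span_counts(pas_equivalent_sentences):
        """Frequency of every contiguous word sequence in the sentence set."""
        counts = Counter()
        for sent in pas_equivalent_sentences:
            words = sent.split()
            for i in range(len(words)):
                for j in range(i + 1, len(words) + 1):
                    counts[tuple(words[i:j])] += 1
        return counts

    def best_tree(words, counts):
        """CKY-style search for the binary tree with maximum span-overlap score."""
        @lru_cache(maxsize=None)
        def solve(i, j):  # returns (score, tree) for the span [i, j)
            score = counts[tuple(words[i:j])]
            if j - i == 1:
                return score, words[i]
            k = max(range(i + 1, j),
                    key=lambda m: solve(i, m)[0] + solve(m, j)[0])
            return (score + solve(i, k)[0] + solve(k, j)[0],
                    (solve(i, k)[1], solve(k, j)[1]))
        return solve(0, len(words))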

pdf bib
Transferability of Syntax-Aware Graph Neural Networks in Zero-Shot Cross-Lingual Semantic Role Labeling
Rachel Sidney Devianti | Yusuke Miyao
Findings of the Association for Computational Linguistics: EMNLP 2024

Recent models in cross-lingual semantic role labeling (SRL) barely analyze the applicability of their network selection. We believe that network selection is important since it affects the transferability of cross-lingual models, i.e., how the model can extract universal features from source languages to label target languages. Therefore, we comprehensively compare the transferability of different graph neural network (GNN)-based models enriched with universal dependency trees. GNN-based models include transformer-based, graph convolutional network-based, and graph attention network (GAT)-based models. We focus our study on a zero-shot setting by training the models in English and evaluating the models in 23 target languages provided by the Universal Proposition Bank. Based on our experiments, we consistently show that syntax from universal dependency trees is essential for cross-lingual SRL models to achieve better transferability. Dependency-aware self-attention with relative position representations (SAN-RPRs) transfers best across languages, especially for long dependency distances. We also show that dependency-aware two-attention relational GATs transfer better than SAN-RPRs in languages where most arguments lie at a dependency distance of 1-2.

pdf bib
Introducing Spatial Information and a Novel Evaluation Scheme for Open-Domain Live Commentary Generation
Erica Kido Shimomoto | Edison Marrese-Taylor | Ichiro Kobayashi | Hiroya Takamura | Yusuke Miyao
Findings of the Association for Computational Linguistics: EMNLP 2024

This paper focuses on the task of open-domain live commentary generation. Compared to domain-specific work on this task, this setting proves particularly challenging due to the absence of domain-specific features. Aiming to bridge this gap, we integrate spatial information by proposing an utterance generation model with a novel spatial graph that is flexible enough to deal with the open-domain characteristics of the commentaries and significantly improves performance. Furthermore, we propose a novel evaluation scheme, more suitable for live commentary generation, that uses LLMs to automatically check whether generated utterances address essential aspects of the video, via the answerability of questions extracted directly from the videos using LVLMs. Our results suggest that using a combination of our answerability score and a standard machine translation metric is likely a more reliable way to evaluate performance in this task.

pdf bib
Self-Emotion Blended Dialogue Generation in Social Simulation Agents
Qiang Zhang | Jason Naradowsky | Yusuke Miyao
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

When engaging in conversations, dialogue agents in a virtual simulation environment may exhibit their own emotional states that are unrelated to the immediate conversational context, a phenomenon known as self-emotion. This study explores how such self-emotion affects the agents’ behaviors in dialogue strategies and decision-making within a large language model (LLM)-driven simulation framework. In a dialogue strategy prediction experiment, we analyze the dialogue strategy choices employed by agents both with and without self-emotion, comparing them to those of humans. The results show that incorporating self-emotion helps agents exhibit more human-like dialogue strategies. In an independent experiment comparing the performance of models fine-tuned on GPT-4 generated dialogue datasets, we demonstrate that self-emotion can lead to better overall naturalness and humanness. Finally, in a virtual simulation environment where agents have free discussions, we show that self-emotion of agents can significantly influence the decision-making process of the agents, leading to approximately a 50% change in decisions.

pdf bib
Forecasting Implicit Emotions Elicited in Conversations
Yurie Koga | Shunsuke Kando | Yusuke Miyao
Proceedings of the 17th International Natural Language Generation Conference

This paper aims to forecast the implicit emotion elicited in the dialogue partner by a textual input utterance. Forecasting the interlocutor’s emotion is beneficial for natural language generation in dialogue systems to avoid generating utterances that make the users uncomfortable. Previous studies forecast the emotion conveyed in the interlocutor’s response, assuming it will explicitly reflect their elicited emotion. However, true emotions are not always expressed verbally. We propose a new task to directly forecast the implicit emotion elicited by an input utterance, which does not rely on this assumption. We compare this task with related ones to investigate the impact of dialogue history and one’s own utterance on predicting explicit and implicit emotions. Our result highlights the importance of dialogue history for predicting implicit emotions. It also reveals that, unlike explicit emotions, implicit emotions show limited improvement in predictive performance with one’s own utterance, and that they are more difficult to predict than explicit emotions. We find that even a large language model (LLM) struggles to forecast implicit emotions accurately.

pdf bib
Leveraging Plug-and-Play Models for Rhetorical Structure Control in Text Generation
Yuka Yokogawa | Tatsuya Ishigaki | Hiroya Takamura | Yusuke Miyao | Ichiro Kobayashi
Proceedings of the 17th International Natural Language Generation Conference

We propose a method that extends a BART-based language generator with a plug-and-play model to control the rhetorical structure of generated text. Our approach considers rhetorical relations between clauses and generates sentences that reflect this structure using plug-and-play language models. We evaluated our method on the Newsela corpus, which consists of texts at various levels of English proficiency. Our experiments demonstrate that our method outperforms the vanilla BART in terms of the correctness of output discourse and rhetorical structures. In existing methods, performance on n-gram overlap metrics such as BLEU tends to deteriorate compared to the baseline, the vanilla BART. Our proposed method does not exhibit this significant deterioration, demonstrating its advantage.

pdf bib
Language Model Based Unsupervised Dependency Parsing with Conditional Mutual Information and Grammatical Constraints
Junjie Chen | Xiangheng He | Yusuke Miyao
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Previous methods based on Large Language Models (LLMs) perform unsupervised dependency parsing by maximizing bi-lexical dependence scores. However, these methods adopt dependence scores that are difficult to interpret, and they cannot incorporate the grammatical constraints that previous grammar-based parsing research has shown to be beneficial for parsing performance. In this work, we apply Conditional Mutual Information (CMI), an interpretable metric, to measure bi-lexical dependence and incorporate grammatical constraints into LLM-based unsupervised parsing. We incorporate Part-Of-Speech information as a grammatical constraint at the CMI estimation stage and integrate two additional grammatical constraints at the subsequent tree decoding stage. We find that the CMI score correlates positively with syntactic dependencies, and more strongly than baseline scores. Our experiments confirm the benefits and applicability of the proposed grammatical constraints across five languages and eight datasets. The CMI parsing model outperforms state-of-the-art LLM-based models and similarly constrained grammar-based models. Our analysis reveals that the CMI model is strong at retrieving dependency relations with rich lexical interactions but weak at retrieving relations with sparse lexical interactions, indicating a potential limitation of CMI-based unsupervised parsing methods.
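
In pointwise form, the dependence score between a candidate head $w_h$ and dependent $w_d$ given the remaining sentence context $c$ can be written as follows; this is our reading of the standard definition (CMI proper is the expectation of this quantity, and the paper's exact estimator may differ):

    \mathrm{CMI}(w_h; w_d \mid c) = \log \frac{p(w_h, w_d \mid c)}{p(w_h \mid c)\, p(w_d \mid c)}

A positive value indicates that the two words co-occur more often than independence under the context would predict, which is what makes the score interpretable.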

pdf bib
The Impact of Language on Arithmetic Proficiency: A Multilingual Investigation with Cross-Agent Checking Computation
Chung-Chi Chen | Hiroya Takamura | Ichiro Kobayashi | Yusuke Miyao
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

This paper critically examines the arithmetic capabilities of Large Language Models (LLMs), uncovering significant limitations in their performance. Our research reveals a notable decline in accuracy for complex calculations involving large numbers, with addition and subtraction tasks showing varying degrees of proficiency. Additionally, we challenge the notion that arithmetic is language-independent, finding up to a 10% difference in performance across twenty languages. The study also compares self-verification methods with cross-agent collaborations, showing that a single model often outperforms collaborative approaches in basic arithmetic tasks. These findings suggest a need to reassess the effectiveness of LLMs in tasks requiring numerical accuracy and precision.
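
The kind of evaluation harness this implies can be sketched as follows; ask_model is a placeholder for an LLM call (not from the paper), and the prompt wording is illustrative.

    import random

    def ask_model(prompt: str) -> str:
        """Placeholder for an LLM API call; replace with a real client."""
        raise NotImplementedError

    def addition_accuracy(n_digits: int, n_trials: int = 100) -> float:
        """Accuracy on random n-digit addition problems."""
        correct = 0
        for _ in range(n_trials):
            a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
            b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
            reply = ask_model(f"What is {a} + {b}? Answer with digits only.")
            correct += reply.strip() == str(a + b)
        return correct / n_trials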

pdf bib
Evaluating Intention Detection Capability of Large Language Models in Persuasive Dialogues
Hiromasa Sakurai | Yusuke Miyao
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We investigate intention detection in persuasive multi-turn dialogues employing the largest available Large Language Models (LLMs). Much of the prior research measures the intention detection capability of machine learning models without considering the conversational history. To evaluate LLMs’ intention detection capability in conversation, we modified existing datasets of persuasive conversation and created datasets using a multiple-choice paradigm. It is crucial to consider others’ perspectives through their utterances when engaging in a persuasive conversation, especially when making a request or reply that is inconvenient for others. This feature makes persuasive dialogue suitable as a dataset for measuring intention detection capability. We incorporate the concept of ‘face acts,’ which categorize how utterances affect mental states. This approach enables us to measure intention detection capability by focusing on crucial intentions and to conduct comprehensible analysis according to intention types.

pdf bib
Integrating Headedness Information into an Auto-generated Multilingual CCGbank for Improved Semantic Interpretation
Tu-Anh Tran | Yusuke Miyao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Previously, we introduced a method to generate a multilingual Combinatory Categorial Grammar (CCG) treebank by converting from the Universal Dependencies (UD). However, the method only produces bare CCG derivations without any accompanying semantic representations, which makes it difficult to obtain satisfactory analyses for constructions that involve non-local dependencies, such as control/raising or relative clauses, and limits the general applicability of the treebank. In this work, we present an algorithm that adds semantic representations to existing CCG derivations, in the form of predicate-argument structures. Through hand-crafted rules, we enhance each CCG category with headedness information, with which both local and non-local dependencies can be properly projected. This information is extracted from various sources, including UD, Enhanced UD, and proposition banks. Evaluation of our projected dependencies on the English PropBank and the Universal PropBank 2.0 shows that they can capture most of the semantic dependencies in the target corpora. Further error analysis measures the effectiveness of our algorithm for each language tested, and reveals several issues with the previous method and source data.

pdf bib
What Is Needed for Intra-document Disambiguation of Math Identifiers?
Takuto Asakura | Yusuke Miyao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In automated scientific document analysis, accurately interpreting math formulae is imperative alongside comprehending natural language. Ambiguity in math identifiers within a single document poses significant challenges to understanding math formulae. While disambiguating math identifiers across documents has seen some progress, resolving ambiguity within a document remains inadequately researched due to its complexity and insufficient datasets, and the level of difficulty and the information required to accomplish this task were uncertain. This study aims to determine which information is necessary for the intra-document disambiguation of math identifiers. Our findings indicate that the position data and the local formula structure surrounding the identifiers, including modifiers, are particularly critical. For our study, we expanded a dataset for formula grounding, doubling its size to include annotations for 27,655 math identifier occurrences. We created a multi-layer perceptron model that performs similarly to humans, with 85% accuracy and a kappa value of 0.73, outperforming rule-based baselines. We trained and evaluated the model with papers in natural language processing (NLP), and confirmed that our findings also hold in fields other than NLP by applying the trained models to papers from various fields. These results will aid in improving mathematical language processing, such as mathematical information retrieval.

pdf bib
Who Said What: Formalization and Benchmarks for the Task of Quote Attribution
Wenjie Zhong | Jason Naradowsky | Hiroya Takamura | Ichiro Kobayashi | Yusuke Miyao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The task of quote attribution seeks to pair textual utterances with the names of their speakers. Despite continuing research efforts on the task, models are rarely evaluated systematically against previous models in comparable settings on the same datasets. This has resulted in a poor understanding of the relative strengths and weaknesses of various approaches. In this work we formalize the task of quote attribution, and in doing so, establish a basis of comparison across existing models. We present an exhaustive benchmark of known models, including natural extensions to larger LLM base models, on all available datasets in both English and Chinese. Our benchmarking results reveal that the CEQA model attains state-of-the-art performance among all supervised methods, and that ChatGPT, operating in a four-shot setting, performs on par with or surpasses supervised methods on some datasets. Detailed error analysis identifies several key factors contributing to prediction errors.

2023

pdf bib
Tree-shape Uncertainty for Analyzing the Inherent Branching Bias of Unsupervised Parsing Models
Taiga Ishii | Yusuke Miyao
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

This paper presents the formalization of tree-shape uncertainty that enables us to analyze the inherent branching bias of unsupervised parsing models using raw texts alone. Previous work analyzed the branching bias of unsupervised parsing models by comparing the outputs of trained parsers with gold syntactic trees. However, such approaches do not consider the fact that texts can be generated by different grammars with different syntactic trees, possibly failing to clearly separate the inherent bias of the model and the bias in train data learned by the model. To this end, we formulate tree-shape uncertainty and derive sufficient conditions that can be used for creating texts that are expected to contain no biased information on branching. In the experiment, we show that training parsers on such unbiased texts can effectively detect the branching bias of existing unsupervised parsing models. Such bias may depend only on the algorithm, or it may depend on seemingly unrelated dataset statistics such as sequence length and vocabulary size.

pdf bib
Improving Numeracy by Input Reframing and Quantitative Pre-Finetuning Task
Chung-Chi Chen | Hiroya Takamura | Ichiro Kobayashi | Yusuke Miyao
Findings of the Association for Computational Linguistics: EACL 2023

Numbers have characteristics distinct from those of words. Teaching models to understand numbers in text is an open-ended research question. Instead of discussing the required calculation skills, this paper focuses on a more fundamental topic: understanding numerals. We point out that innumeracy—the inability to handle basic numeral concepts—exists in most pretrained language models (LMs), and we propose a method to address this issue by exploring the notation of numbers. Further, we discuss whether changing the notation and pre-finetuning on a number-comparison task can improve performance on three benchmark datasets containing quantity-related tasks. The results of this study indicate that input reframing and the proposed pre-finetuning task are useful for RoBERTa.
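
Input reframing of this kind amounts to rewriting numeral tokens into alternative notations before feeding them to the LM; a sketch with two illustrative schemes (the paper's exact notations may differ):

    def digit_reframe(token: str) -> str:
        """'1234' -> '1 2 3 4', so each digit becomes its own token."""
        return " ".join(token) if token.isdigit() else token

    def scientific_reframe(token: str) -> str:
        """'1234' -> '1.234e3', an exponent-style rewriting."""
        if not token.isdigit():
            return token
        mantissa = token[0] + ("." + token[1:] if len(token) > 1 else "")
        return f"{mantissa}e{len(token) - 1}"

    # "Revenue rose to 1234" -> "Revenue rose to 1 2 3 4"
    # reframed = " ".join(digit_reframe(t) for t in text.split())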

pdf bib
Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models
Qiang Zhang | Jason Naradowsky | Yusuke Miyao
Findings of the Association for Computational Linguistics: ACL 2023

Existing dialogue models may encounter scenarios which are not well represented in the training data, and as a result generate responses that are unnatural, inappropriate, or unhelpful. We propose the “Ask an Expert” framework in which the model is trained with access to an “expert” which it can consult at each turn. Advice is solicited via a structured dialogue with the expert, and the model is optimized to selectively utilize (or ignore) it given the context and dialogue history. In this work the expert takes the form of an LLM. We evaluate this framework in a mental health support domain, where the structure of the expert conversation is outlined by pre-specified prompts which reflect a reasoning strategy taught to practitioners in the field. Blenderbot models utilizing “Ask an Expert” show quality improvements across all expert sizes, including those with fewer parameters than the dialogue model itself. Our best model provides a ~10% improvement over baselines, approaching human-level scores on “engagingness” and “helpfulness” metrics.

pdf bib
Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding
Erica Kido Shimomoto | Edison Marrese-Taylor | Hiroya Takamura | Ichiro Kobayashi | Hideki Nakayama | Yusuke Miyao
Findings of the Association for Computational Linguistics: ACL 2023

This paper explores the task of Temporal Video Grounding (TVG) where, given an untrimmed video and a query sentence, the goal is to recognize and determine temporal boundaries of action instances in the video described by natural language queries. Recent works tackled this task by improving query inputs with large pre-trained language models (PLM), at the cost of more expensive training. However, the effects of this integration are unclear, as these works also propose improvements in the visual inputs. Therefore, this paper studies the role of query sentence representation with PLMs in TVG and assesses the applicability of parameter-efficient training with NLP adapters. We couple popular PLMs with a selection of existing approaches and test different adapters to reduce the impact of the additional parameters. Our results on three challenging datasets show that, with the same visual inputs, TVG models greatly benefited from the PLM integration and fine-tuning, stressing the importance of the text query representation in this task. Furthermore, adapters were an effective alternative to full fine-tuning, even though they are not tailored to our task, allowing PLM integration in larger TVG models and delivering results comparable to SOTA models. Finally, our results shed light on which adapters work best in different scenarios.
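
The adapters in question are small bottleneck modules inserted into a frozen PLM; a minimal sketch of one such module (the size and placement here are illustrative, not the paper's exact configuration):

    import torch.nn as nn

    class BottleneckAdapter(nn.Module):
        """Down-project, nonlinearity, up-project, plus a residual connection."""
        def __init__(self, hidden_size: int, bottleneck: int = 64):
            super().__init__()
            self.down = nn.Linear(hidden_size, bottleneck)
            self.up = nn.Linear(bottleneck, hidden_size)
            self.act = nn.GELU()

        def forward(self, x):
            return x + self.up(self.act(self.down(x)))

    # Parameter-efficient training: only adapter parameters are updated.
    # for p in plm.parameters():
    #     p.requires_grad = False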

pdf bib
Mind the Gap Between Conversations for Improved Long-Term Dialogue Generation
Qiang Zhang | Jason Naradowsky | Yusuke Miyao
Findings of the Association for Computational Linguistics: EMNLP 2023

Knowing how to end and resume conversations over time is a natural part of communication, allowing for discussions to span weeks, months, or years. The duration of gaps between conversations dictates which topics are relevant and which questions to ask, and dialogue systems which do not explicitly model time may generate responses that are unnatural. In this work we explore the idea of making dialogue models aware of time, and present GapChat, a multi-session dialogue dataset in which the time between each session varies. While the dataset is constructed in real-time, progress on events in speakers’ lives is simulated in order to create realistic dialogues occurring across a long timespan. We expose time information to the model and compare different representations of time and event progress. In human evaluation we show that time-aware models perform better in metrics that judge the relevance of the chosen topics and the information gained from the conversation.

pdf bib
Fiction-Writing Mode: An Effective Control for Human-Machine Collaborative Writing
Wenjie Zhong | Jason Naradowsky | Hiroya Takamura | Ichiro Kobayashi | Yusuke Miyao
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

We explore the idea of incorporating concepts from writing skills curricula into human-machine collaborative writing scenarios, focusing on adding writing modes as a control for text generation models. Using crowd-sourced workers, we annotate a corpus of narrative text paragraphs with writing mode labels. Classifiers trained on this data achieve an average accuracy of ~87% on held-out data. We fine-tune a set of large language models to condition on writing mode labels, and show that the generated text is recognized as belonging to the specified mode with high accuracy. To study the ability of writing modes to provide fine-grained control over generated text, we devise a novel turn-based text reconstruction game to evaluate the difference between the generated text and the author’s intention. We show that authors prefer text suggestions made by writing mode-controlled models on average 61.1% of the time, with satisfaction scores 0.5 higher on a 5-point ordinal scale. When evaluated by humans, stories generated via collaboration with writing mode-controlled models achieve high similarity with the professionally written target story. We conclude by identifying the most common mistakes found in the generated stories.
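
Conditioning a language model on a writing-mode label can be as simple as prefixing a control token to each training example; a sketch (the mode inventory below is hypothetical, not the paper's label set):

    MODES = {"dialogue", "action", "description"}  # hypothetical label set

    def with_mode_prefix(mode: str, paragraph: str) -> str:
        """Prefix a control token so the LM learns to condition on the mode."""
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        return f"<{mode}> {paragraph}"

    # Fine-tuning pairs would then look like:
    # (with_mode_prefix("action", context), target_paragraph)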

pdf bib
Constructing a Japanese Business Email Corpus Based on Social Situations
Muxuan Liu | Tatsuya Ishigaki | Yusuke Miyao | Hiroya Takamura | Ichiro Kobayashi
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

pdf bib
Comprehensive Evaluation of Translation Error Correction Models
Masatoshi Otake | Yusuke Miyao
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

pdf bib
Audio Commentary System for Real-Time Racing Game Play
Tatsuya Ishigaki | Goran Topić | Yumi Hamazono | Ichiro Kobayashi | Yusuke Miyao | Hiroya Takamura
Proceedings of the 16th International Natural Language Generation Conference: System Demonstrations

Live commentaries are essential for enhancing spectators’ enjoyment and understanding during sports events or e-sports streams. We introduce a live audio commentary system designed specifically for a racing game, driven by the high demand in the e-sports field. While a player is playing a racing game, our system tracks real-time play data, including speed and steering rotations, and generates commentary to accompany the live stream. Human evaluation suggested that the generated commentary enhances enjoyment and understanding of races compared to streams without commentary. Incorporating additional modules to improve diversity and to detect irregular events, such as course-outs and collisions, further increases the preference for the output commentaries.

2022

pdf bib
Modeling Syntactic-Semantic Dependency Correlations in Semantic Role Labeling Using Mixture Models
Junjie Chen | Xiangheng He | Yusuke Miyao
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we propose a mixture model-based end-to-end method to model the syntactic-semantic dependency correlation in Semantic Role Labeling (SRL). Semantic dependencies in SRL are modeled as a distribution over semantic dependency labels conditioned on a predicate and an argument word. The semantic label distribution varies depending on Shortest Syntactic Dependency Path (SSDP) hop patterns. We target the variation of semantic label distributions using a mixture model, separately estimating semantic label distributions for different hop patterns and probabilistically clustering hop patterns with similar semantic label distributions. Experiments show that the proposed method successfully learns a cluster assignment reflecting the variation of semantic label distributions. Modeling the variation improves performance in predicting short distance semantic dependencies, in addition to the improvement on long distance semantic dependencies that previous syntax-aware methods have achieved. The proposed method achieves a small but statistically significant improvement over baseline methods in English, German, and Spanish and obtains competitive performance with state-of-the-art methods in English.

pdf bib
StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning
Hong Chen | Duc Vo | Hiroya Takamura | Yusuke Miyao | Hideki Nakayama
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Existing automatic story evaluation methods place a premium on the lexical-level coherence of stories, deviating from human preference. We go beyond this limitation by considering a novel story evaluation method that mimics human preference when judging a story, namely StoryER, which consists of three sub-tasks: Ranking, Rating, and Reasoning. Given either a machine-generated or a human-written story, StoryER requires the machine to output 1) a preference score that corresponds to human preference, 2) specific ratings and their corresponding confidences, and 3) comments for various aspects (e.g., opening, character-shaping). To support these tasks, we introduce a well-annotated dataset comprising (i) 100k ranked story pairs and (ii) a set of 46k ratings and comments on various aspects of the story. We finetune Longformer-Encoder-Decoder (LED) on the collected dataset, with the encoder responsible for preference score and aspect prediction and the decoder for comment generation. Our comprehensive experiments result in a competitive benchmark for each task, showing high correlation with human preference. In addition, we observe that jointly learning the preference scores, the aspect ratings, and the comments brings gains to each single task. Our dataset and benchmarks are publicly available to advance research on story evaluation tasks.

pdf bib
Open-domain Video Commentary Generation
Edison Marrese-Taylor | Yumi Hamazono | Tatsuya Ishigaki | Goran Topić | Yusuke Miyao | Ichiro Kobayashi | Hiroya Takamura
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Live commentary plays an important role in sports broadcasts and video games, making spectators more excited and immersed. In this context, though approaches for automatically generating such commentary have been proposed in the past, they have been generally concerned with specific fields, where it is possible to leverage domain-specific information. In light of this, we propose the task of generating video commentary in an open-domain fashion. We detail the construction of a new large-scale dataset of transcribed commentary aligned with videos containing various human actions in a variety of domains, and propose approaches based on well-known neural architectures to tackle the task. To understand the strengths and limitations of current approaches, we present an in-depth empirical study based on our data. Our results suggest clear trade-offs between textual and visual inputs for the models and highlight the importance of relying on external knowledge in this open-domain setting, resulting in a set of robust baselines for our task.

pdf bib
Rethinking Offensive Text Detection as a Multi-Hop Reasoning Problem
Qiang Zhang | Jason Naradowsky | Yusuke Miyao
Findings of the Association for Computational Linguistics: ACL 2022

We introduce the task of implicit offensive text detection in dialogues, where a statement may have either an offensive or non-offensive interpretation, depending on the listener and context. We argue that reasoning is crucial for understanding this broader class of offensive utterances, and release SLIGHT, a dataset to support research on this task. Experiments using the data show that state-of-the-art methods of offense detection perform poorly when asked to detect implicitly offensive statements, achieving only ~11% accuracy. In contrast to existing offensive text detection datasets, SLIGHT features human-annotated chains of reasoning which describe the mental process by which an offensive interpretation can be reached from each ambiguous statement. We explore the potential for a multi-hop reasoning approach by utilizing existing entailment models to score the probability of these chains, and show that even naive reasoning models can yield improved performance in most situations. Analysis of the chains provides insight into the human interpretation process and emphasizes the importance of incorporating additional commonsense knowledge.
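
A naive version of the chain-scoring idea as we read it: treat each annotated reasoning step as an entailment query and combine the step probabilities; entail_prob stands in for any off-the-shelf NLI model and is not from the paper.

    import math

    def entail_prob(premise: str, hypothesis: str) -> float:
        """Stand-in for an NLI model returning P(entailment)."""
        raise NotImplementedError

    def chain_score(statement: str, chain: list[str]) -> float:
        """Probability-style score of a reasoning chain from the statement."""
        steps = [statement] + chain
        log_p = sum(math.log(entail_prob(p, h))
                    for p, h in zip(steps, steps[1:]))
        return math.exp(log_p)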

pdf bib
Syntactic and Semantic Uniformity for Semantic Parsing and Task-Oriented Dialogue Systems
Bowen Chen | Yusuke Miyao
Findings of the Association for Computational Linguistics: EMNLP 2022

This paper proposes a data representation framework for semantic parsing and task-oriented dialogue systems, aiming to achieve a uniform representation for syntactically and semantically diverse machine-readable formats. Current NLP systems heavily rely on adapting pre-trained language models to specific tasks, and this approach has been proven effective for modeling natural language texts. However, little attention has been paid to the representation of machine-readable formats, such as database queries and dialogue states. We present a method for converting the original machine-readable formats of semantic parsing and task-oriented dialogue datasets into a syntactically and semantically uniform representation. We define a meta grammar for syntactically uniform representations and translate semantically equivalent functions into a uniform vocabulary. Empirical experiments on 13 datasets show that accuracy consistently improves over the original formats, revealing the advantage of the proposed representation. Additionally, we show that the proposed representation allows for transfer learning across datasets.

pdf bib
A Subspace-Based Analysis of Structured and Unstructured Representations in Image-Text Retrieval
Erica K. Shimomoto | Edison Marrese-Taylor | Hiroya Takamura | Ichiro Kobayashi | Yusuke Miyao
Proceedings of the Workshop on Unimodal and Multimodal Induction of Linguistic Structures (UM-IoS)

In this paper, we specifically look at the image-text retrieval problem. Recent multimodal frameworks have shown that structured inputs and fine-tuning lead to consistent performance improvement. However, this paradigm has been challenged recently with newer Transformer-based models that can reach zero-shot state-of-the-art results despite not explicitly using structured data during pre-training. Since such strategies lead to increased computational resources, we seek to better understand their role in image-text retrieval by analyzing visual and text representations extracted with three multimodal frameworks – SGM, UNITER, and CLIP. To perform such analysis, we represent a single image or text as low-dimensional linear subspaces and perform retrieval based on subspace similarity. We chose this representation as subspaces give us the flexibility to model an entity based on feature sets, allowing us to observe how integrating or reducing information changes the representation of each entity. We analyze the performance of the selected models’ features on two standard benchmark datasets. Our results indicate that heavily pre-training models can already lead to features with critical information representing each entity, with zero-shot UNITER features performing consistently better than fine-tuned features. Furthermore, while models can benefit from structured inputs, learning representations for objects and relationships separately, such as in SGM, likely causes a loss of crucial contextual information needed to obtain a compact cluster that can effectively represent a single entity.
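
The subspace representation and comparison can be sketched with standard linear algebra: take an SVD basis of each entity's feature set and compare subspaces via the cosines of their canonical angles. This is our sketch of the general technique; the paper's exact similarity measure may differ.

    import numpy as np

    def subspace_basis(features: np.ndarray, dim: int) -> np.ndarray:
        """Orthonormal basis (d x dim) spanning a feature set of shape (n, d)."""
        _, _, vt = np.linalg.svd(features, full_matrices=False)
        return vt[:dim].T

    def subspace_similarity(b1: np.ndarray, b2: np.ndarray) -> float:
        """Mean squared cosine of the canonical angles between two subspaces."""
        cosines = np.linalg.svd(b1.T @ b2, compute_uv=False)
        return float(np.mean(cosines ** 2))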

pdf bib
Multilingual Syntax-aware Language Modeling through Dependency Tree Conversion
Shunsuke Kando | Hiroshi Noji | Yusuke Miyao
Proceedings of the Sixth Workshop on Structured Prediction for NLP

Incorporating stronger syntactic biases into neural language models (LMs) is a long-standing goal, but research in this area often focuses on modeling English text, where constituent treebanks are readily available. Extending constituent tree-based LMs to the multilingual setting, where dependency treebanks are more common, is possible via dependency-to-constituency conversion methods. However, this raises the question of which tree formats are best for learning the model, and for which languages. We investigate this question by training recurrent neural network grammars (RNNGs) using various conversion methods, and evaluating them empirically in a multilingual setting. We examine the effect on LM performance across nine conversion methods and five languages through seven types of syntactic tests. On average, the performance of our best model represents a 19% increase in accuracy over the worst choice across all languages. Our best model shows an advantage over sequential/overparameterized LMs, suggesting the positive effect of syntax injection in a multilingual setting. Our experiments highlight the importance of choosing the right tree formalism, and provide insights into making an informed decision.

pdf bib
Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers
Takuto Asakura | Yusuke Miyao | Akiko Aizawa
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Grounding the meaning of each symbol in math formulae is important for automated understanding of scientific documents. Generally speaking, the meanings of math symbols are not constant, and the same symbol can be used with multiple meanings. Therefore, coreference relations between symbols need to be identified for grounding, and the task has aspects of both description alignment and coreference analysis. In this study, we annotated 15 papers selected from arXiv.org with grounding information. In total, 12,352 occurrences of math identifiers in these papers were annotated, and all coreference relations between them were made explicit in each paper. The constructed dataset shows that, regardless of the ambiguity of symbols in math formulae, coreference relations can be labeled with a high inter-annotator agreement. The dataset enables automation of formula grounding and, in turn, deeper use of the knowledge in scientific documents through techniques such as math information extraction. The grounding dataset is available at https://sigmathling.kwarc.info/resources/grounding-dataset/.

pdf bib
Development of a Multilingual CCG Treebank via Universal Dependencies Conversion
Tu-Anh Tran | Yusuke Miyao
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper introduces an algorithm to convert Universal Dependencies (UD) treebanks to Combinatory Categorial Grammar (CCG) treebanks. As CCG encodes almost all grammatical information into the lexicon, obtaining a high-quality CCG derivation from a dependency tree is a challenging task. Our algorithm relies on hand-crafted rules to assign categories to constituents, and a non-statistical parser to derive full CCG parses given the assigned categories. To evaluate our converted treebanks, we perform lexical, sentential, and syntactic rule coverage analysis, as well as CCG parsing experiments. Finally, we discuss how our method handles complex constructions, and propose possible future extensions.

pdf bib
Collection and Analysis of Travel Agency Task Dialogues with Age-Diverse Speakers
Michimasa Inaba | Yuya Chiba | Ryuichiro Higashinaka | Kazunori Komatani | Yusuke Miyao | Takayuki Nagai
Proceedings of the Thirteenth Language Resources and Evaluation Conference

When individuals communicate with each other, they use different vocabulary, speaking speed, facial expressions, and body language depending on the people they talk to. This paper focuses on the speaker’s age as a factor that affects changes in communication. We collected a multimodal dialogue corpus with a wide range of speaker ages. As a dialogue task, we focus on travel, which interests people of all ages, and we set up a task based on a tourism consultation between an operator and a customer at a travel agency. This paper provides details of the dialogue task, the collection procedure and annotations, and an analysis of the characteristics of the dialogues and facial expressions, focusing on the age of the speakers. The results suggest that the adult speakers have more independent opinions, that the older speakers express their opinions more frequently than other age groups, and that the operators smiled more frequently at the minor speakers.

2021

pdf bib
Leveraging Partial Dependency Trees to Control Image Captions
Wenjie Zhong | Yusuke Miyao
Proceedings of the Second Workshop on Advances in Language and Vision Research

Controlling the generation of image captions has attracted much attention recently. In this paper, we propose a framework leveraging partial syntactic dependency trees as control signals to make image captions include specified words and their syntactic structures. To achieve this purpose, we propose a Syntactic Dependency Structure Aware Model (SDSAM), which explicitly learns to generate the syntactic structures of image captions so as to include given partial dependency trees. In addition, we come up with a metric to evaluate how many specified words and their syntactic dependencies are included in generated captions. We carry out experiments on two standard datasets: Microsoft COCO and Flickr30k. Empirical results show that image captions generated by our model are effectively controlled in terms of specified words and their syntactic structures. The code is available on GitHub.

pdf bib
Bayesian Argumentation-Scheme Networks: A Probabilistic Model of Argument Validity Facilitated by Argumentation Schemes
Takahiro Kondo | Koki Washio | Katsuhiko Hayashi | Yusuke Miyao
Proceedings of the 8th Workshop on Argument Mining

We propose a methodology for representing the reasoning structure of arguments using Bayesian networks and predicate logic facilitated by argumentation schemes. We express the meaning of text segments using predicate logic and map the boolean values of predicate logic expressions to nodes in a Bayesian network. The reasoning structure among text segments is described with a directed acyclic graph. While our formalism is highly expressive and capable of describing the informal logic of human arguments, it is too open-ended to actually build a network for an argument. It is not at all obvious which segment of argumentative text should be considered as a node in a Bayesian network, and how to decide the dependencies among nodes. To alleviate the difficulty, we provide abstract network fragments, called idioms, which represent typical argument justification patterns derived from argumentation schemes. The network construction process is decomposed into idiom selection, idiom instantiation, and idiom combination. We define 17 idioms in total by referring to argumentation schemes as well as analyzing actual arguments and fitting idioms to them. We also create a dataset consisting of pairs of an argumentative text and a corresponding Bayesian network. Our dataset contains about 2,400 pairs, which is large in the research area of argumentation schemes.

pdf bib
Generating Racing Game Commentary from Vision, Language, and Structured Data
Tatsuya Ishigaki | Goran Topic | Yumi Hamazono | Hiroshi Noji | Ichiro Kobayashi | Yusuke Miyao | Hiroya Takamura
Proceedings of the 14th International Conference on Natural Language Generation

We propose the task of automatically generating commentaries for races in a motor racing game, from vision, structured numerical, and textual data. Commentaries provide information to support spectators in understanding events in races. Commentary generation models need to interpret the race situation and generate the correct content at the right moment. We divide the task into two subtasks: utterance timing identification and utterance generation. Because existing datasets do not have such alignments of data in multiple modalities, this setting has not been explored in depth. In this study, we introduce a new large-scale dataset that contains aligned video data, structured numerical data, and transcribed commentaries that consist of 129,226 utterances in 1,389 races in a game. Our analysis reveals that the characteristics of commentaries change over time or from viewpoints. Our experiments on the subtasks show that it is still challenging for a state-of-the-art vision encoder to capture useful information from videos to generate accurate commentaries. We make the dataset and baseline implementation publicly available for further research.

pdf bib
Unpredictable Attributes in Market Comment Generation
Yumi Hamazono | Tatsuya Ishigaki | Yusuke Miyao | Hiroya Takamura | Ichiro Kobayashi
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

pdf bib
Talking with the Theorem Prover to Interactively Solve Natural Language Inference
Atsushi Sumita | Yusuke Miyao | Koji Mineshima
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

2020

pdf bib
Towards Grounding of Formulae
Takuto Asakura | André Greiner-Petter | Akiko Aizawa | Yusuke Miyao
Proceedings of the First Workshop on Scholarly Document Processing

A large amount of scientific knowledge is represented within mixed forms of natural language texts and mathematical formulae. Therefore, a collaboration of natural language processing and formula analysis, so-called mathematical language processing, is necessary to enable computers to understand and retrieve information from such documents. However, as we show in this project, a mathematical notation can change its meaning even within the scope of a single paragraph. This flexibility makes it difficult to extract the exact meaning of a mathematical formula. In this project, we propose a new task direction for grounding mathematical formulae. In particular, we address a widespread misconception in various research projects in mathematical information retrieval, which presume that mathematical notations have a fixed meaning within a single document. We manually annotated a long scientific paper to illustrate the task concept. Our high inter-annotator agreement shows that the task is well understood by humans. Our results indicate that it is worthwhile to develop techniques for the proposed task to contribute to the further progress of mathematical language processing.

pdf bib
Learning with Contrastive Examples for Data-to-Text Generation
Yui Uehara | Tatsuya Ishigaki | Kasumi Aoki | Hiroshi Noji | Keiichi Goshima | Ichiro Kobayashi | Hiroya Takamura | Yusuke Miyao
Proceedings of the 28th International Conference on Computational Linguistics

Existing models for data-to-text tasks generate fluent but sometimes incorrect sentences e.g., “Nikkei gains” is generated when “Nikkei drops” is expected. We investigate models trained on contrastive examples i.e., incorrect sentences or terms, in addition to correct ones to reduce such errors. We first create rules to produce contrastive examples from correct ones by replacing frequent crucial terms such as “gain” or “drop”. We then use learning methods with several losses that exploit contrastive examples. Experiments on the market comment generation task show that 1) exploiting contrastive examples improves the capability of generating sentences with better lexical choice, without degrading the fluency, 2) the choice of the loss function is an important factor because the performances on different metrics depend on the types of loss functions, and 3) the use of the examples produced by some specific rules further improves performance. Human evaluation also supports the effectiveness of using contrastive examples.
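
The rule-based construction can be sketched as flipping crucial terms to their antonyms; the word list below is illustrative, not the paper's full rule set.

    ANTONYMS = {"gains": "drops", "drops": "gains",
                "rises": "falls", "falls": "rises"}  # illustrative pairs

    def contrastive_examples(sentence: str) -> list[str]:
        """Create incorrect variants by flipping one crucial term at a time."""
        words = sentence.split()
        variants = []
        for i, w in enumerate(words):
            if w.lower() in ANTONYMS:
                flipped = list(words)
                flipped[i] = ANTONYMS[w.lower()]
                variants.append(" ".join(flipped))
        return variants

    # contrastive_examples("Nikkei gains on weak yen") -> ["Nikkei drops on weak yen"]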

pdf bib
An empirical analysis of existing systems and datasets toward general simple question answering
Namgi Han | Goran Topic | Hiroshi Noji | Hiroya Takamura | Yusuke Miyao
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we evaluate the progress of our field toward solving simple factoid questions over a knowledge base, a practically important problem for natural language interfaces to databases. As in other natural language understanding tasks, a common practice for this task is to train and evaluate a model on a single dataset, and recent studies suggest that SimpleQuestions, the most popular and largest dataset, is nearly solved under this setting. However, this common setting does not evaluate the robustness of the systems outside the distribution of the training data used. We rigorously evaluate the robustness of existing systems using different datasets. Our analysis, including shifting the training and test datasets and training on a union of the datasets, suggests that our progress in solving the SimpleQuestions dataset does not indicate success at more general simple question answering. We discuss a possible future direction toward this goal.

pdf bib
Analyzing Word Embedding Through Structural Equation Modeling
Namgi Han | Katsuhiko Hayashi | Yusuke Miyao
Proceedings of the Twelfth Language Resources and Evaluation Conference

Many researchers have tried to predict the accuracies of extrinsic evaluation by using intrinsic evaluation to evaluate word embedding. The relationship between intrinsic and extrinsic evaluation, however, has only been studied with simple correlation analysis, which has difficulty capturing complex cause-effect relationships and integrating external factors such as the hyperparameters of word embedding. To tackle this problem, we employ partial least squares path modeling (PLS-PM), a method of structural equation modeling developed for causal analysis. We propose a causal diagram consisting of the evaluation results on the BATS, VecEval, and SentEval datasets, with a causal hypothesis that linguistic knowledge encoded in word embedding contributes to solving downstream tasks. Our PLS-PM models are estimated with 600 word embeddings, and we prove the existence of causal relations between linguistic knowledge evaluated on BATS and the accuracies of downstream tasks evaluated on VecEval and SentEval in our PLS-PM models. Moreover, we show that the PLS-PM models are useful for analyzing the effect of hyperparameters, including the training algorithm, corpus, dimension, and context window, and for validating the effectiveness of intrinsic evaluation.

pdf bib
Market Comment Generation from Data with Noisy Alignments
Yumi Hamazono | Yui Uehara | Hiroshi Noji | Yusuke Miyao | Hiroya Takamura | Ichiro Kobayashi
Proceedings of the 13th International Conference on Natural Language Generation

End-to-end models for data-to-text learn the mapping between data and text from aligned pairs in the dataset. However, these alignments are not always reliable, especially for time-series data, where real-time comments are attached to an evolving situation and the comment delivery time may lag behind the actual event time. To handle possible noisy alignments in the dataset, we propose a neural network model with multi-timestep data and a copy mechanism, which allows the model to learn the correspondences between data and text from a dataset with noisier alignments. We focus on generating market comments in Japanese that are delivered each time an event occurs in the market. The core idea of our approach is to utilize multi-timestep data: not only the latest market price data at the time the comment is delivered, but also the data obtained at several earlier timesteps. On top of this, we employ a copy mechanism that is suitable for referring to the content of data records in the market price data. We confirm the superiority of our proposal on two evaluation metrics and show that our proposed method improves the accuracy of sentence generation from time-series data.

pdf bib
Utterance-Unit Annotation for the JSL Dialogue Corpus: Toward a Multimodal Approach to Corpus Linguistics
Mayumi Bono | Rui Sakaida | Tomohiro Okada | Yusuke Miyao
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives

This paper describes a method for annotating the Japanese Sign Language (JSL) dialogue corpus. We developed a way to identify interactional boundaries and define an ‘utterance unit’ in sign language using the various multimodal features that accompany signing. The utterance unit is an original concept for segmenting and annotating sign language dialogue that draws on signers’ native sense, from the perspectives of Conversation Analysis (CA) and Interaction Studies. First of all, we postulate that a fundamental, interaction-specific unit is needed for understanding interactional mechanisms, such as turn-taking (Sacks et al. 1974), in sign-language social interactions, and that it should not rely on a spoken-language writing system for storing signings in corpora and making translations. We see two kinds of possible applications for utterance units: one is to develop corpus linguistics research for both signed and spoken corpora; the other is to build informatics systems that include, but are not limited to, machine translation systems for sign languages.

pdf bib
A System for Worldwide COVID-19 Information Aggregation
Akiko Aizawa | Frederic Bergeron | Junjie Chen | Fei Cheng | Katsuhiko Hayashi | Kentaro Inui | Hiroyoshi Ito | Daisuke Kawahara | Masaru Kitsuregawa | Hirokazu Kiyomaru | Masaki Kobayashi | Takashi Kodama | Sadao Kurohashi | Qianying Liu | Masaki Matsubara | Yusuke Miyao | Atsuyuki Morishima | Yugo Murawaki | Kazumasa Omura | Haiyue Song | Eiichiro Sumita | Shinji Suzuki | Ribeka Tanaka | Yu Tanaka | Masashi Toyoda | Nobuhiro Ueda | Honai Ueoka | Masao Utiyama | Ying Zhong
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The global pandemic of COVID-19 has made the public pay close attention to related news covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 situation differs greatly across countries (e.g., in policies and the development of the epidemic), so citizens are also interested in news from foreign countries. We build a system for worldwide COVID-19 information aggregation that contains reliable articles from 10 regions in 7 languages, sorted by topic. Our dataset of reliable COVID-19-related websites, collected through crowdsourcing, ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic classifier trained on our article-topic pair dataset helps users find the information they are interested in efficiently by sorting articles into categories.
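A hedged sketch of the topic-classification step with Hugging Face transformers is given below; the model name and label set are placeholders rather than the system's actual classifier, and the classification head is untrained until fine-tuned on article-topic pairs.

```python
# Illustrative sketch of a BERT-based topic classifier; the checkpoint and
# the four-way label set (e.g., sanitation/treatment/education/other) are
# assumptions, and the head below is randomly initialized until fine-tuned.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=4)

inputs = tok("Schools reopen with new sanitation rules.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(-1))  # per-topic probabilities after fine-tuning
```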

pdf bib
Comparing Neural Network Parsers for a Less-resourced and Morphologically-rich Language: Amharic Dependency Parser
Binyam Ephrem Seyoum | Yusuke Miyao | Baye Yimam Mekonnen
Proceedings of the first workshop on Resources for African Indigenous Languages

In this paper, we compare four state-of-the-art neural network dependency parsers for the Semitic language Amharic. As Amharic is a morphologically rich and less-resourced language, the out-of-vocabulary (OOV) problem is severe when we develop data-driven models. This limits the development of neural network parsers, since neural networks require large quantities of data to train a model. We empirically evaluate neural network parsers when a small Amharic treebank is used for training. In our experiment, we obtain an 83.79 LAS score using the UDPipe system. Better accuracy is achieved when the neural parsing system uses external resources such as word embeddings; with such resources, the LAS score for UDPipe improves to 85.26. Our experiment shows that neural networks can learn dependency relations reasonably well from limited data, while segmentation and POS tagging require much more data.

2019

pdf bib
Learning to Select, Track, and Generate for Data-to-Text
Hayate Iso | Yui Uehara | Tatsuya Ishigaki | Hiroshi Noji | Eiji Aramaki | Ichiro Kobayashi | Yusuke Miyao | Naoaki Okazaki | Hiroya Takamura
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a data-to-text generation model with two modules, one for tracking and the other for text generation. Our tracking module selects and keeps track of salient information and memorizes which records have been mentioned. Our generation module generates a summary conditioned on the state of the tracking module. Our proposed model simulates a human-like writing process that gradually selects information by determining intermediate variables while writing the summary. In addition, we explore the effectiveness of writer information for generation. Experimental results show that our proposed model outperforms existing models on all evaluation metrics even without writer information. Incorporating writer information further improves performance, contributing to both content planning and surface realization.

pdf bib
Does My Rebuttal Matter? Insights from a Major NLP Conference
Yang Gao | Steffen Eger | Ilia Kuznetsov | Iryna Gurevych | Yusuke Miyao
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Peer review is a core element of the scientific process, particularly in conference-centered fields such as ML and NLP. However, only a few studies have evaluated its properties empirically. Aiming to fill this gap, we present a corpus that contains over 4k reviews and 1.2k author responses from ACL-2018. We quantitatively and qualitatively assess the corpus. This includes a pilot study on the paper weaknesses given by reviewers and on the quality of author responses. We then focus on the role of the rebuttal phase and propose a novel task of predicting after-rebuttal (i.e., final) scores from initial reviews and author responses. Although author responses do have a marginal (and statistically significant) influence on the final scores, especially for borderline papers, our results suggest that a reviewer’s final score is largely determined by her initial score and the distance to the other reviewers’ initial scores. In this context, we discuss the conformity bias inherent to peer reviewing, a bias that has largely been overlooked in previous research. We hope our analyses will help better assess the usefulness of the rebuttal phase in NLP conferences.

pdf bib
Controlling Contents in Data-to-Document Generation with Human-Designed Topic Labels
Kasumi Aoki | Akira Miyazawa | Tatsuya Ishigaki | Tatsuya Aoki | Hiroshi Noji | Keiichi Goshima | Ichiro Kobayashi | Hiroya Takamura | Yusuke Miyao
Proceedings of the 12th International Conference on Natural Language Generation

We propose a data-to-document generator based on a neural language model that can easily control the contents of output texts. A conventional data-to-text model is useful when a reader seeks a global summary of the data, because it only has to describe an important part that has been extracted beforehand. However, what readers are interested in differs from user to user, so a method is needed that generates various summaries according to users’ interests. We develop a model that generates various summaries and controls their contents by providing explicit reference targets to the model as controllable factors. In the experiments, we used five-minute and one-hour charts of 9 indicators (e.g., Nikkei225) as time-series data, and daily summaries of Nikkei Quick News as textual data. We conducted comparative experiments using two kinds of referential information for generation: human-designed topic labels indicating the contents of a sentence, and automatically extracted keywords.

2018

pdf bib
Inducing Temporal Relations from Time Anchor Annotation
Fei Cheng | Yusuke Miyao
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Recognizing temporal relations among events and time expressions has been an essential but challenging task in natural language processing. Conventional annotation of temporal relations places a heavy load on annotators. In practice, existing annotated corpora include annotations only on “salient” event pairs or on pairs within a fixed window of sentences. In this paper, we propose a new approach to obtaining temporal relations from absolute time values (a.k.a. time anchors), which is suitable for texts containing rich temporal information, such as news articles. We start from time anchors for events and time expressions, and temporal relation annotations are induced automatically by computing the relative order of two time anchors. This proposal has several advantages over current methods for temporal relation annotation: it requires less annotation effort, can easily induce inter-sentence relations, and increases the informativeness of temporal relations. We compare empirical statistics and automatic recognition results on our data against a previous temporal relation corpus. We also show that our data contributes to a significant improvement on the downstream time anchor prediction task, demonstrating a 14.1-point increase in overall accuracy.
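The core induction step can be illustrated with a few lines of Python; the labels and interval logic below are simplified assumptions, not the paper's full annotation scheme.

```python
# Minimal sketch of the core idea: induce a temporal relation between two
# events by comparing their (start, end) time anchors. Labels and interval
# logic here are simplified assumptions for exposition.
from datetime import date

def induce_relation(a, b):
    """a, b: (start, end) date tuples serving as time anchors."""
    if a[1] < b[0]:
        return "BEFORE"
    if b[1] < a[0]:
        return "AFTER"
    if a == b:
        return "SIMULTANEOUS"
    return "VAGUE/OVERLAP"

e1 = (date(2018, 1, 5), date(2018, 1, 5))
e2 = (date(2018, 1, 7), date(2018, 1, 8))
print(induce_relation(e1, e2))  # BEFORE
```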

pdf bib
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Iryna Gurevych | Yusuke Miyao
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Iryna Gurevych | Yusuke Miyao
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
An Empirical Investigation of Error Types in Vietnamese Parsing
Quy Nguyen | Yusuke Miyao | Hiroshi Noji | Nhung Nguyen
Proceedings of the 27th International Conference on Computational Linguistics

Syntactic parsing plays a crucial role in improving the quality of natural language processing tasks. Although there have been several research projects on syntactic parsing in Vietnamese, the parsing quality has been far inferior to that reported for major languages, such as English and Chinese. In this work, we evaluated representative constituency parsing models on a Vietnamese Treebank to look for the most suitable parsing method for Vietnamese. We then combined the advantages of automatic and manual analysis to investigate the errors produced by the evaluated parsers and to find their causes. Our analysis focused on three possible sources of parsing errors, namely limited training data, part-of-speech (POS) tagging errors, and ambiguous constructions. We found that the last two sources, which appear frequently in Vietnamese text, contributed significantly to the poor performance of Vietnamese parsing.

pdf bib
Universal Dependencies Version 2 for Japanese
Masayuki Asahara | Hiroshi Kanayama | Takaaki Tanaka | Yusuke Miyao | Sumire Uematsu | Shinsuke Mori | Yuji Matsumoto | Mai Omura | Yugo Murawaki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Universal Dependencies for Amharic
Binyam Ephrem Seyoum | Yusuke Miyao | Baye Yimam Mekonnen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Coordinate Structures in Universal Dependencies for Head-final Languages
Hiroshi Kanayama | Na-Rae Han | Masayuki Asahara | Jena D. Hwang | Yusuke Miyao | Jinho D. Choi | Yuji Matsumoto
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

This paper discusses the representation of coordinate structures in the Universal Dependencies framework for two head-final languages, Japanese and Korean. UD applies a strict principle that makes the left-most conjunct the head of a coordination. However, this guideline may produce syntactic trees that are difficult to accept for head-final languages. This paper describes the status of the current Japanese and Korean corpora and proposes alternative designs suitable for these languages.

pdf bib
Generating Market Comments Referring to External Resources
Tatsuya Aoki | Akira Miyazawa | Tatsuya Ishigaki | Keiichi Goshima | Kasumi Aoki | Ichiro Kobayashi | Hiroya Takamura | Yusuke Miyao
Proceedings of the 11th International Conference on Natural Language Generation

Comments on a stock market often include the reason for or cause of changes in stock prices, such as “Nikkei turns lower as yen’s rise hits exporters.” Generating such informative sentences requires capturing the relationships between different resources, including the target stock price. In this paper, we propose a model for automatically generating such informative market comments that refer to external resources. We evaluated our model with an automatic metric (BLEU) and a human evaluation by an expert in finance. The results show that our model outperforms the existing model in both BLEU scores and human judgment.

2017

pdf bib
On-demand Injection of Lexical Knowledge for Recognising Textual Entailment
Pascual Martínez-Gómez | Koji Mineshima | Yusuke Miyao | Daisuke Bekki
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We approach the recognition of textual entailment using logical semantic representations and a theorem prover. In this setup, lexical divergences that preserve semantic entailment between the source and target texts need to be explicitly stated. However, recognising subsentential semantic relations is not trivial. We address this problem by monitoring the proof of the theorem and detecting unprovable sub-goals that share predicate arguments with logical premises. If a linguistic relation exists, then an appropriate axiom is constructed on-demand and the theorem proving continues. Experiments show that this approach is effective and precise, producing a system that outperforms other logic-based systems and is competitive with state-of-the-art statistical methods.
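A toy, self-contained illustration of the injection loop follows; the real system drives a theorem prover over logical forms and consults lexical knowledge bases, for which a small synonym set serves as a stand-in here.

```python
# Toy illustration of on-demand axiom injection: when a subgoal is
# unprovable, look up a lexical relation (a synonym set stands in for a
# resource like WordNet), add an implication axiom, and retry the proof.
SYNONYMS = {("purchase", "buy")}  # hypothetical lexical knowledge

def prove(goal, facts, axioms):
    pred, arg = goal
    if (pred, arg) in facts:
        return True
    for src, dst in axioms:  # axiom: src(x) -> dst(x)
        if dst == pred and (src, arg) in facts:
            return True
    return False

def prove_with_injection(goal, facts):
    axioms = set()
    if prove(goal, facts, axioms):
        return True
    # An unprovable subgoal shares its argument with a premise:
    # construct an axiom on demand if a lexical relation exists.
    for fpred, farg in facts:
        if farg == goal[1] and (fpred, goal[0]) in SYNONYMS:
            axioms.add((fpred, goal[0]))
    return prove(goal, facts, axioms)

print(prove_with_injection(("buy", "john"), {("purchase", "john")}))  # True
```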

pdf bib
Proceedings of the 15th International Conference on Parsing Technologies
Yusuke Miyao | Kenji Sagae
Proceedings of the 15th International Conference on Parsing Technologies

pdf bib
Evaluation Metrics for Automatically Generated Metaphorical Expressions
Akira Miyazawa | Yusuke Miyao
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers

pdf bib
Learning to Generate Market Comments from Stock Prices
Soichiro Murakami | Akihiko Watanabe | Akira Miyazawa | Keiichi Goshima | Toshihiko Yanase | Hiroya Takamura | Yusuke Miyao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper presents a novel encoder-decoder model for automatically generating market comments from stock prices. The model first encodes both short- and long-term series of stock prices so that it can mention short- and long-term changes in stock prices. In the decoding phase, our model can also generate a numerical value by selecting an appropriate arithmetic operation, such as subtraction or rounding, and applying it to the input stock prices. Empirical experiments show that the fluency and informativeness of comments generated by our best model approach those of human-written reference texts.

pdf bib
Classifying Temporal Relations by Bidirectional LSTM over Dependency Paths
Fei Cheng | Yusuke Miyao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Temporal relation classification is becoming an active research field. Many methods have been proposed, but most focus on extracting features from external resources. Less attention has been paid to a significant advance in the closely related task of relation extraction. In this work, we borrow a state-of-the-art method from relation extraction by adopting a bidirectional long short-term memory network (Bi-LSTM) along dependency paths (DPs). We make a “common root” assumption to extend DP representations to cross-sentence links. In a final comparison to two state-of-the-art systems on TimeBank-Dense, our model achieves comparable performance without using external knowledge or manually annotated attributes of entities (class, tense, polarity, etc.).
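A minimal PyTorch sketch of the architecture described above follows; dimensions and the relation label set are hypothetical.

```python
# Sketch (hypothetical dimensions): classify a temporal relation by running
# a bidirectional LSTM over the word embeddings along the dependency path
# between an event pair, then projecting the final states to relation labels.
import torch
import torch.nn as nn

class DepPathBiLSTM(nn.Module):
    def __init__(self, vocab_size, d_emb=100, d_hid=128, n_relations=6):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)
        self.bilstm = nn.LSTM(d_emb, d_hid, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * d_hid, n_relations)  # e.g., BEFORE/AFTER/INCLUDES/...

    def forward(self, path_token_ids):
        # path_token_ids: (batch, path_len) word ids along the dependency path
        x = self.emb(path_token_ids)
        _, (h, _) = self.bilstm(x)
        h = torch.cat([h[0], h[1]], dim=-1)  # final states of both directions
        return self.out(h)

model = DepPathBiLSTM(vocab_size=1000)
logits = model(torch.randint(0, 1000, (2, 5)))  # two paths of length 5
print(logits.shape)  # torch.Size([2, 6])
```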

2016

pdf bib
Rule Extraction for Tree-to-Tree Transducers by Cost Minimization
Pascual Martínez-Gómez | Yusuke Miyao
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Using Left-corner Parsing to Encode Universal Structural Constraints in Grammar Induction
Hiroshi Noji | Yusuke Miyao | Mark Johnson
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Building compositional semantics and higher-order inference system for a wide-coverage Japanese CCG parser
Koji Mineshima | Ribeka Tanaka | Pascual Martínez-Gómez | Yusuke Miyao | Daisuke Bekki
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Paraphrase for Open Question Answering: New Dataset and Methods
Ying Xu | Pascual Martínez-Gómez | Yusuke Miyao | Randy Goebel
Proceedings of the Workshop on Human-Computer Question Answering

pdf bib
Challenges and Solutions for Consistent Annotation of Vietnamese Treebank
Quy Nguyen | Yusuke Miyao | Ha Le | Ngan Nguyen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Treebanks are important resources for researchers in natural language processing, speech recognition, theoretical linguistics, and other fields. To strengthen the automatic processing of the Vietnamese language, a Vietnamese treebank has been built. However, the quality of this treebank is not satisfactory, and it is a possible source of the low performance of Vietnamese language processing. We have been building a new treebank for Vietnamese with about 40,000 sentences annotated in three layers: word segmentation, part-of-speech tagging, and bracketing. In this paper, we describe several challenges of the Vietnamese language and how we solve them when developing annotation guidelines. We also present our methods for improving the quality of the annotation guidelines and ensuring annotation accuracy and consistency. Experimental results show that inter-annotator agreement ratios and accuracy are higher than 90%, which is satisfactory.

pdf bib
Universal Dependencies for Japanese
Takaaki Tanaka | Yusuke Miyao | Masayuki Asahara | Sumire Uematsu | Hiroshi Kanayama | Shinsuke Mori | Yuji Matsumoto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we present an attempt to port the international syntactic annotation scheme Universal Dependencies to Japanese. Since Japanese syntactic structure is usually annotated on the basis of unique chunk-based dependencies, we first introduce word-based dependencies using a word unit called the Short Unit Word, which usually corresponds to an entry in the lexicon UniDic. Porting is done by mapping the part-of-speech tagset in UniDic to the universal part-of-speech tagset and by converting a constituent-based treebank to typed dependency trees. The conversion is not straightforward, and we discuss the problems that arose in the conversion and our current solutions. A treebank of 10,000 sentences was built by converting existing resources and has been released to the public.
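The POS-mapping step can be illustrated as follows; the entries are examples for exposition, not the full published UniDic-to-UD mapping table.

```python
# Illustrative fragment of the kind of mapping used when porting: UniDic
# part-of-speech categories to universal POS tags. Entries are examples
# only, not the complete mapping described in the paper.
UNIDIC_TO_UPOS = {
    "名詞-普通名詞": "NOUN",   # common noun
    "名詞-固有名詞": "PROPN",  # proper noun
    "動詞":          "VERB",
    "形容詞":        "ADJ",
    "助詞-格助詞":   "ADP",    # case particle
    "助動詞":        "AUX",
}

def to_upos(unidic_pos):
    # longest-prefix match, since UniDic tags are hierarchical
    for key in sorted(UNIDIC_TO_UPOS, key=len, reverse=True):
        if unidic_pos.startswith(key):
            return UNIDIC_TO_UPOS[key]
    return "X"

print(to_upos("名詞-普通名詞-一般"))  # NOUN
```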

pdf bib
Typed Entity and Relation Annotation on Computer Science Papers
Yuka Tateisi | Tomoko Ohta | Sampo Pyysalo | Yusuke Miyao | Akiko Aizawa
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe our ongoing effort to establish an annotation scheme for describing the semantic structures of research articles in the computer science domain, intended for developing search systems that can refine their results by the roles of the entities denoted by the query keys. In our scheme, mentions of entities are annotated with ontology-based types, and the roles of the entities are annotated as relations with other entities described in the text. So far, we have annotated 400 abstracts from the ACL Anthology and the ACM Digital Library. In this paper, the scheme and the annotated dataset are described, along with the problems found in the course of annotation. We also show the results of automatic annotation and evaluate the corpus in a practical setting by applying it to topic extraction.

pdf bib
Towards Comparability of Linguistic Graph Banks for Semantic Parsing
Stephan Oepen | Marco Kuhlmann | Yusuke Miyao | Daniel Zeman | Silvie Cinková | Dan Flickinger | Jan Hajič | Angelina Ivanova | Zdeňka Urešová
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We announce a new language resource for research on semantic parsing: a large, carefully curated collection of semantic dependency graphs representing multiple linguistic traditions. This resource, called SDP 2016, provides an update and extension to previous versions used as Semantic Dependency Parsing target representations in the 2014 and 2015 Semantic Evaluation Exercises. For a common core of English text, this third edition comprises semantic dependency graphs from four distinct frameworks, packaged in a unified abstract format and aligned at the sentence and token levels. SDP 2016 is the first general release of this resource and is available for licensing from the Linguistic Data Consortium in May 2016. The data is accompanied by an open-source SDP utility toolkit and by system results from previous contrastive parsing evaluations against these target representations.

pdf bib
Generating Video Description using Sequence-to-sequence Model with Temporal Attention
Natsuda Laokulrat | Sang Phan | Noriki Nishida | Raphael Shu | Yo Ehara | Naoaki Okazaki | Yusuke Miyao | Hideki Nakayama
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Automatic video description generation has recently been getting attention after rapid advances in image caption generation. Automatically generating a description for a video is more challenging than for an image because of the temporal dynamics of its frames. Most prior work has relied on Recurrent Neural Networks (RNNs), and recently attention mechanisms have also been applied so that the model learns to focus on particular frames of the video while generating each word of the describing sentence. In this paper, we focus on a sequence-to-sequence approach with a temporal attention mechanism, and we analyze and compare the results of different attention model configurations. By applying the temporal attention mechanism, our system achieves a METEOR score of 0.310 on the Microsoft Video Description dataset, outperforming the previous state-of-the-art system.
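A compact sketch of temporal attention over frame features follows; dimensions are hypothetical and the scoring function is one common choice, not necessarily the paper's exact configuration.

```python
# Sketch of temporal attention over video frame features: attention weights
# over frames are computed from the decoder state at each generated word,
# and the weighted frame features form the context vector for that word.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, d_frame, d_state):
        super().__init__()
        self.score = nn.Linear(d_frame + d_state, 1)

    def forward(self, frames, state):
        # frames: (n_frames, d_frame); state: (d_state,) decoder hidden state
        expanded = state.expand(frames.size(0), -1)
        weights = torch.softmax(self.score(torch.cat([frames, expanded], -1)), 0)
        return (weights * frames).sum(0)  # attended frame context vector

attn = TemporalAttention(d_frame=512, d_state=256)
ctx = attn(torch.randn(20, 512), torch.randn(256))
print(ctx.shape)  # torch.Size([512])
```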

pdf bib
Video Event Detection by Exploiting Word Dependencies from Image Captions
Sang Phan | Yusuke Miyao | Duy-Dinh Le | Shin’ichi Satoh
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Video event detection is a challenging problem in information and multimedia retrieval. Unlike single action detection, event detection requires a richer level of semantic information from the video. To overcome this challenge, existing solutions often represent videos using high-level features such as concepts. However, concept-based representations can be confusing because they do not encode the relationships between concepts. This issue can be addressed by exploiting co-occurrences of concepts; however, doing so often leads to a huge number of possible combinations. In this paper, we propose a new approach that obtains the relationships between concepts by exploiting the syntactic dependencies between words in image captions. The main advantage of this approach is that it significantly reduces the number of combinations between concepts to an informative subset. We conduct extensive experiments to analyze the effectiveness of the new dependency representation for event detection on the two large-scale TRECVID Multimedia Event Detection 2013 and 2014 datasets. Experimental results show that (i) dependency features are more discriminative than concept-based features, and (ii) dependency features can be combined with our current event detection system to further improve performance; for instance, the relative improvement can be as large as 8.6% in the MEDTEST14 10Ex setting.
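The dependency-pair extraction from captions can be approximated with spaCy as below; the actual parser and feature set may differ, and the example assumes the en_core_web_sm model has been downloaded.

```python
# Hedged sketch: extract head-dependent word pairs from an image caption,
# approximating the concept-relationship features derived from caption
# syntax (the paper's actual parser and feature set may differ).
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes: python -m spacy download en_core_web_sm
doc = nlp("A man is riding a bicycle on the street.")
pairs = [(tok.head.lemma_, tok.dep_, tok.lemma_)
         for tok in doc if tok.dep_ in ("nsubj", "dobj", "pobj")]
print(pairs)  # e.g., [('ride', 'nsubj', 'man'), ('ride', 'dobj', 'bicycle'), ...]
```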

pdf bib
ccg2lambda: A Compositional Semantics System
Pascual Martínez-Gómez | Koji Mineshima | Yusuke Miyao | Daisuke Bekki
Proceedings of ACL-2016 System Demonstrations

pdf bib
Jigg: A Framework for an Easy Natural Language Processing Pipeline
Hiroshi Noji | Yusuke Miyao
Proceedings of ACL-2016 System Demonstrations

2015

pdf bib
SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing
Stephan Oepen | Marco Kuhlmann | Yusuke Miyao | Daniel Zeman | Silvie Cinková | Dan Flickinger | Jan Hajič | Zdeňka Urešová
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
Higher-order logical inference with compositional semantics
Koji Mineshima | Pascual Martínez-Gómez | Yusuke Miyao | Daisuke Bekki
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Incorporating Complementary Annotation to a CCGbank for Improving Derivations for Japanese
Sumire Uematsu | Yusuke Miyao
Proceedings of the 14th International Conference on Parsing Technologies

pdf bib
Optimal Shift-Reduce Constituent Parsing with Structured Perceptron
Le Quang Thang | Hiroshi Noji | Yusuke Miyao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Discriminative Preordering Meets Kendall’s 𝜏 Maximization
Sho Hoshino | Yusuke Miyao | Katsuhito Sudoh | Katsuhiko Hayashi | Masaaki Nagata
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
A Lexicalized Tree Kernel for Open Information Extraction
Ying Xu | Christoph Ringlstetter | Mi-Young Kim | Grzegorz Kondrak | Randy Goebel | Yusuke Miyao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Paraphrase Detection Based on Identical Phrase and Similar Word Matching
Hoang-Quoc Nguyen-Son | Yusuke Miyao | Isao Echizen
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2014

pdf bib
Annotation of Computer Science Papers for Semantic Relation Extraction
Yuka Tateisi | Yo Shidahara | Yusuke Miyao | Akiko Aizawa
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We designed a new annotation scheme for formalising relation structures in research papers, based on an investigation of computer science papers. The annotation scheme rests on the hypothesis that identifying the role of entities and events described in a paper is useful for intelligent information retrieval in academic literature, and that this role can be determined from the relationships between the author and the described entities or events, and from the relationships among them. Using the scheme, we have annotated research abstracts from the IPSJ Journal, published in Japanese by the Information Processing Society of Japan. On the basis of the annotated corpus, we have developed a prototype information extraction system that can classify sentences according to the relationships between the entities mentioned, to help find the role of the entity in which the searcher is interested.

pdf bib
Overview of Todai Robot Project and Evaluation Framework of its NLP-based Problem Solving
Akira Fujita | Akihiro Kameda | Ai Kawazoe | Yusuke Miyao
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We introduce the organization of the Todai Robot Project and discuss its achievements. The Todai Robot Project task focuses on benchmarking NLP systems for problem solving, encouraging NLP-based systems to solve real high-school examinations. We describe in detail the methods for managing question resources and their correct answers, the answering tools, and researchers’ participation in the task. We also analyse the answering accuracy of the developed systems by comparing the systems’ answers with answers given by human test-takers.

pdf bib
Encoding Generalized Quantifiers in Dependency-based Compositional Semantics
Yubing Dong | Ran Tian | Yusuke Miyao
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

pdf bib
SemEval 2014 Task 8: Broad-Coverage Semantic Dependency Parsing
Stephan Oepen | Marco Kuhlmann | Yusuke Miyao | Daniel Zeman | Dan Flickinger | Jan Hajič | Angelina Ivanova | Yi Zhang
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
In-House: An Ensemble of Pre-Existing Off-the-Shelf Parsers
Yusuke Miyao | Stephan Oepen | Daniel Zeman
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Efficient Logical Inference for Semantic Processing
Ran Tian | Yusuke Miyao | Takuya Matsuzaki
Proceedings of the ACL 2014 Workshop on Semantic Parsing

pdf bib
Significance of Bridging Real-world Documents and NLP Technologies
Tadayoshi Hara | Goran Topić | Yusuke Miyao | Akiko Aizawa
Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT

pdf bib
Japanese to English Machine Translation using Preordering and Compositional Distributed Semantics
Sho Hoshino | Hubert Soyer | Yusuke Miyao | Akiko Aizawa
Proceedings of the 1st Workshop on Asian Translation (WAT2014)

pdf bib
Formalizing Word Sampling for Vocabulary Prediction as Graph-based Active Learning
Yo Ehara | Yusuke Miyao | Hidekazu Oiwa | Issei Sato | Hiroshi Nakagawa
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Logical Inference on Dependency-based Compositional Semantics
Ran Tian | Yusuke Miyao | Takuya Matsuzaki
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Left-corner Transitions on Dependency Parsing
Hiroshi Noji | Yusuke Miyao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf bib
Improvements to the Bayesian Topic N-Gram Models
Hiroshi Noji | Daichi Mochihashi | Yusuke Miyao
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources
Sumire Uematsu | Takuya Matsuzaki | Hiroki Hanaoka | Yusuke Miyao | Hideki Mima
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Building Japanese Textual Entailment Specialized Data Sets for Inference of Basic Sentence Relations
Kimi Kaneko | Yusuke Miyao | Daisuke Bekki
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Utilizing State-of-the-art Parsers to Diagnose Problems in Treebank Annotation for a Less Resourced Language
Quy Nguyen | Ngan Nguyen | Yusuke Miyao
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Relation Annotation for Understanding Research Papers
Yuka Tateisi | Yo Shidahara | Yusuke Miyao | Akiko Aizawa
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Using unlabeled dependency parsing for pre-reordering for Chinese-to-Japanese statistical machine translation
Dan Han | Pascual Martínez-Gómez | Yusuke Miyao | Katsuhito Sudoh | Masaaki Nagata
Proceedings of the Second Workshop on Hybrid Approaches to Translation

pdf bib
Deep Context-Free Grammar for Chinese with Broad-Coverage
Xiangli Wang | Yi Zhang | Yusuke Miyao | Takuya Matsuzaki | Junichi Tsujii
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing

pdf bib
Effects of Parsing Errors on Pre-Reordering Performance for Chinese-to-Japanese SMT
Dan Han | Pascual Martínez-Gómez | Yusuke Miyao | Katsuhito Sudoh | Masaaki Nagata
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)

pdf bib
Alignment-based Annotation of Proofreading Texts toward Professional Writing Assistance
Ngan Nguyen | Yusuke Miyao
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Two-Stage Pre-ordering for Japanese-to-English Statistical Machine Translation
Sho Hoshino | Yusuke Miyao | Katsuhito Sudoh | Masaaki Nagata
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
University Entrance Examinations as a Benchmark Resource for NLP-based Problem Solving
Yusuke Miyao | Ai Kawazoe
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

pdf bib
Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
Hiroyuki Shindo | Yusuke Miyao | Akinori Fujino | Masaaki Nagata
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
Jun Hatori | Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Comparing Different Criteria for Vietnamese Word Segmentation
Quy T. Nguyen | Ngan L.T. Nguyen | Yusuke Miyao
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Answering Yes/No Questions via Question Inversion
Hiroshi Kanayama | Yusuke Miyao | John Prager
Proceedings of COLING 2012

pdf bib
Annotating Factive Verbs
Alvin Grissom II | Yusuke Miyao
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We have created a corpus annotation scheme designed to capture relevant aspects of factivity in verb-complement constructions. Factivity constructions are a well-known linguistic phenomenon that embeds presuppositions about the state of the world into a clause. These embedded presuppositions provide implicit information about facts assumed to be true in the world and are thus potentially valuable in research areas such as textual entailment. We attempt to address both clear-cut cases of factivity and non-factivity, as well as to account for the fluidity and ambiguous nature of some realizations of this construction. Our extensible scheme is designed to account for distinctions between claims, performatives, atypical uses of factivity, and the authority of the one making the utterance. We introduce a simple XML-based syntax for the annotation of factive verbs and clauses in order to capture this information. We also provide an analysis of the issues that led to these annotative decisions, in the hope that these analyses will be beneficial to those dealing with factivity in a practical context.

pdf bib
Building Japanese Predicate-argument Structure Corpus using Lexical Conceptual Structure
Yuichiroh Matsubayashi | Yusuke Miyao | Akiko Aizawa
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper introduces our study on creating a Japanese corpus annotated with semantically motivated predicate-argument structures. We propose an annotation framework based on Lexical Conceptual Structure (LCS), in which the semantic roles of arguments are represented through a semantic structure decomposed into several primitive predicates. As the first stage of the project, we extended Jackendoff’s LCS theory to increase the generality of expression and coverage for verbs frequently appearing in the corpus, and successfully created LCS structures for 60 frequent Japanese predicates in the Kyoto University Text Corpus (KTC). In this paper, we report our framework for creating the corpus and the current status of creating an LCS dictionary for Japanese predicates.

pdf bib
Framework of Semantic Role Assignment based on Extended Lexical Conceptual Structure: Comparison with VerbNet and FrameNet
Yuichiroh Matsubayashi | Yusuke Miyao | Akiko Aizawa
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

pdf bib
Parsing Natural Language Queries for Life Science Knowledge
Tadayoshi Hara | Yuka Tateisi | Jin-Dong Kim | Yusuke Miyao
Proceedings of BioNLP 2011 Workshop

pdf bib
Learning with Lookahead: Can History-Based Models Rival Globally Optimized Models?
Yoshimasa Tsuruoka | Yusuke Miyao | Jun’ichi Kazama
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

pdf bib
A Collaborative Annotation between Human Annotators and a Statistical Parser
Shun’ya Iwasawa | Hiroki Hanaoka | Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 5th Linguistic Annotation Workshop

pdf bib
Analysis of the Difficulties in Chinese Deep Parsing
Kun Yu | Yusuke Miyao | Takuya Matsuzaki | Xiangli Wang | Junichi Tsujii
Proceedings of the 12th International Conference on Parsing Technologies

pdf bib
Exploring Difficulties in Parsing Imperatives and Questions
Tadayoshi Hara | Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Incremental Joint POS Tagging and Dependency Parsing in Chinese
Jun Hatori | Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib
Wide-Coverage NLP with Linguistically Expressive Grammars
Julia Hockenmaier | Yusuke Miyao | Josef van Genabith
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

pdf bib
A Modular Architecture for the Wide-Coverage Translation of Natural Language Texts into Predicate Logic Formulas
Yusuke Miyao | Alastair Butler | Kei Yoshimoto | Jun’ichi Tsujii
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
The Deep Re-Annotation in a Chinese Scientific Treebank
Kun Yu | Xiangli Wang | Yusuke Miyao | Takuya Matsuzaki | Junichi Tsujii
Proceedings of the Fourth Linguistic Annotation Workshop

pdf bib
Entity-Focused Sentence Simplification for Relation Extraction
Makoto Miwa | Rune Sætre | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Semi-automatically Developing Chinese HPSG Grammar from the Penn Chinese Treebank for Deep Parsing
Kun Yu | Yusuke Miyao | Xiangli Wang | Takuya Matsuzaki | Junichi Tsujii
Coling 2010: Posters

2009

pdf bib
The UOT system
Xianchao Wu | Takuya Matsuzaki | Naoaki Okazaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

We present the UOT Machine Translation System that was used in the IWSLT-09 evaluation campaign. This year, we participated in the BTEC track for Chinese-to-English translation. Our system is based on a string-to-tree framework. To integrate deep syntactic information, we propose the use of parse trees and semantic dependencies on English sentences described respectively by Head-driven Phrase Structure Grammar and Predicate-Argument Structures. We report the results of our system on both the development and test sets.

pdf bib
A Rich Feature Vector for Protein-Protein Interaction Extraction from Multiple Corpora
Makoto Miwa | Rune Sætre | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Descriptive and Empirical Approaches to Capturing Underlying Dependencies among Parsing Errors
Tadayoshi Hara | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Supervised Learning of a Probabilistic Lexicon of Verb Semantic Classes
Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Design of Chinese HPSG Framework for Data-Driven Parsing
Xiangli Wang | Shunya Iwasawa | Yusuke Miyao | Takuya Matsuzaki | Kun Yu | Jun’ichi Tsujii
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

pdf bib
Effective Analysis of Causes and Inter-dependencies of Parsing Errors
Tadayoshi Hara | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

2008

pdf bib
Evaluating the Effects of Treebank Size in a Practical Application for Parsing
Kenji Sagae | Yusuke Miyao | Rune Saetre | Jun’ichi Tsujii
Software Engineering, Testing, and Quality Assurance for Natural Language Processing

pdf bib
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation
Johan Bos | Edward Briscoe | Aoife Cahill | John Carroll | Stephen Clark | Ann Copestake | Dan Flickinger | Josef van Genabith | Julia Hockenmaier | Aravind Joshi | Ronald Kaplan | Tracy Holloway King | Sandra Kuebler | Dekang Lin | Jan Tore Lønning | Christopher Manning | Yusuke Miyao | Joakim Nivre | Stephan Oepen | Kenji Sagae | Nianwen Xue | Yi Zhang
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation

pdf bib
Parser Evaluation Across Frameworks without Format Conversion
Wai Lok Tam | Yo Sato | Yusuke Miyao | Junichi Tsujii
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation

pdf bib
Task-oriented Evaluation of Syntactic Parsers and Their Representations
Yusuke Miyao | Rune Sætre | Kenji Sagae | Takuya Matsuzaki | Jun’ichi Tsujii
Proceedings of ACL-08: HLT

pdf bib
Towards Data and Goal Oriented Analysis: Tool Inter-operability and Combinatorial Comparison
Yoshinobu Kano | Ngan Nguyen | Rune Sætre | Kazuhiro Yoshida | Keiichiro Fukamachi | Yusuke Miyao | Yoshimasa Tsuruoka | Sophia Ananiadou | Jun’ichi Tsujii
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
GENIA-GR: a Grammatical Relation Corpus for Parser Evaluation in the Biomedical Domain
Yuka Tateisi | Yusuke Miyao | Kenji Sagae | Jun’ichi Tsujii
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We report the construction of a corpus for parser evaluation in the biomedical domain. A 50-abstract subset (492 sentences) of the GENIA corpus (Kim et al., 2003) is annotated with labeled head-dependent relations using the grammatical relations (GR) evaluation scheme (Carroll et al., 1998), which has been used for parser evaluation in the newswire domain.

pdf bib
Word Sense Disambiguation for All Words using Tree-Structured Conditional Random Fields
Jun Hatori | Yusuke Miyao | Jun’ichi Tsujii
Coling 2008: Companion volume: Posters

pdf bib
Exact Inference for Multi-label Classification using Sparse Graphical Models
Yusuke Miyao | Jun’ichi Tsujii
Coling 2008: Companion volume: Posters

pdf bib
Feature Forest Models for Probabilistic HPSG Parsing
Yusuke Miyao | Jun’ichi Tsujii
Computational Linguistics, Volume 34, Number 1, March 2008

2007

pdf bib
HPSG Parsing with Shallow Dependency Constraints
Kenji Sagae | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
Evaluating Impact of Re-training a Lexical Disambiguation Model on Domain Adaptation of an HPSG Parser
Tadayoshi Hara | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Tenth International Conference on Parsing Technologies

pdf bib
A log-linear model with an n-gram reference distribution for accurate HPSG parsing
Takashi Ninomiya | Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Tenth International Conference on Parsing Technologies

2006

pdf bib
Improving the Scalability of Semi-Markov Conditional Random Fields for Named Entity Recognition
Daisuke Okanohara | Yusuke Miyao | Yoshimasa Tsuruoka | Jun’ichi Tsujii
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Semantic Retrieval for the Accurate Identification of Relational Concepts in Massive Textbases
Yusuke Miyao | Tomoko Ohta | Katsuya Masuda | Yoshimasa Tsuruoka | Kazuhiro Yoshida | Takashi Ninomiya | Jun’ichi Tsujii
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Translating HPSG-Style Outputs of a Robust Parser into Typed Dynamic Logic
Manabu Sato | Daisuke Bekki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
Trimming CFG Parse Trees for Sentence Compression Using Machine Learning Approaches
Yuya Unno | Takashi Ninomiya | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
An Intelligent Search Engine and GUI-based Efficient MEDLINE Search Tool Based on Deep Syntactic Parsing
Tomoko Ohta | Yusuke Miyao | Takashi Ninomiya | Yoshimasa Tsuruoka | Akane Yakushiji | Katsuya Masuda | Jumpei Takeuchi | Kazuhiro Yoshida | Tadayoshi Hara | Jin-Dong Kim | Yuka Tateisi | Jun’ichi Tsujii
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

pdf bib
Extremely Lexicalized Models for Accurate and Fast HPSG Parsing
Takashi Ninomiya | Takuya Matsuzaki | Yoshimasa Tsuruoka | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf bib
Automatic Construction of Predicate-argument Structure Patterns for Biomedical Information Extraction
Akane Yakushiji | Yusuke Miyao | Tomoko Ohta | Yuka Tateisi | Jun’ichi Tsujii
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

pdf bib
Adapting a Probabilistic Disambiguation Model of an HPSG Parser to a New Domain
Tadayoshi Hara | Yusuke Miyao | Jun’ichi Tsujii
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Probabilistic Models for Disambiguation of an HPSG-Based Chart Generator
Hiroko Nakanishi | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Ninth International Workshop on Parsing Technology

pdf bib
Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing
Takashi Ninomiya | Yoshimasa Tsuruoka | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Ninth International Workshop on Parsing Technology

pdf bib
Probabilistic CFG with Latent Annotations
Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

pdf bib
Probabilistic Disambiguation Models for Wide-Coverage HPSG Parsing
Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

pdf bib
Finding Anchor Verbs for Biomedical IE Using Predicate-Argument Structures
Akane Yakushiji | Yuka Tateisi | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the ACL Interactive Poster and Demonstration Sessions

pdf bib
Deep Linguistic Analysis for the Accurate Identification of Predicate-Argument Relations
Yusuke Miyao | Jun’ichi Tsujii
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf bib
A Robust Retrieval Engine for Proximal and Structural Search
Katsuya Masuda | Takashi Ninomiya | Yusuke Miyao | Tomoko Ohta | Jun’ichi Tsujii
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

pdf bib
A Debug Tool for Practical Grammar Development
Akane Yakushiji | Yuka Tateisi | Yusuke Miyao | Naoki Yoshinaga | Jun’ichi Tsujii
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
Lexicalized Grammar Acquisition
Yusuke Miyao | Takashi Ninomiya | Jun’ichi Tsujii
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
A model of syntactic disambiguation based on lexicalized grammars
Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

pdf bib
An efficient clustering algorithm for class-based language models
Takuya Matsuzaki | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

2002

pdf bib
A Formal Proof of Strong Equivalence for a Grammar Conversion from LTAG to HPSG-style
Naoki Yoshinaga | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Sixth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+6)

pdf bib
Clustering for obtaining syntactic classes of words from automatically extracted LTAG grammars
Tadayoshi Hara | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Sixth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+6)

pdf bib
Lenient Default Unification for Robust Processing within Unification Based Grammar Formalisms
Takashi Ninomiya | Yusuke Miyao | Jun-Ichi Tsujii
COLING 2002: The 19th International Conference on Computational Linguistics

2001

pdf bib
Resource Sharing Amongst HPSG and LTAG Communities by a Method of Grammar Conversion between FB-LTAG and HPSG
Naoki Yoshinaga | Yusuke Miyao | Kentaro Torisawa | Jun’ichi Tsujii
Proceedings of the ACL 2001 Workshop on Sharing Tools and Resources

1999

pdf bib
Packing of Feature Structures for Efficient Unification of Disjunctive Feature Structures
Yusuke Miyao
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1998

pdf bib
Packing of feature structures for optimizing the HPSG-style grammar translated from TAG
Yusuke Miyao | Kentaro Torisawa | Yuka Tateisi | Jun’ichi Tsujii
Proceedings of the Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4)

pdf bib
Translating the XTAG English grammar to HPSG
Yuka Tateisi | Kentaro Torisawa | Yusuke Miyao | Jun’ichi Tsujii
Proceedings of the Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4)
