In Japan, the number of single-person households, particularly among the elderly, is increasing. Consequently, opportunities for people to narrate are being reduced. To address this issue, conversational agents, e.g., communication robots and smart speakers, are expected to play the role of the listener. To realize these agents, this paper describes the collection of conversational responses by listeners that demonstrate attentive listening attitudes toward narrative speakers, and a method to annotate existing narrative speech with responsive utterances is proposed. To summarize, 148,962 responsive utterances by 11 listeners were collected in a narrative corpus comprising 13,234 utterance units. The collected responsive utterances were analyzed in terms of response frequency, diversity, coverage, and naturalness. These results demonstrated that diverse and natural responsive utterances were collected by the proposed method in an efficient and comprehensive manner. To demonstrate the practical use of the collected responsive utterances, an experiment was conducted, in which response generation timings were detected in narratives.
Utilizing citations for research artifacts (e.g., dataset, software) in scholarly papers contributes to efficient expansion of research artifact repositories and various applications e.g., the search, recommendation, and evaluation of such artifacts. This study focuses on citations using URLs (URL citations) and aims to identify and analyze research artifact citations automatically. This paper addresses the classification task for each URL citation to identify (1) the role that the referenced resources play in research activities, (2) the type of referenced resources, and (3) the reason why the author cited the resources. This paper proposes the classification method using section titles and footnote texts as new input features. We extracted URL citations from international conference papers as experimental data. We performed 5-fold cross-validation using the data and computed the classification performance of our method. The results demonstrate that our method is effective in all tasks. An additional experiment demonstrates that using cited URLs as input features is also effective.
This paper proposes a new representation for CCG derivations. CCG derivations are represented as trees whose nodes are labeled with categories strictly restricted by CCG rule schemata. This characteristic is not suitable for span-based parsing models because they predict node labels independently. In other words, span-based models may generate invalid CCG derivations that violate the rule schemata. Our proposed representation decomposes CCG derivations into several independent pieces and prevents the span-based parsing models from violating the schemata. Our experimental result shows that an off-the-shelf span-based parser with our representation is comparable with previous CCG parsers.
Nowadays, spoken dialogue agents such as communication robots and smart speakers listen to narratives of humans. In order for such an agent to be recognized as a listener of narratives and convey the attitude of attentive listening, it is necessary to generate responsive utterances. Moreover, responsive utterances can express empathy to narratives and showing an appropriate degree of empathy to narratives is significant for enhancing speaker’s motivation. The degree of empathy shown by responsive utterances is thought to depend on their type. However, the relation between responsive utterances and degrees of the empathy has not been explored yet. This paper describes the classification of responsive utterances based on the degree of empathy in order to explain that relation. In this research, responsive utterances are classified into five levels based on the effect of utterances and literature on attentive listening. Quantitative evaluations using 37,995 responsive utterances showed the appropriateness of the proposed classification.
A gapping construction consists of a coordinated structure where redundant elements are elided from all but one conjuncts. This paper proposes a method of parsing sentences with gapping to recover elided elements. The proposed method is based on constituent trees annotated with grammatical and semantic roles that are useful for identifying elided elements. Our method outperforms the previous method in terms of F-measure and recall.
The Penn Treebank (PTB) represents syntactic structures as graphs due to nonlocal dependencies. This paper proposes a method that approximates PTB graph-structured representations by trees. By our approximation method, we can reduce nonlocal dependency identification and constituency parsing into single tree-based parsing. An experimental result demonstrates that our approximation method with an off-the-shelf tree-based constituency parser significantly outperforms the previous methods in nonlocal dependency identification.
This paper provides a new method to correct annotation errors in a treebank. The previous error correction method constructs a pseudo parallel corpus where incorrect partial parse trees are paired with correct ones, and extracts error correction rules from the parallel corpus. By applying these rules to a treebank, the method corrects errors. However, this method does not achieve wide coverage of error correction. To achieve wide coverage, our method adopts a different approach. In our method, we consider that an infrequent pattern which can be transformed to a frequent one is an annotation error pattern. Based on a tree mining technique, our method seeks such infrequent tree patterns, and constructs error correction rules each of which consists of an infrequent pattern and a corresponding frequent pattern. We conducted an experiment using the Penn Treebank. We obtained 1,987 rules which are not constructed by the previous method, and the rules achieved good precision.
In spoken dialogues, if a spoken dialogue system does not respond at all during users utterances, the user might feel uneasy because the user does not know whether or not the system has recognized the utterances. In particular, back-channel utterances, which the system outputs as voices such as yeah and uh huh in English have important roles for a driver in in-car speech dialogues because the driver does not look owards a listener while driving. This paper describes construction of a back-channel utterance corpus and its analysis to develop the system which can output back-channel utterances at the proper timing in the responsive in-car speech dialogue. First, we constructed the back-channel utterance corpus by integrating the back-channel utterances that four subjects provided for the drivers utterances in 60 dialogues in the CIAIR in-car speech dialogue corpus. Next, we analyzed the corpus and revealed the relation between back-channel utterance timings and information on bunsetsu, clause, pause and rate of speech. Based on the analysis, we examined the possibility of detecting back-channel utterance timings by machine learning technique. As the result of the experiment, we confirmed that our technique achieved as same detection capability as a human.
With the development of speech and language processing, speech translation systems have been developed. These studies target spoken dialogues, and employ consecutive interpretation, which uses a sentence as the translation unit. On the other hand, there exist a few researches about simultaneous interpreting, and recently, the language resources for promoting simultaneous interpreting research, such as the publication of an analytical large-scale corpus, has been prepared. For the future, it is necessary to make the corpora more practical toward realization of a simultaneous interpreting system. In this paper, we describe the construction of a bilingual corpus which can be used for simultaneous lecture interpreting research. Simultaneous lecture interpreting systems are required to recognize translation units in the middle of a sentence, and generate its translation at the proper timing. We constructed the bilingual lecture corpus by the following steps. First, we segmented sentences in the lecture data into semantically meaningful units for the simultaneous interpreting. And then, we assigned the translations to these units from the viewpoint of the simultaneous interpreting. In addition, we investigated the possibility of automatically detecting the simultaneous interpreting timing from our corpus.
Recently, language resources (LRs) are becoming indispensable for linguistic researches. However, existing LRs are often not fully utilized because their variety of usage is not well known, indicating that their intrinsic value is not recognized very well either. Regarding this issue, lists of usage information might improve LR searches and lead to their efficient use. In this research, therefore, we collect a list of usage information for each LR from academic articles to promote the efficient utilization of LRs. This paper proposes to construct a text corpus annotated with usage information (UI corpus). In particular, we automatically extract sentences containing LR names from academic articles. Then, the extracted sentences are annotated with usage information by two annotators in a cascaded manner. We show that the UI corpus contributes to efficient LR searches by combining the UI corpus with a metadata database of LRs and comparing the number of LRs retrieved with and without the UI corpus.
Recently, language resources (LRs) are becoming indispensable for linguistic research. Unfortunately, it is not easy to find their usages by searching the web even though they must be described in the Internet or academic articles. This indicates that the intrinsic value of LRs is not recognized very well. In this research, therefore, we extract a list of usage information for each LR to promote the efficient utilization of LRs. In this paper, we proposed a method for extracting a list of usage information from academic articles by using rules based on syntactic information. The rules are generated by focusing on the syntactic features that are observed in the sentences describing usage information. As a result of experiments, we achieved 72.9% in recall and 78.4% in precision for the closed test and 60.9% in recall and 72.7% in precision for the open test.
The National Institute of Information and Communications Technology (NICT) and Nagoya University have been jointly constructing a large scale database named SHACHI by collecting detailed meta-information on language resources (LRs) in Asia and Western countries, for the purpose of effectively combining LRs. The purpose of this project is to investigate languages, tag sets, and formats compiled in LRs throughout the world, to systematically store LR metadata, to create a search function for this information, and to ultimately utilize all this for a more efficient development of LRs. This metadata database contains more than 2,000 compiled LRs such as corpora, dictionaries, thesauruses and lexicons, forming a large scale metadata of LRs archive. Its metadata, an extended version of OLAC metadata set conforming to Dublin Core, which contain detailed meta-information, have been collected semi-automatically. This paper explains the design and the structure of the metadata database, as well as the realization of the catalogue search tool. Additionally, the website of this database is now open to the public and accessible to all Internet users.
In this paper, quantitative analyses of the delay in Japanese-to-English (J-E) and English-to-Japanese (E-J) interpretations are described. The Simultaneous Interpretation Database of Nagoya University (SIDB) was used for the analyses. Beginning time and end time of each word were provided to the corpus using HMM-based phoneme segmentation, and the time lag between the corresponding words was calculated as the word-level delay. Word-level delay was calculated for 3,722 pairs and 4,932 pairs of words for J-E and E-J interpretations, respectively. The analyses revealed that J-E interpretation has much larger delay than E-J interpretation and that the difference of word order between Japanese and English affect the degree of delay.
This paper describes the design of speech act tags for spoken dialogue corpora and its evaluation. Compared with the tags used for conventional corpus annotation, the proposed speech intention tag is specialized enough to determine system operations. However, detailed information description increases tag types. This causes an ambiguous tag selection. Therefore, we have designed an organization of tags, with focusing attention on layered tagging and context-dependent tagging. Over 35,000 utterance units in the CIAIR corpus have been tagged by hand. To evaluate the reliability of the intention tag, a tagging experiment was conducted. The reliability of tagging is evaluated by comparing the tagging among some annotators using kappa value. As a result, we confirmed that reliable data could be built. This corpus with speech intention tag could be widely used from basic research to applications of spoken dialogue. In particular, this would play an important role from the viewpoint of practical use of spoken dialogue corpora.
Recently, monologue data such as lecture and commentary by professionals have been considered as valuable intellectual resources, and have been gathering attention. On the other hand, in order to use these monologue data effectively and efficiently, it is necessary for the monologue data not only just to be accumulated but also to be structured. This paper describes the construction of a Japanese spoken monologue corpus in which dependency structure is given to each utterance. Spontaneous monologue includes a lot of very long sentences composed of two or more clauses. In these sentences, there may exist the subject or the adverb common to multi-clauses, and it may be considered that the subject or adverb depend on multi-predicates. In order to give the dependency information in a real fashion, our research allows that a bunsetsu depends on multiple bunsetsus.
The manual quantitative analysis of CIAIR simultaneous interpretation corpus and the collection of interpreting patterns This paper provides an investigation of simultaneous interpreting patterns using a bilingual spoken monologue corpus. 4,578 pairs of English-Japanese aligned utterances in CIAIR simultaneous interpretation database were used. This investigation is the largest scale as the observation of simultaneous interpreting speech. The simultaneous interpreters are required to generate the target speech simultaneously with the source speech. Therefore, they have various kinds of strategies to raise simultaneity. In this investigation, the simultaneous interpreting patterns with high frequency and high flexibility were extracted from the corpus. As a result, we collected 203 cases out of aligned utterances in which simultaneous interpretersf strategies for raising simultaneity were observed. These 203 cases could be categorized into 12 types of interpreting pattern. It was clarified that 4.5 percent of the English-Japanese monologue data were fitted in those interpreting patterns. These interpreting patterns can be expected to be used as interpreting rules of simultaneous machine interpretation.
This paper presents a corpus search system utilizing lexical dependency structure. The user's query consists of lexical dependency structure. The user's query consists of a sequence of keywords. For a given query, the system automatically generates the dependency structure patterns which consist of keywords in the query, and returns the sentences whose dependency structures match the generated patterns. The dependency structure patterns are generated by using two operations: combining and interpolation, which utilize dependency structures in the searched corpus. The operations enable the system to generate only the dependency structure patterns that occur in the corpus. The system achieves simple and intuitive corpus search and it is enough linguistically sophisticated to utilize structural information.