Akiko Aizawa

Also published as: Akiko N. Aizawa


2024

Unsupervised Domain Adaptation for Keyphrase Generation using Citation Contexts
Florian Boudin | Akiko Aizawa
Findings of the Association for Computational Linguistics: EMNLP 2024

Adapting keyphrase generation models to new domains typically involves few-shot fine-tuning with in-domain labeled data. However, annotating documents with keyphrases is often prohibitively expensive and impractical, requiring expert annotators. This paper presents silk, an unsupervised method designed to address this issue by extracting silver-standard keyphrases from citation contexts to create synthetic labeled data for domain adaptation. Extensive experiments across three distinct domains demonstrate that our method yields high-quality synthetic samples, resulting in significant and consistent improvements in in-domain performance over strong baselines.

Improving Referring Ability for Biomedical Language Models
Junfeng Jiang | Fei Cheng | Akiko Aizawa
Findings of the Association for Computational Linguistics: EMNLP 2024

Existing auto-regressive large language models (LLMs) are primarily trained using documents from general domains. In the biomedical domain, continual pre-training is a prevalent method for domain adaptation to inject professional knowledge into powerful LLMs that have been pre-trained in general domains. Previous studies typically conduct standard pre-training by randomly packing multiple documents into a long pre-training sequence. Recently, some existing works suggest that enhancing the relatedness of documents within the same pre-training sequence may be advantageous. However, these studies primarily focus on general domains, which cannot be readily applied in the biomedical domain where the distinction of fine-grained topics is harder. Is it possible to further improve the pre-training for biomedical language models (LMs) using exactly the same corpus? In this paper, we explore an improved approach to continual pre-training, which is a prevalent method for domain adaptation, by utilizing information from the citation network in this challenging scenario. Empirical studies demonstrate that our proposed LinkLM data improves both the intra-sample and inter-sample referring abilities of auto-regressive LMs in the biomedical domain, encouraging more profound consideration of task-specific pre-training sequence design for continual pre-training.

MAGPIE: Multi-Task Analysis of Media-Bias Generalization with Pre-Trained Identification of Expressions
Tomáš Horych | Martin Paul Wessel | Jan Philip Wahle | Terry Ruas | Jerome Waßmuth | André Greiner-Petter | Akiko Aizawa | Bela Gipp | Timo Spinde
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Media bias detection poses a complex, multifaceted problem traditionally tackled using single-task models and small in-domain datasets, consequently lacking generalizability. To address this, we introduce MAGPIE, a large-scale multi-task pre-training approach explicitly tailored for media bias detection. To enable large-scale pre-training, we construct Large Bias Mixture (LBM), a compilation of 59 bias-related tasks. MAGPIE outperforms previous approaches in media bias detection on the Bias Annotation By Experts (BABE) dataset, with a relative improvement of 3.3% F1-score. Furthermore, using a RoBERTa encoder, we show that MAGPIE needs only 15% of fine-tuning steps compared to single-task approaches. We provide insight into task learning interference and show that sentiment analysis and emotion detection help learning of all other tasks, and scaling the number of tasks leads to the best results. MAGPIE confirms that MTL is a promising approach for addressing media bias detection, enhancing the accuracy and efficiency of existing models. Furthermore, LBM is the first available resource collection focused on media bias MTL.

Computational Complexity of Natural Morphology Revisited
Hajime Senuma | Akiko Aizawa
Transactions of the Association for Computational Linguistics, Volume 12

This paper revisits a classical, yet fundamental, discussion of theoretical computational linguistics: the computational complexity of natural languages. Past studies have revealed that syntax, as observed in Swiss-German, is not weakly context-free. Concerning morphology, Culy (1985) employed a construction in Bambara to show that morphology is not weakly context-free; however, Manaster-Ramer (1988) pointed out that the Bambara case can be problematic because the wordhood of the construction is reliant on special tonal behaviors, and it is ambiguous whether the behaviors belong to the morphological domain. This raises doubts about whether the case can be considered a genuine morphological phenomenon. In this paper, we argue that Classical Ainu, a language we examine, also defies weak context-freeness at the morphological level. The construction we introduce is unambiguously morphological because this language’s valency-sensitive structure and valency-changing operations, such as noun incorporation, preclude its grammatical interpretation as syntactic.

2023

Probing Physical Reasoning with Counter-Commonsense Context
Kazushi Kondo | Saku Sugawara | Akiko Aizawa
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this study, we create a CConS (Counter-commonsense Contextual Size comparison) dataset to investigate how physical commonsense affects the contextualized size comparison task; the proposed dataset consists of both contexts that fit physical commonsense and those that do not. This dataset tests the ability of language models to predict the size relationship between objects under various contexts generated from our curated noun list and templates. We measure the ability of several masked language models and encoder-decoder models. The results show that while large language models can use prepositions such as “in” and “into” in the provided context to infer size relationships, they fail to use verbs and thus make incorrect judgments led by their prior physical commonsense.

SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation
Junfeng Jiang | Chengzhang Dong | Sadao Kurohashi | Akiko Aizawa
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Dialogue segmentation is a crucial task for dialogue systems allowing a better understanding of conversational texts. Despite recent progress in unsupervised dialogue segmentation methods, their performances are limited by the lack of explicit supervised signals for training. Furthermore, the precise definition of segmentation points in conversations still remains as a challenging problem, increasing the difficulty of collecting manual annotations. In this paper, we provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues and release a large-scale supervised dataset called SuperDialseg, containing 9,478 dialogues based on two prevalent document-grounded dialogue corpora, and also inherit their useful dialogue-related annotations. Moreover, we provide a benchmark including 18 models across five categories for the dialogue segmentation task with several proper evaluation metrics. Empirical studies show that supervised learning is extremely effective in in-domain datasets and models trained on SuperDialseg can achieve good generalization ability on out-of-domain data. Additionally, we also conducted human verification on the test set and the Kappa score confirmed the quality of our automatically constructed dataset. We believe our work is an important step forward in the field of dialogue segmentation.

Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering
Xanh Ho | Anh-Khoa Duong Nguyen | Saku Sugawara | Akiko Aizawa
Findings of the Association for Computational Linguistics: EACL 2023

To explain the predicted answers and evaluate the reasoning abilities of models, several studies have utilized underlying reasoning (UR) tasks in multi-hop question answering (QA) datasets. However, it remains an open question as to how effective UR tasks are for the QA task when training models on both tasks in an end-to-end manner. In this study, we address this question by analyzing the effectiveness of UR tasks (including both sentence-level and entity-level tasks) in three aspects: (1) QA performance, (2) reasoning shortcuts, and (3) robustness. While the previous models have not been explicitly trained on an entity-level reasoning prediction task, we build a multi-task model that performs three tasks together: sentence-level supporting facts prediction, entity-level reasoning prediction, and answer prediction. Experimental results on 2WikiMultiHopQA and HotpotQA-small datasets reveal that (1) UR tasks can improve QA performance. Using four debiased datasets that are newly created, we demonstrate that (2) UR tasks are helpful in preventing reasoning shortcuts in the multi-hop QA task. However, we find that (3) UR tasks do not contribute to improving the robustness of the model on adversarial questions, such as sub-questions and inverted questions. We encourage future studies to investigate the effectiveness of entity-level reasoning in the form of natural language questions (e.g., sub-question forms).

Predicting Numerals in Text Using Nearest Neighbor Language Models
Taku Sakamoto | Akiko Aizawa
Findings of the Association for Computational Linguistics: ACL 2023

Commonsense about quantitative properties is essential for a deep understanding of texts containing numerals. However, naive language models (LMs) treat numerals as string tokens; therefore, they lack an understanding of the magnitudes of numerals, resulting in a difficulty in acquiring the commonsense. In this study, we apply the k-nearest neighbor LM (kNN-LM) to the masked numeral prediction (MNP) task, which measures the quantitative commonsense of LMs. kNN-LM extends pre-trained neural LMs with the k-nearest neighbor (kNN) search. Since it can utilize patterns that appear in the datastore for prediction, we expect an improvement in numeral prediction accuracy, which is associated with a high rate of occurrence of out-of-vocabulary (OOV) words. Through experiments, we verified that the retrieval-based method is effective for fine-grained predictions of numerals from context, especially for the OOV numerals. We also compared two different context spans for context representations to improve the accuracy of kNN search by using only the words that are closely related to the masked numeral: the mask and its surrounding words, and the mask and its subsequent words. Our results reveal that using only the embeddings of mask tokens for numerals in kNN search is the most effective approach for realizing MNP tasks.
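
As a rough illustration of the retrieval step described in this abstract, the sketch below shows the standard kNN-LM interpolation, in which a distribution built from retrieved neighbors is mixed with the base LM distribution. It is a minimal sketch, not the authors' implementation: the function names, the toy distances, the temperature, and the interpolation weight `lam` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def knn_distribution(neighbors, temperature=1.0):
    """Turn retrieved (distance, token) pairs into a probability distribution.

    Each neighbor votes for its stored token with weight exp(-distance / T);
    votes for the same token are summed and then normalized.
    """
    weights = defaultdict(float)
    for distance, token in neighbors:
        weights[token] += math.exp(-distance / temperature)
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


def interpolate(p_lm, p_knn, lam=0.25):
    """Mix the two distributions: lam * p_kNN + (1 - lam) * p_LM."""
    vocab = set(p_lm) | set(p_knn)
    return {
        t: lam * p_knn.get(t, 0.0) + (1.0 - lam) * p_lm.get(t, 0.0)
        for t in vocab
    }


# Toy example (all numbers are made up): the base LM prefers "10",
# but the nearest datastore entries mostly store "1995".
p_lm = {"10": 0.5, "100": 0.3, "1995": 0.2}
neighbors = [(0.1, "1995"), (0.2, "1995"), (1.5, "10")]

p_knn = knn_distribution(neighbors)
p_final = interpolate(p_lm, p_knn, lam=0.5)
best = max(p_final, key=p_final.get)
print(best, round(p_final[best], 3))
```

In this toy case the retrieved neighbors shift the prediction toward a numeral the base LM ranked last, which is the kind of effect the abstract reports for rare and OOV numerals.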

Solving Label Variation in Scientific Information Extraction via Multi-Task Learning
Dong Pham | Xanh Ho | Quang Thuy Ha | Akiko Aizawa
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

2022

How Well Do Multi-hop Reading Comprehension Models Understand Date Information?
Xanh Ho | Saku Sugawara | Akiko Aizawa
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Several multi-hop reading comprehension datasets have been proposed to resolve the issue of reasoning shortcuts by which questions can be answered without performing multi-hop reasoning. However, the ability of multi-hop models to perform step-by-step reasoning when finding an answer to a comparison question remains unclear. It is also unclear how questions about the internal reasoning process are useful for training and evaluating question-answering (QA) systems. To evaluate the model precisely in a hierarchical manner, we first propose a dataset, HieraDate, with three probing tasks in addition to the main question: extraction, reasoning, and robustness. Our dataset is created by enhancing two previous multi-hop datasets, HotpotQA and 2WikiMultiHopQA, focusing on multi-hop questions on date information that involve both comparison and numerical reasoning. We then evaluate the ability of existing models to understand date information. Our experimental results reveal that the multi-hop models do not have the ability to subtract two dates even when they perform well in date comparison and number subtraction tasks. Other results reveal that our probing questions can help to improve the performance of the models (e.g., by +10.3 F1) on the main QA task and our dataset can be used for data augmentation to improve the robustness of the models.

Look to the Right: Mitigating Relative Position Bias in Extractive Question Answering
Kazutoshi Shinoda | Saku Sugawara | Akiko Aizawa
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Extractive question answering (QA) models tend to exploit spurious correlations to make predictions when a training set has unintended biases. This tendency results in models not being generalizable to examples where the correlations do not hold. Determining the spurious correlations QA models can exploit is crucial in building generalizable QA models in real-world applications; moreover, a method needs to be developed that prevents these models from learning the spurious correlations even when a training set is biased. In this study, we discovered that the relative position of an answer, which is defined as the relative distance from an answer span to the closest question-context overlap word, can be exploited by QA models as superficial cues for making predictions. Specifically, we find that when the relative positions in a training set are biased, the performance on examples with relative positions unseen during training is significantly degraded. To mitigate the performance degradation for unseen relative positions, we propose an ensemble-based debiasing method that does not require prior knowledge about the distribution of relative positions. We demonstrate that the proposed method mitigates the models’ reliance on relative positions using the biased and full SQuAD dataset. We hope that this study can help enhance the generalization ability of QA models in real-world applications.
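
The relative position defined in this abstract can be made concrete with a small sketch. This is only one plausible reading of the definition: the whitespace tokenization, the case-insensitive matching, the sign convention (negative when the overlap word precedes the answer), and the function name `relative_position` are assumptions, and a real setup would probably also filter stop words and punctuation.

```python
def relative_position(context_tokens, question_tokens, answer_start, answer_end):
    """Signed distance from the answer span to the closest overlap word.

    The answer span covers context indices answer_start..answer_end (inclusive).
    An overlap word is any context token that also occurs in the question.
    Returns None when the context shares no token with the question.
    """
    question_vocab = {tok.lower() for tok in question_tokens}
    best = None
    for i, tok in enumerate(context_tokens):
        if tok.lower() not in question_vocab:
            continue
        if i < answer_start:
            distance = i - answer_start   # overlap word is left of the answer
        elif i > answer_end:
            distance = i - answer_end     # overlap word is right of the answer
        else:
            distance = 0                  # overlap word lies inside the answer
        if best is None or abs(distance) < abs(best):
            best = distance
    return best


context = "The tower was completed in 1889 by Eiffel".split()
question = "When was the tower completed ?".split()
# The answer "1889" sits at context index 5.
print(relative_position(context, question, 5, 5))
```

A training set in which this value is almost always negative would be biased in the sense of the abstract, since answers always follow the nearest overlap word.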

Cross-Modal Similarity-Based Curriculum Learning for Image Captioning
Hongkuan Zhang | Saku Sugawara | Akiko Aizawa | Lei Zhou | Ryohei Sasano | Koichi Takeda
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Image captioning models require the high-level generalization ability to describe the contents of various images in words. Most existing approaches treat the image–caption pairs equally in their training without considering the differences in their learning difficulties. Several image captioning approaches introduce curriculum learning methods that present training data with increasing levels of difficulty. However, their difficulty measurements are either based on domain-specific features or prior model training. In this paper, we propose a simple yet efficient difficulty measurement for image captioning using cross-modal similarity calculated by a pretrained vision–language model. Experiments on the COCO and Flickr30k datasets show that our proposed approach achieves superior performance and competitive convergence speed to baselines without requiring heuristics or incurring additional training costs. Moreover, the higher model performance on difficult examples and unseen data also demonstrates the generalization ability.
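
The idea of ranking training pairs by cross-modal similarity can be sketched as follows. This is a hypothetical illustration rather than the paper's procedure: it assumes image and caption embeddings have already been produced by some pretrained vision-language model, uses cosine similarity as the score, and treats high-similarity pairs as the easy ones to present first. The names `cosine` and `curriculum_order` and the toy vectors are invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def curriculum_order(pairs):
    """Order image-caption pairs from most to least similar.

    pairs: list of (pair_id, image_embedding, caption_embedding).
    Assumption: higher similarity means an easier training example.
    """
    scored = [(cosine(img, cap), pair_id) for pair_id, img, cap in pairs]
    scored.sort(key=lambda item: -item[0])
    return [pair_id for _, pair_id in scored]


# Toy 2-d embeddings standing in for real model outputs.
pairs = [
    ("a", [1.0, 0.0], [1.0, 0.0]),  # perfectly aligned
    ("b", [1.0, 0.0], [0.0, 1.0]),  # orthogonal
    ("c", [1.0, 0.0], [1.0, 1.0]),  # partially aligned
]
order = curriculum_order(pairs)
print(order)
```

Because the scores come from a model that is already trained, such an ordering needs no extra training pass, which matches the cost argument made in the abstract.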

Debiasing Masks: A New Framework for Shortcut Mitigation in NLU
Johannes Mario Meissner | Saku Sugawara | Akiko Aizawa
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Debiasing language models from unwanted behaviors in Natural Language Understanding (NLU) tasks is a topic with rapidly increasing interest in the NLP community. Spurious statistical correlations in the data allow models to perform shortcuts and avoid uncovering more advanced and desirable linguistic features. A multitude of effective debiasing approaches has been proposed, but flexibility remains a major issue. For the most part, models must be retrained to find a new set of weights with debiased behavior. We propose a new debiasing method in which we identify debiased pruning masks that can be applied to a finetuned model. This enables the selective and conditional application of debiasing behaviors. We assume that bias is caused by a certain subset of weights in the network; our method is, in essence, a mask search to identify and remove biased weights. Our masks show equivalent or superior performance to the standard counterparts, while offering important benefits. Pruning masks can be stored with high efficiency in memory, and it becomes possible to switch among several debiasing behaviors (or revert back to the original biased model) at inference time. Finally, it opens the doors to further research on how biases are acquired by studying the generated masks. For example, we observed that the early layers and attention heads were pruned more aggressively, possibly hinting towards the location in which biases may be encoded.

Gender Biases and Where to Find Them: Exploring Gender Bias in Pre-Trained Transformer-based Language Models Using Movement Pruning
Przemyslaw Joniak | Akiko Aizawa
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Language model debiasing has emerged as an important field of study in the NLP community. Numerous debiasing techniques were proposed, but bias ablation remains an unaddressed issue. We demonstrate a novel framework for inspecting bias in pre-trained transformer-based language models via movement pruning. Given a model and a debiasing objective, our framework finds a subset of the model containing less bias than the original model. We implement our framework by pruning the model while fine-tuning it on the debiasing objective. Optimized are only the pruning scores – parameters coupled with the model’s weights that act as gates. We experiment with pruning attention heads, an important building block of transformers: we prune square blocks, as well as establish a new way of pruning the entire heads. Lastly, we demonstrate the usage of our framework using gender bias, and based on our findings, we propose an improvement to an existing debiasing method. Additionally, we re-discover a bias-performance trade-off: the better the model performs, the more bias it contains.

Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers
Takuto Asakura | Yusuke Miyao | Akiko Aizawa
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Grounding the meaning of each symbol in math formulae is important for automated understanding of scientific documents. Generally speaking, the meanings of math symbols are not necessarily constant, and the same symbol is used in multiple meanings. Therefore, coreference relations between symbols need to be identified for grounding, and the task has aspects of both description alignment and coreference analysis. In this study, we annotated 15 papers selected from arXiv.org with the grounding information. In total, 12,352 occurrences of math identifiers in these papers were annotated, and all coreference relations between them were made explicit in each paper. The constructed dataset shows that regardless of the ambiguity of symbols in math formulae, coreference relations can be labeled with a high inter-annotator agreement. The constructed dataset enables us to achieve automation of formula grounding, and in turn, make deeper use of the knowledge in scientific documents using techniques such as math information extraction. The built grounding dataset is available at https://sigmathling.kwarc.info/resources/grounding-dataset/.

A Simple Yet Effective Corpus Construction Method for Chinese Sentence Compression
Yang Zhao | Hiroshi Kanayama | Issei Yoshida | Masayasu Muraoka | Akiko Aizawa
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Deletion-based sentence compression in the English language has made significant progress over the past few decades. However, there is a lack of large-scale and high-quality parallel corpus (i.e., (sentence, compression) pairs) for the Chinese language to train an efficient compression system. To remedy this shortcoming, we present a dependency-tree-based method to construct a Chinese corpus with 151k pairs of sentences and compression based on Chinese language-specific characteristics. Subsequently, we trained both extractive and generative neural compression models using the constructed corpus. The experimental results show that our compression model can generate high-quality compressed sentences on both automatic and human evaluation metrics compared with the baselines. The results of the faithfulness evaluation also indicated that the Chinese compression model trained on our constructed corpus can produce more faithful compressed sentences. Furthermore, a dataset with 1,000 pairs of sentences and ground truth compression was manually created for automatic evaluation, which, we believe, will benefit future research on Chinese sentence compression.

Incorporating the Rhetoric of Scientific Language into Sentence Embeddings using Phrase-guided Distant Supervision and Metric Learning
Kaito Sugimoto | Akiko Aizawa
Proceedings of the Third Workshop on Scholarly Document Processing

Communicative functions are an important rhetorical feature of scientific writing. Sentence embeddings that contain such features are highly valuable for the argumentative analysis of scientific documents, with applications in document alignment, recommendation, and academic writing assistance. Moreover, embeddings can provide a possible solution to the open-set problem, where models need to generalize to new communicative functions unseen at training time. However, existing sentence representation models are not suited for detecting functional similarity since they only consider lexical or semantic similarities. To remedy this, we propose a combined approach of distant supervision and metric learning to make a representation model more aware of the functional part of a sentence. We first leverage an existing academic phrase database to label sentences automatically with their functions. Then, we train an embedding model to capture similarities and dissimilarities from a rhetorical perspective. The experimental results demonstrate that the embeddings obtained from our model are more advantageous than existing models when retrieving functionally similar sentences. We also provide an extensive analysis of the performance differences between five metric learning objectives, revealing that traditional methods (e.g., softmax cross-entropy loss and triplet loss) outperform state-of-the-art techniques.

2021

Embracing Ambiguity: Shifting the Training Target of NLI Models
Johannes Mario Meissner | Napat Thumwanit | Saku Sugawara | Akiko Aizawa
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Natural Language Inference (NLI) datasets contain examples with highly ambiguous labels. While many research works do not pay much attention to this fact, several recent efforts have been made to acknowledge and embrace the existence of ambiguity, such as UNLI and ChaosNLI. In this paper, we explore the option of training directly on the estimated label distribution of the annotators in the NLI task, using a learning loss based on this ambiguity distribution instead of the gold-labels. We prepare AmbiNLI, a trial dataset obtained from readily available sources, and show it is possible to reduce ChaosNLI divergence scores when finetuning on this data, a promising first step towards learning how to capture linguistic ambiguity. Additionally, we show that training on the same amount of data but targeting the ambiguity distribution instead of gold-labels can result in models that achieve higher performance and learn better representations for downstream tasks.
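
To make the contrast between gold-label and distribution targets concrete, the sketch below computes a cross-entropy loss against an annotator label distribution and against a one-hot gold label. It is a minimal sketch under stated assumptions: the abstract does not say which divergence the authors use, so plain cross-entropy is a stand-in, and the vote counts, label order (entailment, neutral, contradiction), and function names are hypothetical.

```python
import math


def label_distribution(votes, num_labels=3):
    """Estimate a label distribution from individual annotator votes."""
    counts = [0] * num_labels
    for vote in votes:
        counts[vote] += 1
    return [c / len(votes) for c in counts]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy H(target, predicted) for two discrete distributions."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


# Hypothetical example: 2 annotators chose label 0, 3 chose label 1.
votes = [0, 0, 1, 1, 1]
soft_target = label_distribution(votes)   # annotator distribution
gold_target = [0.0, 1.0, 0.0]             # majority label as one-hot

prediction = [0.4, 0.6, 0.0]              # a model output that mirrors the votes

loss_soft = cross_entropy(soft_target, prediction)
loss_gold = cross_entropy(gold_target, prediction)
print(round(loss_soft, 4), round(loss_gold, 4))
```

Under the soft target, a prediction that mirrors the annotators is already optimal, whereas the one-hot target keeps pushing probability mass toward the majority label and away from the minority reading.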

Improving the Robustness of QA Models to Challenge Sets with Variational Question-Answer Pair Generation
Kazutoshi Shinoda | Saku Sugawara | Akiko Aizawa
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

Question answering (QA) models for reading comprehension have achieved human-level accuracy on in-distribution test sets. However, they have been demonstrated to lack robustness to challenge sets, whose distribution is different from that of training sets. Existing data augmentation methods mitigate this problem by simply augmenting training sets with synthetic examples sampled from the same distribution as the challenge sets. However, these methods assume that the distribution of a challenge set is known a priori, making them less applicable to unseen challenge sets. In this study, we focus on question-answer pair generation (QAG) to mitigate this problem. While most existing QAG methods aim to improve the quality of synthetic examples, we conjecture that diversity-promoting QAG can mitigate the sparsity of training sets and lead to better robustness. We present a variational QAG model that generates multiple diverse QA pairs from a paragraph. Our experiments show that our method can improve the accuracy of 12 challenge sets, as well as the in-distribution accuracy.

Predicting Numerals in Natural Language Text Using a Language Model Considering the Quantitative Aspects of Numerals
Taku Sakamoto | Akiko Aizawa
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

Numerical common sense (NCS) is necessary to fully understand natural language text that includes numerals. NCS is knowledge about the numerical features of objects in text, such as size, weight, or color. Existing neural language models treat numerals in a text as string tokens in the same way as other words. Therefore, they cannot reflect the quantitative aspects of numerals in the training process, making it difficult to learn NCS. In this paper, we measure the NCS acquired by existing neural language models using a masked numeral prediction task as an evaluation task. In this task, we use two evaluation metrics to evaluate the language models in terms of the symbolic and quantitative aspects of the numerals, respectively. We also propose methods to reflect not only the symbolic aspect but also the quantitative aspect of numerals in the training of language models, using a loss function that depends on the magnitudes of the numerals and a regression model for the masked numeral prediction task. Finally, we quantitatively evaluate our proposed approaches on four datasets with different properties using the two metrics. Compared with methods that use existing language models, the proposed methods reduce numerical absolute errors, although exact match accuracy was reduced. This result confirms that the proposed methods, which use the magnitudes of the numerals for model training, are an effective way for models to capture NCS.

Benchmarking Machine Reading Comprehension: A Psychological Perspective
Saku Sugawara | Pontus Stenetorp | Akiko Aizawa
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Machine reading comprehension (MRC) has received considerable attention as a benchmark for natural language understanding. However, the conventional task design of MRC lacks explainability beyond the model interpretation, i.e., reading comprehension by a model cannot be explained in human terms. To this end, this position paper provides a theoretical basis for the design of MRC datasets based on psychology as well as psychometrics, and summarizes it in terms of the prerequisites for benchmarking MRC. We conclude that future datasets should (i) evaluate the capability of the model for constructing a coherent and grounded representation to understand context-dependent situations and (ii) ensure substantive validity by shortcut-proof questions and explanation as a part of the task design.

Attention-based Relational Graph Convolutional Network for Target-Oriented Opinion Words Extraction
Junfeng Jiang | An Wang | Akiko Aizawa
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Target-oriented opinion words extraction (TOWE) is a subtask of aspect-based sentiment analysis (ABSA). It aims to extract the corresponding opinion words for a given opinion target in a review sentence. Intuitively, the relation between an opinion target and an opinion word mostly relies on syntactics. In this study, we design a directed syntactic dependency graph based on a dependency tree to establish a path from the target to candidate opinions. Subsequently, we propose a novel attention-based relational graph convolutional neural network (ARGCN) to exploit syntactic information over dependency graphs. Moreover, to explicitly extract the corresponding opinion words toward the given opinion target, we effectively encode target information in our model with the target-aware representation. Empirical results demonstrate that our model significantly outperforms all of the existing models on four benchmark datasets. Extensive analysis also demonstrates the effectiveness of each component of our models. Our code is available at https://github.com/wcwowwwww/towe-eacl.

Communicative-Function-Based Sentence Classification for Construction of an Academic Formulaic Expression Database
Kenichi Iwatsuki | Akiko Aizawa
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Formulaic expressions (FEs), such as ‘in this paper, we propose’ are frequently used in scientific papers. FEs convey a communicative function (CF), i.e. ‘showing the aim of the paper’ in the above-mentioned example. Although CF-labelled FEs are helpful in assisting academic writing, the construction of FE databases requires manual labour for assigning CF labels. In this study, we considered a fully automated construction of a CF-labelled FE database using the top–down approach, in which the CF labels are first assigned to sentences, and then the FEs are extracted. For the CF-label assignment, we created a CF-labelled sentence dataset, on which we trained a SciBERT classifier. We show that the classifier and dataset can be used to construct FE databases of disciplines that are different from the training data. The accuracy of in-disciplinary classification was more than 80%, while cross-disciplinary classification also worked well. We also propose an FE extraction method, which was applied to the CF-labelled sentences. Finally, we constructed and published a new, large CF-labelled FE database. The evaluation of the final CF-labelled FE database showed that approximately 65% of the FEs are correct and useful, which is sufficiently high considering practical use.

Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models
Taichi Iki | Akiko Aizawa
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

A method for creating a vision-and-language (V&L) model is to extend a language model through structural modifications and V&L pre-training. Such an extension aims to make a V&L model inherit the capability of natural language understanding (NLU) from the original language model. To see how well this is achieved, we propose to evaluate V&L models using an NLU benchmark (GLUE). We compare five V&L models, including single-stream and dual-stream models, trained with the same pre-training. Dual-stream models, with their higher modality independence achieved by approximately doubling the number of parameters, are expected to preserve the NLU capability better. Our main finding is that the dual-stream scores are not much different than the single-stream scores, contrary to expectation. Further analysis shows that pre-training causes the performance drop in NLU tasks with few exceptions. These results suggest that adopting a single-stream structure and devising the pre-training could be an effective method for improving the maintenance of language knowledge in V&L extensions.

pdf bib
Phrase-Level Action Reinforcement Learning for Neural Dialog Response Generation
Takato Yamazaki | Akiko Aizawa
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Neural Media Bias Detection Using Distant Supervision With BABE - Bias Annotations By Experts
Timo Spinde | Manuel Plank | Jan-David Krieger | Terry Ruas | Bela Gipp | Akiko Aizawa
Findings of the Association for Computational Linguistics: EMNLP 2021

Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering the word choice. The automatic identification of bias by word choice is challenging, primarily due to the lack of a gold standard data set and high context dependencies. This paper presents BABE, a robust and diverse data set created by trained experts, for media bias research. We also analyze why expert labeling is essential within this domain. Our data set offers better annotation quality and higher inter-annotator agreement than existing work. It consists of 3,700 sentences balanced among topics and outlets, containing media bias labels on the word and sentence level. Based on our data, we also introduce a way to detect bias-inducing sentences in news articles automatically. Our best performing BERT-based model is pre-trained on a larger corpus consisting of distant labels. Fine-tuning and evaluating the model on our proposed supervised data set, we achieve a macro F1-score of 0.804, outperforming existing methods.
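The macro F1-score reported above is the unweighted mean of the per-class F1 scores. The following is a minimal sketch of that metric in plain Python; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper or its code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy binary example (1 = biased, 0 = not biased); the labels are made up.
score = macro_f1([1, 1, 0, 0], [1, 0, 0, 0])
```

Because every class contributes equally, a model that ignores the minority class is penalised more heavily than it would be under plain accuracy.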

pdf bib
Can Question Generation Debias Question Answering Models? A Case Study on Question–Context Lexical Overlap
Kazutoshi Shinoda | Saku Sugawara | Akiko Aizawa
Proceedings of the 3rd Workshop on Machine Reading for Question Answering

Question answering (QA) models for reading comprehension have been demonstrated to exploit unintended dataset biases such as question–context lexical overlap. This hinders QA models from generalizing to under-represented samples such as questions with low lexical overlap. Question generation (QG), a method for augmenting QA datasets, can be a solution for such performance degradation if QG can properly debias QA datasets. However, we discover that recent neural QG models are biased towards generating questions with high lexical overlap, which can amplify the dataset bias. Moreover, our analysis reveals that data augmentation with these QG models frequently impairs the performance on questions with low lexical overlap, while improving that on questions with high lexical overlap. To address this problem, we use a synonym replacement-based approach to augment questions with low lexical overlap. We demonstrate that the proposed data augmentation approach is simple yet effective to mitigate the degradation problem with only 70k synthetic examples.
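The abstract does not define how question–context lexical overlap is measured. As a rough sketch, one simple assumption is the fraction of question tokens that also appear in the context, after lowercasing and a naive alphanumeric tokenization. The names `tokenize` and `lexical_overlap` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_overlap(question, context):
    """Fraction of question tokens that also occur in the context (assumed definition)."""
    question_tokens = tokenize(question)
    if not question_tokens:
        return 0.0
    context_vocab = set(tokenize(context))
    shared = sum(1 for token in question_tokens if token in context_vocab)
    return shared / len(question_tokens)


# Only "hamlet" is shared, so one of the three question tokens overlaps.
overlap = lexical_overlap("Who wrote Hamlet?", "Hamlet was written by Shakespeare.")
```

Under this definition, a paraphrased question that swaps words for synonyms scores lower than a question copied from the context, which is the kind of low-overlap sample the abstract is concerned with.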

pdf bib
Maintaining Common Ground in Dynamic Environments
Takuma Udagawa | Akiko Aizawa
Transactions of the Association for Computational Linguistics, Volume 9

Common grounding is the process of creating and maintaining mutual understandings, which is a critical aspect of sophisticated human communication. While various task settings have been proposed in existing literature, they mostly focus on creating common ground under a static context and ignore the aspect of maintaining it over time under a dynamic context. In this work, we propose a novel task setting to study the ability to both create and maintain common ground in dynamic environments. Based on our minimal task formulation, we collected a large-scale dataset of 5,617 dialogues to enable fine-grained evaluation and analysis of various dialogue systems. Through our dataset analyses, we highlight novel challenges introduced in our setting, such as the usage of complex spatio-temporal expressions to create and maintain common ground. Finally, we conduct extensive experiments to assess the capabilities of our baseline dialogue system and discuss future prospects of our research.

2020

pdf bib
Keyphrase Generation for Scientific Document Retrieval
Florian Boudin | Ygor Gallina | Akiko Aizawa
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Sequence-to-sequence models have led to significant progress in keyphrase generation, but it remains unknown whether they are reliable enough to be beneficial for document retrieval. This study provides empirical evidence that such models can significantly improve retrieval performance, and introduces a new extrinsic evaluation framework that allows for a better understanding of the limitations of keyphrase generation models. Using this framework, we point out and discuss the difficulties encountered with supplementing documents with keyphrases that are not present in the text, and with generalizing models across domains. Our code is available at https://github.com/boudinfl/ir-using-kg

pdf bib
Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems
Vitou Phy | Yang Zhao | Akiko Aizawa
Proceedings of the 28th International Conference on Computational Linguistics

Many automatic evaluation metrics have been proposed to score the overall quality of a response in open-domain dialogue. Generally, the overall quality is comprised of various aspects, such as relevancy, specificity, and empathy, and the importance of each aspect differs according to the task. For instance, specificity is mandatory in a food-ordering dialogue task, whereas fluency is preferred in a language-teaching dialogue system. However, existing metrics are not designed to cope with such flexibility. For example, BLEU score fundamentally relies only on word overlapping, whereas BERTScore relies on semantic similarity between reference and candidate response. Thus, they are not guaranteed to capture the required aspects, i.e., specificity. To design a metric that is flexible to a task, we first propose making these qualities manageable by grouping them into three groups: understandability, sensibleness, and likability, where likability is a combination of qualities that are essential for a task. We also propose a simple method to composite metrics of each aspect to obtain a single metric called USL-H, which stands for Understandability, Sensibleness, and Likability in Hierarchy. We demonstrated that USL-H score achieves good correlations with human judgment and maintains its configurability towards different aspects and metrics.

pdf bib
Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
Xanh Ho | Anh-Khoa Duong Nguyen | Saku Sugawara | Akiko Aizawa
Proceedings of the 28th International Conference on Computational Linguistics

A multi-hop question answering (QA) dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question. However, current datasets do not provide a complete explanation for the reasoning process from the question to the answer. Further, previous studies revealed that many examples in existing multi-hop datasets do not require multi-hop reasoning to answer a question. In this study, we present a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and unstructured data. In our dataset, we introduce the evidence information containing a reasoning path for multi-hop questions. The evidence information has two benefits: (i) providing a comprehensive explanation for predictions and (ii) evaluating the reasoning skills of a model. We carefully design a pipeline and a set of templates when generating a question-answer pair that guarantees the multi-hop steps and the quality of the questions. We also exploit the structured format in Wikidata and use logical rules to create questions that are natural but still require multi-hop reasoning. Through experiments, we demonstrate that our dataset is challenging for multi-hop models and it ensures that multi-hop reasoning is required.

pdf bib
A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial Expressions
Takuma Udagawa | Takato Yamazaki | Akiko Aizawa
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent models achieve promising results in visually grounded dialogues. However, existing datasets often contain undesirable biases and lack sophisticated linguistic analyses, which makes it difficult to understand how well current models recognize their precise linguistic structures. To address this problem, we make two design choices: first, we focus on OneCommon Corpus, a simple yet challenging common grounding dataset which contains minimal bias by design. Second, we analyze their linguistic structures based on spatial expressions and provide comprehensive and reliable annotation for 600 dialogues. We show that our annotation captures important linguistic structures including predicate-argument structure, modification and ellipsis. In our experiments, we assess the model’s understanding of these structures through reference resolution. We demonstrate that our annotation can reveal both the strengths and weaknesses of baseline models in essential levels of detail. Overall, we propose a novel framework and resource for investigating fine-grained language understanding in visually grounded dialogues.

pdf bib
Language-Conditioned Feature Pyramids for Visual Selection Tasks
Taichi Iki | Akiko Aizawa
Findings of the Association for Computational Linguistics: EMNLP 2020

Referring expression comprehension, which is the ability to locate the object in an image that a language expression refers to, plays an important role in creating common ground. Many models that fuse visual and linguistic features have been proposed. However, few models consider the fusion of linguistic features with multiple visual features with different sizes of receptive fields, though the proper size of the receptive field of visual features intuitively varies depending on expressions. In this paper, we introduce a neural network architecture that modulates visual features with varying sizes of receptive field by linguistic features. We evaluate our architecture on tasks related to referring expression comprehension in two visual dialogue games. The results show the advantages and broad applicability of our architecture. Source code is available at https://github.com/Alab-NII/lcfp.

pdf bib
An Evaluation Dataset for Identifying Communicative Functions of Sentences in English Scholarly Papers
Kenichi Iwatsuki | Florian Boudin | Akiko Aizawa
Proceedings of the Twelfth Language Resources and Evaluation Conference

Formulaic expressions, such as ‘in this paper we propose’, are used by authors of scholarly papers to perform communicative functions; the communicative function of the present example is ‘stating the aim of the paper’. Collecting such expressions and pairing them with their communicative functions would be highly valuable for various tasks, particularly for writing assistance. However, such collection and pairing in a principled and automated manner would require high-quality annotated data, which are not available. In this study, we address this shortcoming by creating a manually annotated dataset for detecting communicative functions in sentences. Starting from a seed list of labelled formulaic expressions, we retrieved new sentences from scholarly papers in the ACL Anthology and asked multiple human evaluators to label communicative functions. To show the usefulness of our dataset, we conducted a series of experiments that determined to what extent sentence representations acquired by recent models, such as word2vec and BERT, can be employed to detect communicative functions in sentences.

pdf bib
A System for Worldwide COVID-19 Information Aggregation
Akiko Aizawa | Frederic Bergeron | Junjie Chen | Fei Cheng | Katsuhiko Hayashi | Kentaro Inui | Hiroyoshi Ito | Daisuke Kawahara | Masaru Kitsuregawa | Hirokazu Kiyomaru | Masaki Kobayashi | Takashi Kodama | Sadao Kurohashi | Qianying Liu | Masaki Matsubara | Yusuke Miyao | Atsuyuki Morishima | Yugo Murawaki | Kazumasa Omura | Haiyue Song | Eiichiro Sumita | Shinji Suzuki | Ribeka Tanaka | Yu Tanaka | Masashi Toyoda | Nobuhiro Ueda | Honai Ueoka | Masao Utiyama | Ying Zhong
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 situation differs greatly among countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news from foreign countries. We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics. Our reliable COVID-19 related website dataset collected through crowdsourcing ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic-classifier trained on our article-topic pair dataset helps users find the information they are interested in efficiently by putting articles into different categories.

pdf bib
Towards Grounding of Formulae
Takuto Asakura | André Greiner-Petter | Akiko Aizawa | Yusuke Miyao
Proceedings of the First Workshop on Scholarly Document Processing

A large amount of scientific knowledge is represented within mixed forms of natural language texts and mathematical formulae. Therefore, a collaboration of natural language processing and formula analyses, so-called mathematical language processing, is necessary to enable computers to understand and retrieve information from the documents. However, as we will show in this project, a mathematical notation can change its meaning even within the scope of a single paragraph. This flexibility makes it difficult to extract the exact meaning of a mathematical formula. In this project, we will propose a new task direction for grounding mathematical formulae. Particularly, we are addressing the widespread misconception of various research projects in mathematical information retrieval, which presume that mathematical notations have a fixed meaning within a single document. We manually annotated a long scientific paper to illustrate the task concept. Our high inter-annotator agreement shows that the task is well understood by humans. Our results indicate that it is worthwhile to develop techniques for the proposed task to contribute to the further progress of mathematical language processing.

pdf bib
Virtual Citation Proximity (VCP): Empowering Document Recommender Systems by Learning a Hypothetical In-Text Citation-Proximity Metric for Uncited Documents
Paul Molloy | Joeran Beel | Akiko Aizawa
Proceedings of the 8th International Workshop on Mining Scientific Publications

The relatedness of research articles, patents, court rulings, web pages, and other document types is often calculated with citation or hyperlink-based approaches like co-citation (proximity) analysis. The main limitation of citation-based approaches is that they cannot be used for documents that receive few or no citations. We propose Virtual Citation Proximity (VCP), a Siamese Neural Network architecture, which combines the advantages of co-citation proximity analysis (diverse notions of relatedness / high recommendation performance), with the advantage of content-based filtering (high coverage). VCP is trained on a corpus of documents with textual features, and with real citation proximity as ground truth. VCP then predicts for any two documents, based on their title and abstract, in what proximity the two documents would be co-cited, if they were indeed co-cited. The prediction can be used in the same way as real citation proximity to calculate document relatedness, even for uncited documents. In our evaluation with 2 million co-citations from Wikipedia articles, VCP achieves an MAE of 0.0055, i.e. an improvement of 20% over the baseline, though the learning curve suggests that more work is needed.

2019

pdf bib
Unsupervised Rewriter for Multi-Sentence Compression
Yang Zhao | Xiaoyu Shen | Wei Bi | Akiko Aizawa
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Multi-sentence compression (MSC) aims to generate a grammatical but reduced compression from multiple input sentences while retaining their key information. The previously dominant approach to MSC is the extraction-based word graph approach. A few variants further leveraged lexical substitution to yield more abstractive compression. However, two limitations exist. First, the word graph approach that simply concatenates fragments from multiple sentences may yield non-fluent or ungrammatical compression. Second, lexical substitution is often inappropriate without the consideration of context information. To tackle the above-mentioned issues, we present a neural rewriter for multi-sentence compression that does not need any parallel corpus. Empirical studies have shown that our approach achieves comparable results upon automatic evaluation and improves the grammaticality of compression based on human evaluation. A parallel corpus with more than 140,000 (sentence group, compression) pairs is also constructed as a by-product for future research.

2018

pdf bib
Using Formulaic Expressions in Writing Assistance Systems
Kenichi Iwatsuki | Akiko Aizawa
Proceedings of the 27th International Conference on Computational Linguistics

Formulaic expressions (FEs) used in scholarly papers, such as ‘there has been little discussion about’, are helpful for non-native English speakers. However, it is time-consuming for users to manually search for an appropriate expression every time they want to consult FE dictionaries. For this reason, we tackle the task of semantic searches of FE dictionaries. At the start of our research, we identified two salient difficulties in this task. First, the paucity of example sentences in existing FE dictionaries results in a shortage of context information, which is necessary for acquiring semantic representation of FEs. Second, while a semantic category label is assigned to each FE in many FE dictionaries, it is difficult to predict the labels from user input, forcing users to manually designate the semantic category when searching. To address these difficulties, we propose a new framework for semantic searches of FEs and propose a new method to leverage both existing dictionaries and domain sentence corpora. Further, we expand an existing FE dictionary to consider building a more comprehensive and domain-specific FE dictionary and to verify the effectiveness of our method.

pdf bib
What Makes Reading Comprehension Questions Easier?
Saku Sugawara | Kentaro Inui | Satoshi Sekine | Akiko Aizawa
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

A challenge in creating a dataset for machine reading comprehension (MRC) is to collect questions that require a sophisticated understanding of language to answer beyond using superficial cues. In this work, we investigate what makes questions easier across 12 recent MRC datasets with three question styles (answer extraction, description, and multiple choice). We propose to employ simple heuristics to split each dataset into easy and hard subsets and examine the performance of two baseline models for each of the subsets. We then manually annotate questions sampled from each subset with both validity and requisite reasoning skills to investigate which skills explain the difference between easy and hard questions. From this study, we observed that (i) the baseline performances for the hard subsets remarkably degrade compared to those of entire datasets, (ii) hard questions require knowledge inference and multiple-sentence reasoning in comparison with easy questions, and (iii) multiple-choice questions tend to require a broader range of reasoning skills than answer extraction and description questions. These results suggest that one might overestimate recent advances in MRC.

pdf bib
Universal Dependencies for Ainu
Hajime Senuma | Akiko Aizawa
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
A Language Model based Evaluator for Sentence Compression
Yang Zhao | Zhiyuan Luo | Akiko Aizawa
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We herein present a language-model-based evaluator for deletion-based sentence compression and view this task as a series of deletion-and-evaluation operations using the evaluator. More specifically, the evaluator is a syntactic neural language model that is first built by learning the syntactic and structural collocation among words. Subsequently, a series of trial-and-error deletion operations are conducted on the source sentences via a reinforcement learning framework to obtain the best target compression. An empirical study shows that the proposed model can effectively generate more readable compression, comparable or superior to several strong baselines. Furthermore, we introduce a 200-sentence test set for a large-scale dataset, setting a new baseline for future research.

pdf bib
UC3M-NII Team at SemEval-2018 Task 7: Semantic Relation Classification in Scientific Papers via Convolutional Neural Network
Víctor Suárez-Paniagua | Isabel Segura-Bedmar | Akiko Aizawa
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper reports our participation in SemEval-2018 Task 7 on extraction and classification of relationships between entities in scientific papers. Our approach is based on the use of a Convolutional Neural Network (CNN) trained on 350 abstracts with manually annotated entities and relations. Our hypothesis is that this deep learning model can be applied to extract and classify relations between entities for scientific papers at the same time. We use the Part-of-Speech and the distances to the target entities as part of the embedding for each word and we blind all the entities by marker names. In addition, we use sampling techniques to overcome the imbalance issues of this dataset. Our architecture obtained an F1-score of 35.4% for the relation extraction task and 18.5% for the relation classification task with a basic configuration of the one step CNN.

2017

pdf bib
Seq2seq for Morphological Reinflection: When Deep Learning Fails
Hajime Senuma | Akiko Aizawa
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection

pdf bib
Evaluation Metrics for Machine Reading Comprehension: Prerequisite Skills and Readability
Saku Sugawara | Yusuke Kido | Hikaru Yokono | Akiko Aizawa
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowing the quality of reading comprehension (RC) datasets is important for the development of natural-language understanding systems. In this study, two classes of metrics were adopted for evaluating RC datasets: prerequisite skills and readability. We applied these classes to six existing datasets, including MCTest and SQuAD, and highlighted the characteristics of the datasets according to each metric and the correlation between the two classes. Our dataset analysis suggests that the readability of RC datasets does not directly affect the question difficulty and that it is possible to create an RC dataset that is easy to read but difficult to answer.

pdf bib
A Conditional Variational Framework for Dialog Generation
Xiaoyu Shen | Hui Su | Yanran Li | Wenjie Li | Shuzi Niu | Yang Zhao | Akiko Aizawa | Guoping Long
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Deep latent variable models have been shown to facilitate the response generation for open-domain dialog systems. However, these latent variables are highly randomized, leading to uncontrollable generated responses. In this paper, we propose a framework allowing conditional response generation based on specific attributes. These attributes can be either manually assigned or automatically detected. Moreover, the dialog states for both speakers are modeled separately in order to reflect personal features. We validate this framework on two different scenarios, where the attribute refers to genericness and sentiment states respectively. The experimental results demonstrate the potential of our model: meaningful responses can be generated in accordance with the specified attributes.

pdf bib
Toward Universal Dependencies for Ainu
Hajime Senuma | Akiko Aizawa
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)

2016

pdf bib
Learning Succinct Models: Pipelined Compression with L1-Regularization, Hashing, Elias-Fano Indices, and Quantization
Hajime Senuma | Akiko Aizawa
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

The recent proliferation of smart devices necessitates methods to learn small-sized models. This paper demonstrates that if there are m features in total but only n = o(√m) features are required to distinguish examples, with 𝛺(log m) training examples and reasonable settings, it is possible to obtain a good model in a succinct representation using n log2 mn + o(m) bits, by using a pipeline of existing compression methods: L1-regularized logistic regression, feature hashing, Elias–Fano indices, and randomized quantization. An experiment shows that a noun phrase chunking task for which an existing library requires 27 megabytes can be compressed to less than 13 kilobytes without notable loss of accuracy.
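One stage of the pipeline named above is the Elias–Fano index, which stores a sorted list of integers (for example, the positions of the non-zero weights left after L1 regularization) compactly. The sketch below shows the standard encoding in plain Python under simplifying assumptions: bits are kept in Python lists rather than packed, and the function names are hypothetical. It is not the paper's implementation.

```python
def ef_encode(values, universe):
    """Encode sorted non-negative integers smaller than `universe` (illustrative sketch)."""
    n = len(values)
    if n == 0:
        return 0, [], []
    ratio = universe // n
    # Number of low bits per value: floor(log2(universe / n)), at least 0.
    low_bits = ratio.bit_length() - 1 if ratio >= 1 else 0
    mask = (1 << low_bits) - 1
    lows = [v & mask for v in values]
    # High parts are stored in unary: the i-th value sets bit (high_i + i).
    upper = [0] * (n + (universe >> low_bits) + 1)
    for i, v in enumerate(values):
        upper[(v >> low_bits) + i] = 1
    return low_bits, lows, upper


def ef_decode(low_bits, lows, upper):
    """Recover the original sorted integers from the two parts."""
    decoded = []
    zeros_seen = 0  # the number of 0 bits seen so far equals the current high part
    index = 0
    for bit in upper:
        if bit:
            decoded.append((zeros_seen << low_bits) | lows[index])
            index += 1
        else:
            zeros_seen += 1
    return decoded


# Hypothetical indices of the features kept by a sparse model with 64 features.
kept = [3, 4, 7, 13, 14, 15, 21, 43]
encoded = ef_encode(kept, 64)
restored = ef_decode(*encoded)
```

In a packed implementation, each value costs `low_bits` bits for its low part plus roughly two bits in the unary upper array, which is why the structure suits sparse index sets in a large feature space.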

pdf bib
SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation
Takeshi Abekawa | Akiko Aizawa
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

In this paper, we discuss our ongoing efforts to construct a scientific paper browsing system that helps users to read and understand advanced technical content distributed in PDF. Since PDF is a format specifically designed for printing, layout and logical structures of documents are indistinguishably embedded in the file. It requires much effort to extract natural language text from PDF files, and conversely, to display semantic annotations produced by NLP tools on the original page layout. In our browsing system, we tackle these issues caused by the gap between printable document and plain text. Our system provides ways to extract natural language sentences from PDF files together with their logical structures, and also to map arbitrary textual spans to their corresponding regions on page images. We set up a demonstration system using papers published in the ACL Anthology and demonstrate the enhanced search and refined recommendation functions which we plan to make widely available to NLP researchers.

pdf bib
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Tutorial Abstracts
Marcello Federico | Akiko Aizawa
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Tutorial Abstracts

pdf bib
Discourse Relation Sense Classification with Two-Step Classifiers
Yusuke Kido | Akiko Aizawa
Proceedings of the CoNLL-16 shared task

pdf bib
Typed Entity and Relation Annotation on Computer Science Papers
Yuka Tateisi | Tomoko Ohta | Sampo Pyysalo | Yusuke Miyao | Akiko Aizawa
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe our ongoing effort to establish an annotation scheme for describing the semantic structures of research articles in the computer science domain, with the intended use of developing search systems that can refine their results by the roles of the entities denoted by the query keys. In our scheme, mentions of entities are annotated with ontology-based types, and the roles of the entities are annotated as relations with other entities described in the text. So far, we have annotated 400 abstracts from the ACL anthology and the ACM digital library. In this paper, the scheme and the annotated dataset are described, along with the problems found in the course of annotation. We also show the results of automatic annotation and evaluate the corpus in a practical setting in application to topic extraction.

pdf bib
English-to-Japanese Translation vs. Dictation vs. Post-editing: Comparing Translation Modes in a Multilingual Setting
Michael Carl | Akiko Aizawa | Masaru Yamada
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Speech-enabled interfaces have the potential to become one of the most efficient and ergonomic environments for human-computer interaction and for text production. However, not much research has been carried out to investigate in detail the processes and strategies involved in the different modes of text production. This paper introduces and evaluates a corpus of more than 55 hours of English-to-Japanese user activity data that were collected within the ENJA15 project, in which translators were observed while writing and speaking translations (translation dictation) and during machine translation post-editing. The transcription of the spoken data, keyboard logging and eye-tracking data were recorded with Translog-II, post-processed and integrated into the CRITT Translation Process Research-DB (TPR-DB), which is publicly available under a creative commons license. The paper presents the ENJA15 data as part of a large multilingual Chinese, Danish, German, Hindi and Spanish translation process data collection of more than 760 translation sessions. It compares the ENJA15 data with the other language pairs and reviews some of its particularities.

pdf bib
Measuring Cognitive Translation Effort with Activity Units
Moritz Jonas Schaeffer | Michael Carl | Isabel Lacruz | Akiko Aizawa
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf bib
An Analysis of Prerequisite Skills for Reading Comprehension
Saku Sugawara | Akiko Aizawa
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods

2015

pdf bib
CroVeWA: Crosslingual Vector-Based Writing Assistance
Hubert Soyer | Goran Topić | Pontus Stenetorp | Akiko Aizawa
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf bib
Technical Term Extraction Using Measures of Neology
Christopher Norman | Akiko Aizawa
Proceedings of the ACL 2015 Workshop on Novel Computational Approaches to Keyphrase Extraction

pdf bib
Distant-supervised Language Model for Detecting Emotional Upsurge on Twitter
Yoshinari Fujinuma | Hikaru Yokono | Pascual Martínez-Gómez | Akiko Aizawa
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2014

pdf bib
Corpus for Coreference Resolution on Scientific Papers
Panot Chaimongkol | Akiko Aizawa | Yuka Tateisi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The ever-growing number of published scientific papers prompts the need for automatic knowledge extraction to help scientists keep up with the state-of-the-art in their respective fields. To construct a good knowledge extraction system, annotated corpora in the scientific domain are required to train machine learning models. As described in this paper, we have constructed an annotated corpus for coreference resolution in multiple scientific domains, based on an existing corpus. We have modified the annotation scheme from the Message Understanding Conference to better suit scientific texts. Then we applied that to the corpus. The annotated corpus is then compared with corpora in general domains in terms of distribution of resolution classes and performance of the Stanford Dcoref coreference resolver. Through these comparisons, we have demonstrated quantitatively that our manually annotated corpus differs from a general-domain corpus, which suggests deep differences between general-domain texts and scientific texts and which shows that different approaches can be made to tackle coreference resolution for general texts and scientific texts.

pdf bib
Annotation of Computer Science Papers for Semantic Relation Extraction
Yuka Tateisi | Yo Shidahara | Yusuke Miyao | Akiko Aizawa
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We designed a new annotation scheme for formalising relation structures in research papers, through the investigation of computer science papers. The annotation scheme is based on the hypothesis that identifying the role of entities and events that are described in a paper is useful for intelligent information retrieval in academic literature, and the role can be determined by the relationship between the author and the described entities or events, and relationships among them. Using the scheme, we have annotated research abstracts from the IPSJ Journal published in Japanese by the Information Processing Society of Japan. On the basis of the annotated corpus, we have developed a prototype information extraction system which has the facility to classify sentences according to the relationship between entities mentioned, to help find the role of the entity in which the searcher is interested.

pdf bib
Significance of Bridging Real-world Documents and NLP Technologies
Tadayoshi Hara | Goran Topić | Yusuke Miyao | Akiko Aizawa
Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT

pdf bib
Japanese to English Machine Translation using Preordering and Compositional Distributed Semantics
Sho Hoshino | Hubert Soyer | Yusuke Miyao | Akiko Aizawa
Proceedings of the 1st Workshop on Asian Translation (WAT2014)

2013

pdf bib
Sense Disambiguation: From Natural Language Words to Mathematical Terms
Minh-Quoc Nghiem | Giovanni Yoko Kristianto | Goran Topić | Akiko Aizawa
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Diagnosing Causes of Reading Difficulty using Bayesian Networks
Pascual Martínez-Gómez | Akiko Aizawa
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Relation Annotation for Understanding Research Papers
Yuka Tateisi | Yo Shidahara | Yusuke Miyao | Akiko Aizawa
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Modeling Comma Placement in Chinese Text for Better Readability using Linguistic Features and Gaze Information
Tadayoshi Hara | Chen Chen | Yoshinobu Kano | Akiko Aizawa
Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations

2012

pdf bib
Recognizing Personal Characteristics of Readers using Eye-Movements and Text Features
Pascual Martínez-Gómez | Tadayoshi Hara | Akiko Aizawa
Proceedings of COLING 2012

pdf bib
Framework of Semantic Role Assignment based on Extended Lexical Conceptual Structure: Comparison with VerbNet and FrameNet
Yuichiroh Matsubayashi | Yusuke Miyao | Akiko Aizawa
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Building Japanese Predicate-argument Structure Corpus using Lexical Conceptual Structure
Yuichiroh Matsubayashi | Yusuke Miyao | Akiko Aizawa
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper introduces our study on creating a Japanese corpus that is annotated using semantically-motivated predicate-argument structures. We propose an annotation framework based on Lexical Conceptual Structure (LCS), where semantic roles of arguments are represented through a semantic structure decomposed by several primitive predicates. As a first stage of the project, we extended Jackendoff's LCS theory to increase generality of expression and coverage for verbs frequently appearing in the corpus, and successfully created LCS structures for 60 frequent Japanese predicates in Kyoto University Text Corpus (KTC). In this paper, we report our framework for creating the corpus and the current status of creating an LCS dictionary for Japanese predicates.

pdf bib
Automatic Translation of Scholarly Terms into Patent Terms Using Synonym Extraction Techniques
Hidetsugu Nanba | Toshiyuki Takezawa | Kiyoko Uchiyama | Akiko Aizawa
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Retrieving research papers and patents is important for any researcher assessing the scope of a field with high industrial relevance. However, the terms used in patents are often more abstract or creative than those used in research papers, because they are intended to widen the scope of claims. Therefore, a method is required for translating scholarly terms into patent terms. In this paper, we propose six methods for translating scholarly terms into patent terms using two synonym extraction methods: a statistical machine translation (SMT)-based method and a distributional similarity (DS)-based method. We conducted experiments to confirm the effectiveness of our method using the dataset of the Patent Mining Task from the NTCIR-7 Workshop. The aim of the task was to classify Japanese language research papers (pairs of titles and abstracts) using the IPC system at the subclass (third level), main group (fourth level), and subgroup (the fifth and most detailed level). The results showed that an SMT-based method (SMT_ABST+IDF) performed best at the subgroup level, whereas a DS-based method (DS+IDF) performed best at the subclass level.

pdf bib
Predicting Word Fixations in Text with a CRF Model for Capturing General Reading Strategies among Readers
Tadayoshi Hara | Daichi Mochihashi | Yoshinobu Kano | Akiko Aizawa
Proceedings of the First Workshop on Eye-tracking and Natural Language Processing

2011

pdf bib
Clustering Comparable Corpora For Bilingual Lexicon Extraction
Bo Li | Eric Gaussier | Akiko Aizawa
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Analyzing the characteristics of academic paper categories by using an index of representativeness
Takafumi Suzuki | Kiyoko Uchiyama | Ryota Tomisaka | Akiko Aizawa
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2010

pdf bib
Mining Coreference Relations between Formulas and Text using Wikipedia
Minh Nghiem Quoc | Keisuke Yokoi | Yuichiroh Matsubayashi | Akiko Aizawa
Proceedings of the Second Workshop on NLP Challenges in the Information Explosion Era (NLPIX 2010)

2003

pdf bib
Analysis of Source Identified Text Corpora: Exploring the Statistics of the Reused Text and Authorship
Akiko Aizawa
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

2002

pdf bib
A Method of Cluster-Based Indexing of Textual Data
Akiko Aizawa
COLING 2002: The 19th International Conference on Computational Linguistics

2000

pdf bib
Automatic Thesaurus Generation through Multiple Filtering
Kyo Kageura | Keita Tsuji | Akiko N. Aizawa
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics
