Eiji Aramaki - ACL Anthology

2025

Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors
Kohei Tsuji | Tatsuya Hiraoka | Yuchang Cheng | Eiji Aramaki | Tomoya Iwakura
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

This paper investigates how LLMs encode inputs with typos. We hypothesize that specific neurons and attention heads recognize typos and fix them internally using local and global contexts. We introduce a method to identify typo neurons and typo heads that work actively when inputs contain typos. Our experimental results suggest the following: 1) LLMs can fix typos with local contexts when the typo neurons in either the early or late layers are activated, even if those in the other are not. 2) Typo neurons in the middle layers are the core of typo-fixing with global contexts. 3) Typo heads fix typos by considering the context broadly rather than focusing on specific tokens. 4) Typo neurons and typo heads work not only for typo-fixing but also for understanding general contexts.

MultiMSD: A Corpus for Multilingual Medical Text Simplification from Online Medical References
Koki Horiguchi | Tomoyuki Kajiwara | Takashi Ninomiya | Shoko Wakamiya | Eiji Aramaki
Findings of the Association for Computational Linguistics: ACL 2025

We release a parallel corpus for medical text simplification, which paraphrases medical terms into expressions easily understood by patients. Medical texts written by medical practitioners contain a lot of technical terms, and patients who are non-experts are often unable to use the information effectively. Therefore, there is a strong social demand for medical text simplification that paraphrases input sentences without using medical terms. However, this task has not been sufficiently studied in non-English languages. We therefore developed parallel corpora for medical text simplification in nine languages: German, English, Spanish, French, Italian, Japanese, Portuguese, Russian, and Chinese, each with 10,000 sentence pairs, by automatic sentence alignment to online medical references for professionals and consumers. We also propose a method for training text simplification models to actively paraphrase complex expressions, including medical terms. Experimental results show that the proposed method improves the performance of medical text simplification. In addition, we confirmed that training with a multilingual dataset is more effective than training with a monolingual dataset.

Exploring LLM Annotation for Adaptation of Clinical Information Extraction Models under Data-sharing Restrictions
Seiji Shimizu | Hisada Shohei | Yutaka Uno | Shuntaro Yada | Shoko Wakamiya | Eiji Aramaki
Findings of the Association for Computational Linguistics: ACL 2025

In-hospital text data contains valuable clinical information, yet deploying fine-tuned small language models (SLMs) for information extraction remains challenging due to differences in formatting and vocabulary across institutions. Since access to the original in-hospital data (source domain) is often restricted, annotated data from the target hospital (target domain) is crucial for domain adaptation. However, clinical annotation is notoriously expensive and time-consuming, as it demands clinical and linguistic expertise. To address this issue, we leverage large language models (LLMs) to annotate the target domain data for the adaptation. We conduct experiments on four clinical information extraction tasks, covering eight target-domain datasets. Experimental results show that LLM-annotated data consistently enhances SLM performance and, with a larger amount of annotated data, outperforms manual annotation in three out of four tasks.

RecordTwin: Towards Creating Safe Synthetic Clinical Corpora
Seiji Shimizu | Ibrahim Baroud | Lisa Raithel | Shuntaro Yada | Shoko Wakamiya | Eiji Aramaki
Findings of the Association for Computational Linguistics: ACL 2025

The scarcity of publicly available clinical corpora hinders developing and applying NLP tools in clinical research. While existing work tackles this issue by utilizing generative models to create high-quality synthetic corpora, these methods require learning from the original in-hospital clinical documents, making them infeasible in practice. To address this problem, we introduce RecordTwin, a novel synthetic corpus creation method designed to generate synthetic documents from anonymized clinical entities. In this method, we first extract and anonymize entities from in-hospital documents to ensure the information contained in the synthetic corpus is restricted. Then, we use a large language model to fill the context between anonymized entities. To do so, we use a small, privacy-preserving subset of the original documents to mimic their formatting and writing style. This approach only requires anonymized entities and a small subset of original documents in the generation process, making it more feasible in practice. To evaluate the synthetic corpus created with our method, we conduct a proof-of-concept study using a publicly available clinical database. Our results demonstrate that the synthetic corpus has a utility comparable to the original data and a safety advantage over baselines, highlighting the potential of RecordTwin for privacy-preserving synthetic corpus creation.

Enhancing Hate Speech Classifiers through a Gradient-assisted Counterfactual Text Generation Strategy
Michael Van Supranes | Shaowen Peng | Shoko Wakamiya | Eiji Aramaki
Findings of the Association for Computational Linguistics: EMNLP 2025

Counterfactual data augmentation (CDA) is a promising strategy for improving hate speech classification, but automating counterfactual text generation remains a challenge. Strong attribute control can distort meaning, while prioritizing semantic preservation may weaken attribute alignment. We propose **Gradient-assisted Energy-based Sampling (GENES)** for counterfactual text generation, which restricts accepted samples to text meeting a minimum BERTScore threshold and applies gradient-assisted proposal generation to improve attribute alignment. Compared to other methods that rely solely on prompting, gradient-based steering, or energy-based sampling, GENES is more likely to jointly satisfy attribute alignment and semantic preservation under the same base model. When applied to data augmentation, GENES achieved the best macro F1-score in two of three test sets, and it improved robustness in detecting targeted abusive language. In some cases, GENES exceeded the performance of prompt-based methods using GPT-4o-mini, despite relying on a smaller model (Flan-T5-Large). Based on our cross-dataset evaluation, the average performance of models aided by GENES is the best among those methods that rely on a smaller model (Flan-T5-Large). These results position GENES as a possible lightweight and open-source alternative.

Multilingual Symptom Detection on Social Media: Enhancing Health-related Fact-checking with LLMs
Saidah Zahrotul Jannah | Elyanah Aco | Shaowen Peng | Shoko Wakamiya | Eiji Aramaki
Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER)

Social media has emerged as a valuable source for early pandemic detection, as repeated mentions of symptoms by users may signal the onset of an outbreak. However, to be a reliable system, validation through fact-checking and verification against official health records is essential. Without this step, systems risk spreading misinformation to the public. The effectiveness of these systems also depends on their ability to process data in multiple languages, given the multilingual nature of social media data. Yet, many NLP datasets and disease surveillance systems remain heavily English-centric, leading to significant performance gaps for low-resource languages. This issue is especially critical in Southeast Asia, where symptom expression may vary culturally and linguistically. Therefore, this study evaluates the symptom detection capabilities of LLMs in social media posts across multiple languages, models, and symptoms to enhance health-related fact-checking. Our results reveal significant language-based discrepancies, with European languages outperforming under-resourced Southeast Asian languages. Furthermore, we identify symptom-specific challenges, particularly in detecting respiratory illnesses such as influenza, which LLMs tend to overpredict. The overestimation or misclassification of symptom mentions can lead to false alarms or public misinformation when deployed in real-world settings. This underscores the importance of symptom detection as a critical first step in medical fact-checking within early outbreak detection systems.

AMR-RE: Abstract Meaning Representations for Retrieval-Based In-Context Learning in Relation Extraction
Peitao Han | Lis Pereira | Fei Cheng | Wan Jou She | Eiji Aramaki
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

Existing in-context learning (ICL) methods for relation extraction (RE) often prioritize language similarity over structural similarity, which may result in overlooking entity relationships. We propose an AMR-enhanced retrieval-based ICL method for RE to address this issue. Our model retrieves in-context examples based on semantic structure similarity between task inputs and training samples. We conducted experiments in the supervised setting on four standard English RE datasets. The results show that our method achieves state-of-the-art performance on three datasets and competitive results on the fourth. Furthermore, our method outperforms baselines by a large margin across all datasets in the more demanding unsupervised setting.

ARxHYOKA at TAQEEM2025: Comparative Approaches to Arabic Essay Trait Scoring
Mohamad Alnajjar | Ahmad Almoustafa | Tomohiro Nishiyama | Shoko Wakamiya | Eiji Aramaki | Takuya Matsuzaki
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

EmplifAI: a Fine-grained Dataset for Japanese Empathetic Medical Dialogues in 28 Emotion Labels
Wan Jou She | Lis Pereira | Fei Cheng | Sakiko Yahata | Panote Siriaraya | Eiji Aramaki
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

This paper introduces EmplifAI, a Japanese empathetic dialogue dataset designed to support patients coping with chronic medical conditions. These patients often experience a wide range of positive and negative emotions (e.g., hope and despair) that shift across different stages of disease management. EmplifAI addresses this complexity by providing situation-based dialogues grounded in 28 fine-grained emotion categories, adapted and validated from the GoEmotions taxonomy. The dataset includes 280 medically contextualized situations and 4,125 two-turn dialogues, collected through crowdsourcing and expert review. To evaluate emotional alignment in empathetic dialogues, we assessed model predictions on situation–dialogue pairs using BERTScore across multiple large language models (LLMs), achieving F1 scores of ≤ 0.83. Fine-tuning a baseline Japanese LLM (LLM-jp-3.1-13b-instruct4) with EmplifAI resulted in notable improvements in fluency, general empathy, and emotion-specific empathy. Furthermore, we compared the scores assigned by LLM-as-a-Judge and human raters on dialogues generated by multiple LLMs to validate our evaluation pipeline and discuss the insights and potential risks derived from the correlation analysis.

2024

Improving Self-training with Prototypical Learning for Source-Free Domain Adaptation on Clinical Text
Seiji Shimizu | Shuntaro Yada | Lisa Raithel | Eiji Aramaki
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

Domain adaptation is crucial in the clinical domain since the performance of a model trained on one domain (source) degrades seriously when applied to another domain (target). However, conventional domain adaptation methods often cannot be applied due to data sharing restrictions on source data. Source-Free Domain Adaptation (SFDA) addresses this issue by only utilizing a source model and unlabeled target data to adapt to the target domain. In SFDA, self-training is the most widely applied method; it retrains models on target data using predictions from the source model as pseudo-labels. Nevertheless, this approach is prone to substantial pseudo-labeling errors, which might limit model performance in the target domain. In this paper, we propose Source-Free Prototype-based Self-training (SFPS), which aims to improve the performance of self-training. SFPS generates prototypes without accessing source data and utilizes them for prototypical learning, namely prototype-based pseudo-labeling and contrastive learning. Also, we compare entropy-based, centroid-based, and class-weights-based prototype generation methods to identify the most effective formulation of the proposed method. Experimental results across various datasets demonstrate the effectiveness of the proposed method, consistently outperforming vanilla self-training. The comparison of various prototype-generation methods identifies the most reliable generation method that improves the source model persistently. Additionally, our analysis illustrates that SFPS can successfully alleviate errors in pseudo-labeling.

Generating Distributable Surrogate Corpus for Medical Multi-label Classification
Seiji Shimizu | Shuntaro Yada | Shoko Wakamiya | Eiji Aramaki
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024

In medical and social media domains, annotated corpora are often hard to distribute due to copyright and privacy issues. To overcome this situation, we propose a new method to generate a surrogate corpus for a downstream task by using a text generation model. We chose a medical multi-label classification task, MedWeb, in which patient-generated short messages express multiple symptoms. We first fine-tuned text generation models with different prompting designs on the original corpus to obtain synthetic versions of that corpus. To assess the viability of the generated corpora for the downstream task, we compared the performance of multi-label classification models trained either on the original or the surrogate corpora. The results and the error analysis showed the difficulty of generating a surrogate corpus in multi-label settings, suggesting text generation under complex conditions is not trivial. On the other hand, our experiment demonstrates that the corpus generated with sentinel-based prompting is comparatively viable in a single-label (multiclass) classification setting.

Estimation of Happiness Changes through Longitudinal Analysis of Employees’ Texts
Junko Hayashi | Kazuhiro Ito | Masae Manabe | Yasushi Watanabe | Masataka Nakayama | Yukiko Uchida | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Measuring happiness as a determinant of well-being is increasingly recognized as crucial. While previous studies have utilized free-text descriptions to estimate happiness on a broad scale, limited research has focused on tracking individual fluctuations in happiness over time owing to the challenges associated with longitudinal data collection. This study addresses this issue by obtaining longitudinal data from two workplaces over two and six months respectively. Subsequently, the data is used to construct a happiness estimation model and assess individual happiness levels. Evaluation of the model performance using correlation coefficients shows variability in the correlation values among individuals. Notably, the model performs satisfactorily in estimating 9 of the 11 users’ happiness scores, with a correlation coefficient of 0.4 or higher. To investigate the factors affecting the model performance, we examine the relationship between the model performance and variables such as sentence length, lexical diversity, and personality traits. Correlations are observed between these features and model performance.

A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages
Lisa Raithel | Hui-Syuan Yeh | Shuntaro Yada | Cyril Grouin | Thomas Lavergne | Aurélie Névéol | Patrick Paroubek | Philippe Thomas | Tomohiro Nishiyama | Sebastian Möller | Eiji Aramaki | Yuji Matsumoto | Roland Roller | Pierre Zweigenbaum
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world. However, the existing clinical corpora predominantly revolve around scientific articles in English. This work presents a multilingual corpus of texts concerning ADRs gathered from diverse sources, including patient fora, social media, and clinical reports in German, French, and Japanese. Our corpus contains annotations covering 12 entity types, four attribute types, and 13 relation types. It contributes to the development of real-world multilingual language models for healthcare. We provide statistics to highlight certain challenges associated with the corpus and conduct preliminary experiments resulting in strong baselines for extracting entities and relations between these entities, both within and across languages.

Assessing Authenticity and Anonymity of Synthetic User-generated Content in the Medical Domain
Tomohiro Nishiyama | Lisa Raithel | Roland Roller | Pierre Zweigenbaum | Eiji Aramaki
Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)

Since medical text cannot be shared easily due to privacy concerns, synthetic data bears much potential for natural language processing applications. In the context of social media and user-generated messages about drug intake and adverse drug effects, this work presents different methods to examine the authenticity of synthetic text. We conclude that the generated tweets are untraceable and show enough authenticity from the medical point of view to be used as a replacement for a real Twitter corpus. However, original data might still be the preferred choice as they contain much more diversity.

Overview of #SMM4H 2024 – Task 2: Cross-Lingual Few-Shot Relation Extraction for Pharmacovigilance in French, German, and Japanese
Lisa Raithel | Philippe Thomas | Bhuvanesh Verma | Roland Roller | Hui-Syuan Yeh | Shuntaro Yada | Cyril Grouin | Shoko Wakamiya | Eiji Aramaki | Sebastian Möller | Pierre Zweigenbaum
Proceedings of the 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks

This paper provides an overview of Task 2 from the Social Media Mining for Health 2024 shared task (#SMM4H 2024), which focused on Named Entity Recognition (NER, Subtask 2a) and the joint task of NER and Relation Extraction (RE, Subtask 2b) for detecting adverse drug reactions (ADRs) in German, Japanese, and French texts written by patients. Participants were challenged with a few-shot learning scenario, necessitating models that can effectively generalize from limited annotated examples. Despite the diverse strategies employed by the participants, the overall performance across submissions from three teams highlighted significant challenges. The results underscored the complexity of extracting entities and relations in multi-lingual contexts, especially from the noisy and informal nature of user-generated content. Further research is required to develop robust systems capable of accurately identifying and associating ADR-related information in low-resource and multilingual settings.

Semi-automatic Construction of a Word Complexity Lexicon for Japanese Medical Terminology
Soichiro Sugihara | Tomoyuki Kajiwara | Takashi Ninomiya | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 6th Clinical Natural Language Processing Workshop

We construct a word complexity lexicon for medical terms in Japanese. To facilitate communication between medical practitioners and patients, medical text simplification is being studied. Medical text simplification is a natural language processing task that paraphrases complex technical terms into expressions that patients can understand. However, in contrast to English, where this task is being actively studied, there are insufficient language resources in Japanese. As a first step in advancing research on medical text simplification in Japanese, we annotate the 370,000 words from a large-scale medical terminology lexicon with a five-point scale of complexity for patients.

For the past nine years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in publicly available user-generated content. This year, #SMM4H included seven shared tasks in English, Japanese, German, French, and Spanish from Twitter, Reddit, and health forums. A total of 84 teams from 22 countries registered for #SMM4H, and 45 teams participated in at least one task. This represents a growth of 180% and 160% in registration and participation, respectively, compared to the last iteration. This paper provides an overview of the tasks and participating systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.

Synchronizing Approach in Designing Annotation Guidelines for Multilingual Datasets: A COVID-19 Case Study Using English and Japanese Tweets
Kiki Ferawati | Wan Jou She | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP

The difference in culture between the U.S. and Japan is a popular subject for Western vs. Eastern cultural comparison for researchers. One particular challenge is to obtain and annotate multilingual datasets. In this study, we utilized COVID-19 tweets from the two countries as a case study, focusing particularly on discussions concerning masks. The annotation task was designed to gain insights into societal attitudes toward the mask policies implemented in both countries. The aim of this study is to provide a practical approach for the annotation task by thoroughly documenting how we aligned the multilingual annotation guidelines to obtain a comparable dataset. We proceeded to document the effective practices during our annotation process to synchronize our multilingual guidelines. Furthermore, we discussed difficulties caused by differences in expression style and culture, and potential strategies that helped improve our agreement scores and reduce discrepancies between the annotation results in both languages. These findings offer an alternative method for synchronizing multilingual annotation guidelines and achieving feasible agreement scores for cross-cultural annotation tasks. This study resulted in a multilingual guideline in English and Japanese to annotate topics related to public discourses about COVID-19 masks in the U.S. and Japan.

Loneliness Episodes: A Japanese Dataset for Loneliness Detection and Analysis
Naoya Fujikawa | Quang Toan Nguyen | Kazuhiro Ito | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Loneliness, a significant public health concern, is closely connected to both physical and mental well-being. Hence, detection and intervention for individuals experiencing loneliness are crucial. Identifying loneliness in text is straightforward when it is explicitly stated but challenging when it is implicit. Explicit loneliness can be detected using keywords, but implicit loneliness cannot, so detecting it requires a manually annotated dataset. However, there are no freely available datasets with clear annotation guidelines for implicit loneliness. In this study, we construct a freely accessible Japanese loneliness dataset with annotation guidelines grounded in the psychological definition of loneliness. This dataset covers loneliness intensity and the contributing factors of loneliness. We train two models to classify whether loneliness is expressed and the intensity of loneliness. The model classifying loneliness versus non-loneliness achieves an F1-score of 0.833, but the model for identifying the intensity of loneliness has a low F1-score of 0.400, which is likely due to label imbalance and a shortage of a certain label in the dataset. We validate performance in another domain, specifically X (formerly Twitter), and observe a decrease. In addition, we suggest improvements for domain adaptation.

QA-based Event Start-Points Ordering for Clinical Temporal Relation Annotation
Seiji Shimizu | Lis Pereira | Shuntaro Yada | Eiji Aramaki
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Temporal relation annotation in the clinical domain is crucial yet challenging due to its workload and the medical expertise required. In this paper, we propose a novel annotation method that integrates event start-points ordering and question-answering (QA) as the annotation format. By focusing only on two points on a timeline, start-points ordering reduces ambiguity and simplifies the relation set to be considered during annotation. QA as annotation recasts temporal relation annotation into a reading comprehension task, allowing annotators to use natural language instead of the formalisms commonly adopted in temporal relation annotation. Based on our method, most of the relations in a document are inferable from a significantly smaller number of explicitly annotated relations, showing the efficiency of our proposed method. Using these inferred relations, we develop a temporal relation classification model that achieves a 0.72 F1 score. Also, by decomposing the annotation process into QA generation and QA validation, our method enables collaboration among medical experts and non-experts. We obtained high inter-annotator agreement (IAA) scores, which indicate the positive prospect of such collaboration in the annotation process. Our annotated corpus, annotation tool, and trained model are publicly available: https://github.com/seiji-shimizu/qa-start-ordering.

2023

Comparative evaluation of boundary-relaxed annotation for Entity Linking performance
Gabriel Herman Bernardim Andrade | Shuntaro Yada | Eiji Aramaki
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Entity Linking performance relies strongly on the availability of a large quantity of high-quality annotated training data. Yet, manual annotation of named entities, especially their boundaries, is ambiguous, error-prone, and raises many inconsistencies between annotators. While imprecise boundary annotation can degrade a model’s performance, there are applications where accurate extraction of entities’ surface form is not necessary. For those cases, a lenient annotation guideline could relieve the annotators’ workload and speed up the process. This paper presents a case study designed to verify the feasibility of such an annotation process and evaluate the impact of boundary-relaxed annotation in an Entity Linking pipeline. We first generate a set of noisy versions of the widely used AIDA CoNLL-YAGO dataset by expanding the boundaries of subsets of annotated entity mentions. We then train three Entity Linking models on this data and evaluate the relative impact of imprecise annotation on entity recognition and disambiguation performance. We demonstrate that the magnitude of effects caused by noise in the Named Entity Recognition phase is dependent on both model complexity and noise ratio, while Entity Disambiguation components are susceptible to entity boundary imprecision due to strong vocabulary dependency.

2022

Annotation-Scheme Reconstruction for “Fake News” and Japanese Fake News Dataset
Taichi Murayama | Shohei Hisada | Makoto Uehara | Shoko Wakamiya | Eiji Aramaki
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Fake news provokes many societal problems; therefore, there has been extensive research on fake news detection tasks to counter it. Many fake news datasets were constructed as resources to facilitate this task. Contemporary research focuses almost exclusively on the factuality aspect of the news. However, this aspect alone is insufficient to explain “fake news,” which is a complex phenomenon that involves a wide range of issues. To fully understand the nature of each instance of fake news, it is important to observe it from various perspectives, such as the intention of the false news disseminator, the harmfulness of the news to our society, and the target of the news. We propose a novel annotation scheme with fine-grained labeling based on detailed investigations of existing fake news datasets to capture these various aspects of fake news. Using the annotation scheme, we construct and publish the first Japanese fake news dataset. The annotation scheme is expected to provide an in-depth understanding of fake news. We plan to build datasets for both Japanese and other languages using our scheme. Our Japanese dataset is published at https://hkefka385.github.io/dataset/fakenews-japanese/.

PICO Corpus: A Publicly Available Corpus to Support Automatic Data Extraction from Biomedical Literature
Faith Mutinda | Kongmeng Liew | Shuntaro Yada | Shoko Wakamiya | Eiji Aramaki
Proceedings of the First Workshop on Information Extraction from Scientific Publications

We present a publicly available corpus with detailed annotations describing the core elements of clinical trials: Participants, Intervention, Control, and Outcomes. The corpus consists of 1011 abstracts of breast cancer randomized controlled trials extracted from the PubMed database. The corpus improves on previous corpora by providing detailed annotations for outcomes to identify numeric texts that report the number of participants that experience specific outcomes. The corpus will be helpful for the development of systems for automatic extraction of data from randomized controlled trial literature to support evidence-based medicine. Additionally, we demonstrate the feasibility of the corpus by using two strong baselines for the named entity recognition task. Most of the entities achieve F1 scores greater than 0.80, demonstrating the quality of the dataset.

Emotion Analysis of Writers and Readers of Japanese Tweets on Vaccinations
Patrick John Ramos | Kiki Ferawati | Kongmeng Liew | Eiji Aramaki | Shoko Wakamiya
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

Public opinion in social media is increasingly becoming a critical factor in pandemic control. Understanding the emotions of a population towards vaccinations and COVID-19 may be valuable in convincing members to become vaccinated. We investigated the emotions of Japanese Twitter users towards Tweets related to COVID-19 vaccination. Using the WRIME dataset, which provides emotion ratings for Japanese Tweets sourced from writers (Tweet posters) and readers, we fine-tuned a BERT model to predict levels of emotional intensity. This model achieved a training MSE of 0.356. A separate dataset of 20,254 Japanese Tweets containing COVID-19 vaccine-related keywords was also collected, on which the fine-tuned BERT was used to perform emotion analysis. Afterwards, a correlation analysis between the extracted emotions and a set of vaccination measures in Japan was conducted. The results revealed that surprise and fear were the most intense emotions predicted by the model for writers and readers, respectively, on the vaccine-related Tweet dataset. The correlation analysis also showed that vaccinations were weakly positively correlated with predicted levels of writer joy, writer/reader anticipation, and writer/reader trust.

JaMIE: A Pipeline Japanese Medical Information Extraction System with Novel Relation Annotation
Fei Cheng | Shuntaro Yada | Ribeka Tanaka | Eiji Aramaki | Sadao Kurohashi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In the field of Japanese medical information extraction, few analysis tools are available and relation extraction is still an under-explored topic. In this paper, we first propose a novel relation annotation schema for investigating the medical and temporal relations between medical entities in Japanese medical reports. We experiment with practical annotation scenarios by separately annotating two different types of reports. We design a pipeline system with three components for recognizing medical entities, classifying entity modalities, and extracting relations. The empirical results show accurate analysis performance and suggest satisfactory annotation quality, the superiority of the latest contextual embedding models, and a feasible annotation strategy for high-accuracy demands.

2021

End-to-end Biomedical Entity Linking with Span-based Dictionary Matching
Shogo Ujiie | Hayate Iso | Shuntaro Yada | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 20th Workshop on Biomedical Language Processing

Disease name recognition and normalization is a fundamental process in biomedical text mining. Recently, neural joint learning of both tasks has been proposed to utilize the mutual benefits. While this approach achieves high performance, disease concepts that do not appear in the training dataset cannot be accurately predicted. This study introduces a novel end-to-end approach that combines span representations with dictionary-matching features to address this problem. Our model handles unseen concepts by referring to a dictionary while maintaining the performance of neural network-based models. Experiments using two major datasets demonstrate that our model achieved competitive results with strong baselines, especially for concepts unseen during training.
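
The dictionary-matching side of this idea can be illustrated with a minimal sketch. It is not the authors' model: it only enumerates candidate token spans and looks each one up in a dictionary, whereas the abstract describes combining such matches with learned span representations. The function name `dictionary_spans`, the `max_len` limit, the toy dictionary, and the `CONCEPT_*` identifiers are all hypothetical.

```python
def dictionary_spans(tokens, dictionary, max_len=3):
    """Return (start, end, concept_id) for every span found in the dictionary.

    Spans are half-open token ranges [start, end) of at most max_len tokens.
    """
    hits = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            surface = " ".join(tokens[start:end]).lower()
            if surface in dictionary:
                hits.append((start, end, dictionary[surface]))
    return hits


# Toy dictionary with placeholder concept identifiers (hypothetical).
toy_dict = {"breast cancer": "CONCEPT_1", "cancer": "CONCEPT_2"}
tokens = "Patients with breast cancer were enrolled".split()
hits = dictionary_spans(tokens, toy_dict)
```

Because the lookup does not depend on training data, a span can be matched even when its concept never appeared in the training set, which is the property the abstract relies on for unseen concepts.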

Mitigation of Diachronic Bias in Fake News Detection Dataset
Taichi Murayama | Shoko Wakamiya | Eiji Aramaki
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Fake news causes significant damage to society. To deal with fake news, several studies on building detection models and arranging datasets have been conducted. Most fake news datasets depend on a specific time period. Consequently, detection models trained on such a dataset have difficulty detecting novel fake news generated by political and social changes; they may produce biased output for input that includes specific person names and organizational names. We refer to this problem as Diachronic Bias because it is caused by the creation date of news in each dataset. In this study, we confirm the bias, especially for proper nouns including person names, from the deviation of phrase appearances in each dataset. Based on these findings, we propose masking methods using Wikidata to mitigate the influence of person names and validate whether they make fake news detection models robust through experiments with in-domain and out-of-domain data.
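
As a rough illustration of name masking, the sketch below replaces known person names with a placeholder token before text reaches a classifier. It is an assumption-laden toy: the paper draws on Wikidata, while here the name list is a hard-coded set, and `mask_names` and the `[PERSON]` token are hypothetical.

```python
import re


def mask_names(text, person_names, token="[PERSON]"):
    """Replace each listed person name in text with a placeholder token."""
    # Try longer names first so a full name wins over its own substring.
    ordered = sorted(person_names, key=len, reverse=True)
    if not ordered:
        return text
    alternatives = "|".join(re.escape(name) for name in ordered)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    return pattern.sub(token, text)


# Hypothetical name list standing in for an external knowledge base.
names = {"Jane Doe", "Doe"}
masked = mask_names("Jane Doe denied the claim; Doe later resigned.", names)
```

After masking, the classifier sees the same placeholder regardless of which person a story mentions, so it cannot tie its prediction to names that were frequent in one time period.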

Are Metal Fans Angrier than Jazz Fans? A Genre-Wise Exploration of the Emotional Language of Music Listeners on Reddit
Vipul Mishra | Kongmeng Liew | Elena V. Epure | Romain Hennequin | Eiji Aramaki
Proceedings of the 2nd Workshop on NLP for Music and Spoken Audio (NLP4MusA)

2020

Offensive Language Detection on Video Live Streaming Chat
Zhiwei Gao | Shuntaro Yada | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 28th International Conference on Computational Linguistics

This paper presents a prototype of a chat room that detects offensive expressions in a video live streaming chat in real time. Focusing on Twitch, one of the most popular live streaming platforms, we created a dataset for the task of detecting offensive expressions. We collected 2,000 chat posts across four popular game titles with genre diversity (e.g., competitive, violent, peaceful). To make use of the similarity in offensive expressions among different social media platforms, we adopted state-of-the-art models trained on offensive expressions from Twitter for our Twitch data (i.e., transfer learning). We investigated two similarity measurements to predict the transferability: textual similarity and game-genre similarity. Our results show that the transfer of features from social media to live streaming is effective. However, the two measurements show less correlation in the transferability prediction.

Classification of Nostalgic Music Through LDA Topic Modeling and Sentiment Analysis of YouTube Comments in Japanese Songs
Kongmeng Liew | Yukiko Uchida | Nao Maeura | Eiji Aramaki
Proceedings of the 1st Workshop on NLP for Music and Audio (NLP4MusA)

Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases
Shuntaro Yada | Ayami Joh | Ribeka Tanaka | Fei Cheng | Eiji Aramaki | Sadao Kurohashi
Proceedings of the Twelfth Language Resources and Evaluation Conference

Applying natural language processing (NLP) to medical and clinical texts can bring important social benefits by mining valuable information from unstructured text. A popular application for that purpose is named entity recognition (NER), but the annotation policies of existing clinical corpora have not been standardized across clinical texts of different types. This paper presents an annotation guideline aimed at covering medical documents of various types such as radiography interpretation reports and medical records. Furthermore, the annotation was designed to avoid burdensome requirements related to medical knowledge, thereby enabling corpus development without medical specialists. To achieve these design features, we specifically focus on critical lung diseases to stabilize linguistic patterns in corpora. After annotating around 1100 electronic medical records following the annotation scheme, we demonstrated its feasibility using an NER task. Results suggest that our guideline is applicable to large-scale clinical NLP projects.

2019

Learning to Select, Track, and Generate for Data-to-Text
Hayate Iso | Yui Uehara | Tatsuya Ishigaki | Hiroshi Noji | Eiji Aramaki | Ichiro Kobayashi | Yusuke Miyao | Naoaki Okazaki | Hiroya Takamura
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a data-to-text generation model with two modules, one for tracking and the other for text generation. Our tracking module selects and keeps track of salient information and memorizes which record has been mentioned. Our generation module generates a summary conditioned on the state of the tracking module. Our proposed model is considered to simulate the human-like writing process that gradually selects the information by determining the intermediate variables while writing the summary. In addition, we explore the effectiveness of writer information for generation. Experimental results show that our proposed model outperforms existing models in all evaluation metrics even without writer information. Incorporating writer information further improves the performance, contributing to content planning and surface realization.

2018

J-MeDic: A Japanese Disease Name Dictionary based on Real Clinical Usage
Kaoru Ito | Hiroyuki Nagai | Taro Okahisa | Shoko Wakamiya | Tomohide Iwao | Eiji Aramaki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Multivariate Linear Regression of Symptoms-related Tweets for Infectious Gastroenteritis Scale Estimation
Ryo Takeuchi | Hayate Iso | Kaoru Ito | Shoko Wakamiya | Eiji Aramaki
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

To date, various Twitter-based event detection systems have been proposed. Most of their targets, however, share common characteristics. They are seasonal or global events such as earthquakes and flu pandemics. In contrast, this study targets unseasonal and local disease events. Our system investigates the frequencies of disease-related words such as “nausea”, “chill”, and “diarrhea” and estimates the number of patients by regression on these word frequencies. Experiments conducted on 47 areas of Japan from January 2017 to April 2017 revealed that the detection of small and unseasonal events is extremely difficult (overall performance: 0.13). However, we found that the event scale and the detection performance show high correlation in specific cases (in the phase where the number of patients is increasing or decreasing). The results also suggest that when 150 or more patients appear in a high-population area, we can expect our social sensors to detect the outbreak. Based on these results, we can infer that social sensors can reliably detect unseasonal and local disease events under certain conditions, just as they can for seasonal or global events.

2016

Detecting Japanese Patients with Alzheimer’s Disease based on Word Category Frequencies
Daisaku Shibata | Shoko Wakamiya | Ayae Kinoshita | Eiji Aramaki
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

In recent years, detecting Alzheimer’s disease (AD) in early stages based on natural language processing (NLP) has drawn much attention. To date, vocabulary size, grammatical complexity, and fluency have been studied using NLP metrics. However, the content analysis of AD narratives is still out of reach for NLP. This study investigates features of the words that AD patients use in their spoken language. We recruited 18 examinees aged 53–90 years (mean: 76.89) and divided them into two groups based on MMSE scores. The AD group comprised 9 examinees with scores of 21 or lower. The healthy control group comprised 9 examinees with scores of 22 or higher. Linguistic Inquiry and Word Count (LIWC) was used to categorize the words that the examinees used. The word frequency was found from observation. Significant differences were confirmed for the usage of impersonal pronouns in the AD group. This result demonstrated the basic feasibility of the proposed NLP-based detection approach.

MedNLPDoc: Japanese Shared Task for Clinical NLP
Eiji Aramaki | Yoshinobu Kano | Tomoko Ohkuma | Mizuki Morita
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

Due to the recent replacement of physical documents with electronic medical records (EMR), the importance of information processing in medical fields has increased. We have been organizing the MedNLP task series in NTCIR-10 and 11. These workshops were the first shared tasks that attempted to evaluate technologies that retrieve important information from medical reports written in Japanese. In this report, we describe the NTCIR-12 MedNLPDoc task, which is designed for more advanced and practical use in medical fields. This task is framed as a multi-labeling task on a patient record. This report presents the results of the shared task and discusses and illustrates remaining issues in the medical natural language processing field.

Forecasting Word Model: Twitter-based Influenza Surveillance and Prediction
Hayate Iso | Shoko Wakamiya | Eiji Aramaki
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Because of the increasing popularity of social media, much information has been shared on the internet, enabling social media users to understand various real-world events. In particular, social media-based infectious disease surveillance has attracted increasing attention. In this work, we specifically examine influenza: a common topic of communication on social media. The fundamental idea of this work is that several words, such as symptom words (fever, headache, etc.), appear in advance of a flu epidemic. Consequently, past word occurrence can contribute to estimation of the number of current patients. To employ such forecasting words, one can first estimate the optimal time lag for each word based on its cross-correlation. Then one can build a linear model consisting of word frequencies at different time points for nowcasting and for forecasting influenza epidemics. In experiments using 7.7 million tweets from August 2012 to January 2016, the proposed model achieved the best nowcasting performance to date (correlation ratio 0.93) and practically sufficient forecasting performance (correlation ratio 0.91 for 1-week-ahead prediction and 0.77 for 3-week-ahead prediction). This report is the first of the relevant literature to describe a model enabling prediction of future epidemics using Twitter.
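
The lag-estimation step can be sketched as follows, assuming weekly series of equal length and taking the best lag to be the shift with the highest Pearson correlation. This is a simplified reading of the abstract, not the authors' implementation; `pearson`, `best_lag`, the `max_lag` bound, and the toy data are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_lag(word_freq, patients, max_lag):
    """Return (lag, r) maximizing corr(word_freq[t - lag], patients[t])."""
    best = (0, pearson(word_freq, patients))
    for lag in range(1, max_lag + 1):
        # Pair the word count at time t - lag with the patient count at time t.
        r = pearson(word_freq[:-lag], patients[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Toy series (hypothetical): the word count leads the patient count by two steps.
fever = [1, 3, 7, 4, 2, 1, 1, 2, 6, 3]
patients = [0, 0, 10, 30, 70, 40, 20, 10, 10, 20]
lag, r = best_lag(fever, patients, max_lag=4)
```

On this toy series the selected lag is 2, because the patient counts are the word counts shifted by two steps and scaled. In a full model, each word's frequency shifted by its own lag would then serve as one input feature of the linear model.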

2015

Who caught a cold ? - Identifying the subject of a symptom
Shin Kanouchi | Mamoru Komachi | Naoaki Okazaki | Eiji Aramaki | Hiroshi Ishikawa
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Disease Event Detection based on Deep Modality Analysis
Yoshiaki Kitagawa | Mamoru Komachi | Eiji Aramaki | Naoaki Okazaki | Hiroshi Ishikawa
Proceedings of the ACL-IJCNLP 2015 Student Research Workshop

Location Name Disambiguation Exploiting Spatial Proximity and Temporal Consistency
Takashi Awamura | Daisuke Kawahara | Eiji Aramaki | Tomohide Shibata | Sadao Kurohashi
Proceedings of the third International Workshop on Natural Language Processing for Social Media

2013

Word in a Dictionary is used by Numerous Users
Eiji Aramaki | Sachiko Maskawa | Mai Miyabe | Mizuki Morita | Sachi Yasuda
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Clinical Vocabulary and Clinical Finding Concepts in Medical Literature
Takashi Okumura | Eiji Aramaki | Yuka Tateisi
The First Workshop on Natural Language Processing for Medical and Healthcare Fields

Proper and Efficient Treatment of Anaphora and Long-Distance Dependency in Context-Free Grammar: An Experiment with Medical Text
Wailok Tam | Koiti Hasida | Yusuke Matsubara | Eiji Aramaki | Mai Miyabe | Motoyuki Takaai | Hirosi Uozaki
The First Workshop on Natural Language Processing for Medical and Healthcare Fields

The First Workshop on Natural Language Processing for Medical and Healthcare Fields
Eiji Aramaki | Mizuki Morita
The First Workshop on Natural Language Processing for Medical and Healthcare Fields

Incorporating Knowledge Resources to Enhance Medical Information Extraction
Yasuhide Miura | Tomoko Ohkuma | Hiroshi Masuichi | Emiko Yamada Shinohara | Eiji Aramaki | Kazuhiko Ohe
The First Workshop on Natural Language Processing for Medical and Healthcare Fields

2011

Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter
Eiji Aramaki | Sachiko Maskawa | Mizuki Morita
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Adverse-Effect Relations Extraction from Massive Clinical Records
Yasuhide Miura | Eiji Aramaki | Tomoko Ohkuma | Masatsugu Tonoike | Daigo Sugihara | Hiroshi Masuichi | Kazuhiko Ohe
Proceedings of the Second Workshop on NLP Challenges in the Information Explosion Era (NLPIX 2010)

Using Various Features in Machine Learning to Obtain High Levels of Performance for Recognition of Japanese Notational Variants
Masahiro Kojima | Masaki Murata | Jun’ichi Kazama | Kow Kuroda | Atsushi Fujita | Eiji Aramaki | Masaaki Tsuchida | Yasuhiko Watanabe | Kentaro Torisawa
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

2009

Fast Decoding and Easy Implementation: Transliteration as Sequential Labeling
Eiji Aramaki | Takeshi Abekawa
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

TEXT2TABLE: Medical Text Summarization System Based on Named Entity Recognition and Modality Identification
Eiji Aramaki | Yasuhide Miura | Masatsugu Tonoike | Tomoko Ohkuma | Hiroshi Masuichi | Kazuhiko Ohe
Proceedings of the BioNLP 2009 Workshop

2008

Orthographic Disambiguation Incorporating Transliterated Probability
Eiji Aramaki | Takeshi Imai | Kengo Miyo | Kazuhiko Ohe
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

2007

UTH: SVM-based Semantic Relation Classification using Physical Sizes
Eiji Aramaki | Takeshi Imai | Kengo Miyo | Kazuhiko Ohe
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

Support vector machine based orthographic disambiguation
Eiji Aramaki | Takeshi Imai | Kengo Miyo | Kazuhiko Ohe
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2005

Toward Medical Ontology using Natural Language Processing
Eiji Aramaki | Takeshi Imai | Masayo Kashiwagi | Masayuki Kajino | Kengo Miyo | Kazuhiko Ohe
Proceedings of OntoLex 2005 - Ontologies and Lexical Resources

Probabilistic Model for Example-based Machine Translation
Eiji Aramaki | Sadao Kurohashi | Hideki Kashioka | Naoto Kato
Proceedings of Machine Translation Summit X: Papers

Example-based machine translation (EBMT) systems have so far relied on heuristic measures for retrieving translation examples. Such a heuristic measure takes time to adjust and can make the algorithm unclear. This paper presents a probabilistic model for EBMT. Under the proposed model, the system searches for the combination of translation examples that has the highest probability. The proposed model clearly formalizes the EBMT process. In addition, the model can naturally incorporate the context similarity of translation examples. The experimental results demonstrate that the proposed model has a slightly better translation quality than state-of-the-art EBMT systems.

2004

Example-based machine translation using structural translation examples
Eiji Aramaki | Sadao Kurohashi
Proceedings of the First International Workshop on Spoken Language Translation: Evaluation Campaign

2003

Word Selection for EBMT based on Monolingual Similarity and Translation Confidence
Eiji Aramaki | Sadao Kurohashi | Hideki Kashioka | Hideki Tanaka
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

2001

Finding translation correspondences from parallel parsed corpus for example-based translation
Eiji Aramaki | Sadao Kurohashi | Satoshi Sato | Hideo Watanabe
Proceedings of Machine Translation Summit VIII

This paper describes a system for finding phrasal translation correspondences from a parallel parsed corpus, which is a collection of paired English and Japanese sentences. First, the system finds phrasal correspondences by consulting a Japanese-English translation dictionary. Then, the system finds correspondences in the remaining phrases by using sentence dependency structures and the balance of all correspondences. The method is based on the assumption that in a parallel corpus most fragments in a source sentence have corresponding fragments in a target sentence.

2000

Finding Structural Correspondences from Bilingual Parsed Corpus for Corpus-based Translation
Hideo Watanabe | Sadao Kurohashi | Eiji Aramaki
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

Co-authors

Seiji Shimizu 5

Kongmeng Liew 4

Mizuki Morita 4

Tomoko Ohkuma 4

Roland Roller 4

Pierre Zweigenbaum 4

Hiroshi Masuichi 3

Yasuhide Miura 3

Tomohiro Nishiyama 3

Naoaki Okazaki 3

Philippe Thomas 3

Kiki Ferawati 2

Hiroshi Ishikawa 2

Tomoyuki Kajiwara 2

Hideki Kashioka 2

Mamoru Komachi 2

Sachiko Maskawa 2

Taichi Murayama 2

Sebastian Möller 2

Takashi Ninomiya 2

Ribeka Tanaka 2

Masatsugu Tonoike 2

Yukiko Uchida 2

Hideo Watanabe 2

Hui-Syuan Yeh 2

Takeshi Abekawa 1

Ahmad Almoustafa 1

Mohamad Alnajjar 1

Takashi Awamura 1

Ibrahim Baroud 1

Yuchang Cheng 1

Elena V. Epure 1

Naoya Fujikawa 1

Atsushi Fujita 1

Guillermo Garcia 1

Graciela Gonzalez 1

Junko Hayashi 1

Romain Hennequin 1

Gabriel Herman Bernardim Andrade 1

Sophia Hernandez 1

Tatsuya Hiraoka 1

Shohei Hisada 1

Koki Horiguchi 1

Tatsuya Ishigaki 1

Tomoya Iwakura 1

Tomohide Iwao 1

Saidah Zahrotul Jannah 1

Masayuki Kajino 1

Yoshinobu Kano 1

Shin Kanouchi 1

Masayo Kashiwagi 1

Daisuke Kawahara 1

Jun’ichi Kazama 1

Ayae Kinoshita 1

Yoshiaki Kitagawa 1

Ichiro Kobayashi 1

Masahiro Kojima 1

Thomas Lavergne 1

Yusuke Matsubara 1

Yuji Matsumoto 1

Takuya Matsuzaki 1

Masaki Murata 1

Faith Mutinda 1

Hiroyuki Nagai 1

Masataka Nakayama 1

Aurelie Neveol 1

Quang Toan Nguyen 1

Takashi Okumura 1

Karen O’Connor 1

Patrick Paroubek 1

Patrick John Ramos 1

Raul Rodriguez-Esteban 1

Vishakha Sharma 1

Daisaku Shibata 1

Tomohide Shibata 1

Hisada Shohei 1

Panote Siriaraya 1

Soichiro Sugihara 1

Daigo Sugihara 1

Motoyuki Takaai 1

Hiroya Takamura 1

Hideki Tanaka 1

Kentaro Torisawa 1

Masaaki Tsuchida 1

Makoto Uehara 1

Hirosi Uozaki 1

Michael Van Supranes 1

Bhuvanesh Verma 1

Yasushi Watanabe 1

Yasuhiko Watanabe 1

Davy Weissenbacher 1

Sakiko Yahata 1

Emiko Yamada Shinohara 1

Venues