Shoko Wakamiya


2024

pdf bib
Synchronizing Approach in Designing Annotation Guidelines for Multilingual Datasets: A COVID-19 Case Study Using English and Japanese Tweets
Kiki Ferawati | Wan Jou She | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP

The difference in culture between the U.S. and Japan is a popular subject for Western vs. Eastern cultural comparison for researchers. One particular challenge is to obtain and annotate multilingual datasets. In this study, we utilized COVID-19 tweets from the two countries as a case study, focusing particularly on discussions concerning masks. The annotation task was designed to gain insights into societal attitudes toward the mask policies implemented in both countries. The aim of this study is to provide a practical approach for the annotation task by thoroughly documenting how we aligned the multilingual annotation guidelines to obtain a comparable dataset. We proceeded to document the effective practices during our annotation process to synchronize our multilingual guidelines. Furthermore, we discussed difficulties caused by differences in expression style and culture, and potential strategies that helped improve our agreement scores and reduce discrepancies between the annotation results in both languages. These findings offer an alternative method for synchronizing multilingual annotation guidelines and achieving feasible agreement scores for cross-cultural annotation tasks. This study resulted in a multilingual guideline in English and Japanese to annotate topics related to public discourses about COVID-19 masks in the U.S. and Japan.

pdf bib
Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation
Shohei Higashiyama | Hiroki Ouchi | Hiroki Teranishi | Hiroyuki Otomo | Yusuke Ide | Aitaro Yamamoto | Hiroyuki Shindo | Yuki Matsuda | Shoko Wakamiya | Naoya Inoue | Ikuya Yamada | Taro Watanabe
Findings of the Association for Computational Linguistics: EACL 2024

Geoparsing is a fundamental technique for analyzing geo-entity information in text, which is useful for geographic applications, e.g., tourist spot recommendation. We focus on document-level geoparsing that considers geographic relatedness among geo-entity mentions and present a Japanese travelogue dataset designed for training and evaluating document-level geoparsing systems. Our dataset comprises 200 travelogue documents with rich geo-entity information: 12,171 mentions, 6,339 coreference clusters, and 2,551 geo-entities linked to geo-database entries.

pdf bib
Overview of #SMM4H 2024 – Task 2: Cross-Lingual Few-Shot Relation Extraction for Pharmacovigilance in French, German, and Japanese
Lisa Raithel | Philippe Thomas | Bhuvanesh Verma | Roland Roller | Hui-Syuan Yeh | Shuntaro Yada | Cyril Grouin | Shoko Wakamiya | Eiji Aramaki | Sebastian Möller | Pierre Zweigenbaum
Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks

This paper provides an overview of Task 2 from the Social Media Mining for Health 2024 shared task (#SMM4H 2024), which focused on Named Entity Recognition (NER, Subtask 2a) and the joint task of NER and Relation Extraction (RE, Subtask 2b) for detecting adverse drug reactions (ADRs) in German, Japanese, and French texts written by patients. Participants were challenged with a few-shot learning scenario, necessitating models that can effectively generalize from limited annotated examples. Despite the diverse strategies employed by the participants, the overall performance across submissions from three teams highlighted significant challenges. The results underscored the complexity of extracting entities and relations in multi-lingual contexts, especially from the noisy and informal nature of user-generated content. Further research is required to develop robust systems capable of accurately identifying and associating ADR-related information in low-resource and multilingual settings.

pdf bib
Overview of the 9th Social Media Mining for Health Applications (#SMM4H) Shared Tasks at ACL 2024 – Large Language Models and Generalizability for Social Media NLP
Dongfang Xu | Guillermo Garcia | Lisa Raithel | Philippe Thomas | Roland Roller | Eiji Aramaki | Shoko Wakamiya | Shuntaro Yada | Pierre Zweigenbaum | Karen O’Connor | Sai Samineni | Sophia Hernandez | Yao Ge | Swati Rajwal | Sudeshna Das | Abeed Sarker | Ari Klein | Ana Schmidt | Vishakha Sharma | Raul Rodriguez-Esteban | Juan Banda | Ivan Amaro | Davy Weissenbacher | Graciela Gonzalez-Hernandez
Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks

For the past nine years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in publicly available user-generated content. This year, #SMM4H included seven shared tasks in English, Japanese, German, French, and Spanish from Twitter, Reddit, and health forums. A total of 84 teams from 22 countries registered for #SMM4H, and 45 teams participated in at least one task. This represents a growth of 180% and 160% in registration and participation, respectively, compared to the last iteration. This paper provides an overview of the tasks and participating systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.

pdf bib
Generating Distributable Surrogate Corpus for Medical Multi-label Classification
Seiji Shimizu | Shuntaro Yada | Shoko Wakamiya | Eiji Aramaki
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024

In medical and social media domains, annotated corpora are often hard to distribute due to copyrights and privacy issues. To overcome this situation, we propose a new method to generate a surrogate corpus for a downstream task by using a text generation model. We chose a medical multi-label classification task, MedWeb, in which patient-generated short messages express multiple symptoms. We first fine-tuned text generation models with different prompting designs on the original corpus to obtain synthetic versions of that corpus. To assess the viability of the generated corpora for the downstream task, we compared the performance of multi-label classification models trained either on the original or the surrogate corpora. The results and the error analysis showed the difficulty of generating surrogate corpus in multi-label settings, suggesting text generation under complex conditions is not trivial. On the other hand, our experiment demonstrates that the generated corpus with a sentinel-based prompting is comparatively viable in a single-label (multiclass) classification setting.

pdf bib
Loneliness Episodes: A Japanese Dataset for Loneliness Detection and Analysis
Naoya Fujikawa | Nguyen Toan | Kazuhiro Ito | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Loneliness, a significant public health concern, is closely connected to both physical and mental well-being. Hence, detection and intervention for individuals experiencing loneliness are crucial. Identifying loneliness in text is straightforward when it is explicitly stated but challenging when it is implicit. Detecting implicit loneliness requires a manually annotated dataset because whereas explicit loneliness can be detected using keywords, implicit loneliness cannot be. However, there are no freely available datasets with clear annotation guidelines for implicit loneliness. In this study, we construct a freely accessible Japanese loneliness dataset with annotation guidelines grounded in the psychological definition of loneliness. This dataset covers loneliness intensity and the contributing factors of loneliness. We train two models to classify whether loneliness is expressed and the intensity of loneliness. The model classifying loneliness versus non-loneliness achieves an F1-score of 0.833, but the model for identifying the intensity of loneliness has a low F1-score of 0.400, which is likely due to label imbalance and a shortage of a certain label in the dataset. We validate performance in another domain, specifically X (formerly Twitter), and observe a decrease. In addition, we propose improvement suggestions for domain adaptation.

pdf bib
Estimation of Happiness Changes through Longitudinal Analysis of Employees’ Texts
Junko Hayashi | Kazuhiro Ito | Masae Manabe | Yasushi Watanabe | Masataka Nakayama | Yukiko Uchida | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Measuring happiness as a determinant of well-being is increasingly recognized as crucial. While previous studies have utilized free-text descriptions to estimate happiness on a broad scale, limited research has focused on tracking individual fluctuations in happiness over time owing to the challenges associated with longitudinal data collection. This study addresses this issue by obtaining longitudinal data from two workplaces over two and six months respectively.Subsequently, the data is used to construct a happiness estimation model and assess individual happiness levels.Evaluation of the model performance using correlation coefficients shows variability in the correlation values among individuals.Notably, the model performs satisfactorily in estimating 9 of the 11 users’ happiness scores, with a correlation coefficient of 0.4 or higher. To investigate the factors affecting the model performance, we examine the relationship between the model performance and variables such as sentence length, lexical diversity, and personality traits. Correlations are observed between these features and model performance.

pdf bib
Semi-automatic Construction of a Word Complexity Lexicon for Japanese Medical Terminology
Soichiro Sugihara | Tomoyuki Kajiwara | Takashi Ninomiya | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 6th Clinical Natural Language Processing Workshop

We construct a word complexity lexicon for medical terms in Japanese.To facilitate communication between medical practitioners and patients, medical text simplification is being studied.Medical text simplification is a natural language processing task that paraphrases complex technical terms into expressions that patients can understand.However, in contrast to English, where this task is being actively studied, there are insufficient language resources in Japanese.As a first step in advancing research on medical text simplification in Japanese, we annotate the 370,000 words from a large-scale medical terminology lexicon with a five-point scale of complexity for patients.

2022

pdf bib
PICO Corpus: A Publicly Available Corpus to Support Automatic Data Extraction from Biomedical Literature
Faith Mutinda | Kongmeng Liew | Shuntaro Yada | Shoko Wakamiya | Eiji Aramaki
Proceedings of the first Workshop on Information Extraction from Scientific Publications

We present a publicly available corpus with detailed annotations describing the core elements of clinical trials: Participants, Intervention, Control, and Outcomes. The corpus consists of 1011 abstracts of breast cancer randomized controlled trials extracted from the PubMed database. The corpus improves previous corpora by providing detailed annotations for outcomes to identify numeric texts that report the number of participants that experience specific outcomes. The corpus will be helpful for the development of systems for automatic extraction of data from randomized controlled trial literature to support evidence-based medicine. Additionally, we demonstrate the feasibility of the corpus by using two strong baselines for named entity recognition task. Most of the entities achieve F1 scores greater than 0.80 demonstrating the quality of the dataset.

pdf bib
Annotation-Scheme Reconstruction for “Fake News” and Japanese Fake News Dataset
Taichi Murayama | Shohei Hisada | Makoto Uehara | Shoko Wakamiya | Eiji Aramaki
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Fake news provokes many societal problems; therefore, there has been extensive research on fake news detection tasks to counter it. Many fake news datasets were constructed as resources to facilitate this task. Contemporary research focuses almost exclusively on the factuality aspect of the news. However, this aspect alone is insufficient to explain “fake news,” which is a complex phenomenon that involves a wide range of issues. To fully understand the nature of each instance of fake news, it is important to observe it from various perspectives, such as the intention of the false news disseminator, the harmfulness of the news to our society, and the target of the news. We propose a novel annotation scheme with fine-grained labeling based on detailed investigations of existing fake news datasets to capture these various aspects of fake news. Using the annotation scheme, we construct and publish the first Japanese fake news dataset. The annotation scheme is expected to provide an in-depth understanding of fake news. We plan to build datasets for both Japanese and other languages using our scheme. Our Japanese dataset is published at https://hkefka385.github.io/dataset/fakenews-japanese/.

pdf bib
Emotion Analysis of Writers and Readers of Japanese Tweets on Vaccinations
Patrick John Ramos | Kiki Ferawati | Kongmeng Liew | Eiji Aramaki | Shoko Wakamiya
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

Public opinion in social media is increasingly becoming a critical factor in pandemic control. Understanding the emotions of a population towards vaccinations and COVID-19 may be valuable in convincing members to become vaccinated. We investigated the emotions of Japanese Twitter users towards Tweets related to COVID-19 vaccination. Using the WRIME dataset, which provides emotion ratings for Japanese Tweets sourced from writers (Tweet posters) and readers, we fine-tuned a BERT model to predict levels of emotional intensity. This model achieved a training accuracy of MSE = 0.356. A separate dataset of 20,254 Japanese Tweets containing COVID-19 vaccine-related keywords was also collected, on which the fine-tuned BERT was used to perform emotion analysis. Afterwards, a correlation analysis between the extracted emotions and a set of vaccination measures in Japan was conducted.The results revealed that surprise and fear were the most intense emotions predicted by the model for writers and readers, respectively, on the vaccine-related Tweet dataset. The correlation analysis also showed that vaccinations were weakly positively correlated with predicted levels of writer joy, writer/reader anticipation, and writer/reader trust.

2021

pdf bib
End-to-end Biomedical Entity Linking with Span-based Dictionary Matching
Shogo Ujiie | Hayate Iso | Shuntaro Yada | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 20th Workshop on Biomedical Language Processing

Disease name recognition and normalization is a fundamental process in biomedical text mining. Recently, neural joint learning of both tasks has been proposed to utilize the mutual benefits. While this approach achieves high performance, disease concepts that do not appear in the training dataset cannot be accurately predicted. This study introduces a novel end-to-end approach that combines span representations with dictionary-matching features to address this problem. Our model handles unseen concepts by referring to a dictionary while maintaining the performance of neural network-based models. Experiments using two major datasaets demonstrate that our model achieved competitive results with strong baselines, especially for unseen concepts during training.

pdf bib
Mitigation of Diachronic Bias in Fake News Detection Dataset
Taichi Murayama | Shoko Wakamiya | Eiji Aramaki
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Fake news causes significant damage to society. To deal with these fake news, several studies on building detection models and arranging datasets have been conducted. Most of the fake news datasets depend on a specific time period. Consequently, the detection models trained on such a dataset have difficulty detecting novel fake news generated by political changes and social changes; they may possibly result in biased output from the input, including specific person names and organizational names. We refer to this problem as Diachronic Bias because it is caused by the creation date of news in each dataset. In this study, we confirm the bias, especially proper nouns including person names, from the deviation of phrase appearances in each dataset. Based on these findings, we propose masking methods using Wikidata to mitigate the influence of person names and validate whether they make fake news detection models robust through experiments with in-domain and out-of-domain data.

2020

pdf bib
Offensive Language Detection on Video Live Streaming Chat
Zhiwei Gao | Shuntaro Yada | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 28th International Conference on Computational Linguistics

This paper presents a prototype of a chat room that detects offensive expressions in a video live streaming chat in real time. Focusing on Twitch, one of the most popular live streaming platforms, we created a dataset for the task of detecting offensive expressions. We collected 2,000 chat posts across four popular game titles with genre diversity (e.g., competitive, violent, peaceful). To make use of the similarity in offensive expressions among different social media platforms, we adopted state-of-the-art models trained on offensive expressions from Twitter for our Twitch data (i.e., transfer learning). We investigated two similarity measurements to predict the transferability, textual similarity, and game-genre similarity. Our results show that the transfer of features from social media to live streaming is effective. However, the two measurements show less correlation in the transferability prediction.

2018

pdf bib
J-MeDic: A Japanese Disease Name Dictionary based on Real Clinical Usage
Kaoru Ito | Hiroyuki Nagai | Taro Okahisa | Shoko Wakamiya | Tomohide Iwao | Eiji Aramaki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
Multivariate Linear Regression of Symptoms-related Tweets for Infectious Gastroenteritis Scale Estimation
Ryo Takeuchi | Hayate Iso | Kaoru Ito | Shoko Wakamiya | Eiji Aramaki
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

To date, various Twitter-based event detection systems have been proposed. Most of their targets, however, share common characteristics. They are seasonal or global events such as earthquakes and flu pandemics. In contrast, this study targets unseasonal and local disease events. Our system investigates the frequencies of disease-related words such as “nausea”,“chill”,and “diarrhea” and estimates the number of patients using regression of these word frequencies. Experiments conducted using Japanese 47 areas from January 2017 to April 2017 revealed that the detection of small and unseasonal event is extremely difficult (overall performance: 0.13). However, we found that the event scale and the detection performance show high correlation in the specified cases (in the phase of patient increasing or decreasing). The results also suggest that when 150 and more patients appear in a high population area, we can expect that our social sensors detect this outbreak. Based on these results, we can infer that social sensors can reliably detect unseasonal and local disease events under certain conditions, just as they can for seasonal or global events.

2016

pdf bib
Detecting Japanese Patients with Alzheimer’s Disease based on Word Category Frequencies
Daisaku Shibata | Shoko Wakamiya | Ayae Kinoshita | Eiji Aramaki
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

In recent years, detecting Alzheimer disease (AD) in early stages based on natural language processing (NLP) has drawn much attention. To date, vocabulary size, grammatical complexity, and fluency have been studied using NLP metrics. However, the content analysis of AD narratives is still unreachable for NLP. This study investigates features of the words that AD patients use in their spoken language. After recruiting 18 examinees of 53–90 years old (mean: 76.89), they were divided into two groups based on MMSE scores. The AD group comprised 9 examinees with scores of 21 or lower. The healthy control group comprised 9 examinees with a score of 22 or higher. Linguistic Inquiry and Word Count (LIWC) classified words were used to categorize the words that the examinees used. The word frequency was found from observation. Significant differences were confirmed for the usage of impersonal pronouns in the AD group. This result demonstrated the basic feasibility of the proposed NLP-based detection approach.

pdf bib
Forecasting Word Model: Twitter-based Influenza Surveillance and Prediction
Hayate Iso | Shoko Wakamiya | Eiji Aramaki
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Because of the increasing popularity of social media, much information has been shared on the internet, enabling social media users to understand various real world events. Particularly, social media-based infectious disease surveillance has attracted increasing attention. In this work, we specifically examine influenza: a common topic of communication on social media. The fundamental theory of this work is that several words, such as symptom words (fever, headache, etc.), appear in advance of flu epidemic occurrence. Consequently, past word occurrence can contribute to estimation of the number of current patients. To employ such forecasting words, one can first estimate the optimal time lag for each word based on their cross correlation. Then one can build a linear model consisting of word frequencies at different time points for nowcasting and for forecasting influenza epidemics. Experimentally obtained results (using 7.7 million tweets of August 2012 – January 2016), the proposed model achieved the best nowcasting performance to date (correlation ratio 0.93) and practically sufficient forecasting performance (correlation ratio 0.91 in 1-week future prediction, and correlation ratio 0.77 in 3-weeks future prediction). This report is the first of the relevant literature to describe a model enabling prediction of future epidemics using Twitter.