Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

Jitendra Jonnagaddala, Hong-Jie Dai, Yung-Chun Chang November 2017

Taipei, Taiwan

Association for Computational Linguistics http://www.aclweb.org/anthology/W17-58 book DDDSM:2017

Automatic detection of stance towards vaccination in online discussion forums Maria Skeppstedt, Andreas Kerren, Manfred Stede Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) November 2017

Taipei, Taiwan

Association for Computational Linguistics 1–8 http://www.aclweb.org/anthology/W17-5801 A classifier for automatic detection of stance towards vaccination in online forums was trained and evaluated. Debate posts from six discussion threads on the British parental website Mumsnet were manually annotated for stance 'against' or 'for' vaccination, or as 'undecided'. A support vector machine, trained to detect the three classes, achieved a macro F-score of 0.44, while a macro F-score of 0.62 was obtained by the same type of classifier on the binary classification task of distinguishing stance 'against' vaccination from stance 'for' vaccination. These results show that vaccine stance detection in online forums is a difficult task, at least for the type of model investigated and for the relatively small training corpus that was used. Future work will therefore include an expansion of the training data and an evaluation of other types of classifiers and features. inproceedings skeppstedt-kerren-stede:2017:DDDSM

Analysing the Causes of Depressed Mood from Depression Vulnerable Individuals Noor Fazilla Abd Yusof, Chenghua Lin, Frank Guerin Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) November 2017

Taipei, Taiwan

Association for Computational Linguistics 9–17 http://www.aclweb.org/anthology/W17-5802 We develop a computational model to discover the potential causes of depression by analysing the topics in a user-generated text. We show the most prominent causes, and how these causes evolve over time. Also, we highlight the differences in causes between students with low and high neuroticism. Our studies demonstrate that the topics reveal valuable clues about the causes contributing to depressed mood. Identifying causes can have a significant impact on improving the quality of depression care; thereby providing greater insights into a patient’s state for pertinent treatment recommendations. Hence, this study significantly expands the ability to discover the potential factors that trigger depression, making it possible to increase the efficiency of depression treatment. inproceedings abdyusof-lin-guerin:2017:DDDSM

Multivariate Linear Regression of Symptoms-related Tweets for Infectious Gastroenteritis Scale Estimation Ryo Takeuchi, Hayate ISO, Kaoru Ito, Shoko Wakamiya, Eiji Aramaki Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) November 2017

Taipei, Taiwan

Association for Computational Linguistics 18–25 http://www.aclweb.org/anthology/W17-5803 To date, various Twitter-based event detection systems have been proposed. Most of their targets, however, share common characteristics: they are seasonal or global events such as earthquakes and flu pandemics. In contrast, this study targets unseasonal and local disease events. Our system investigates the frequencies of disease-related words such as "nausea", "chill", and "diarrhea" and estimates the number of patients using regression on these word frequencies. Experiments conducted on 47 areas of Japan from January 2017 to April 2017 revealed that the detection of small and unseasonal events is extremely difficult (overall performance: 0.13). However, we found that the event scale and the detection performance show high correlation in specific cases (in phases where the number of patients is increasing or decreasing). The results also suggest that when 150 or more patients appear in a high-population area, we can expect our social sensors to detect the outbreak. Based on these results, we can infer that social sensors can reliably detect unseasonal and local disease events under certain conditions, just as they can for seasonal or global events. inproceedings takeuchi-EtAl:2017:DDDSM

Incorporating Dependency Trees Improve Identification of Pregnant Women on Social Media Platforms Yi-Jie Huang, Chu Hsien Su, Yi-Chun Chang, Tseng-Hsin Ting, Tzu-Yuan Fu, Rou-Min Wang, Hong-Jie Dai, Yung-Chun Chang, Jitendra Jonnagaddala, Wen-Lian Hsu Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) November 2017

Taipei, Taiwan

Association for Computational Linguistics 26–32 http://www.aclweb.org/anthology/W17-5804 The increasing popularity of social media leads users to share an enormous amount of information on the internet. This information has various applications; for instance, it can be used to develop models to understand or predict user behavior on social media platforms. For example, a few online retailers have studied shopping patterns to predict a shopper’s pregnancy stage. Another interesting application is to use social media platforms to analyze users’ health-related information. A new corpus from the popular social media platform Twitter was developed for the purpose of this study. Using this corpus, we developed a tree kernel-based model to classify tweets conveying pregnancy-related information. The developed pregnancy classification model achieved an accuracy of 0.847 and an F-score of 0.565. In future, we would like to improve this corpus by reducing noise such as retweets. inproceedings huang-EtAl:2017:DDDSM

Using a Recurrent Neural Network Model for Classification of Tweets Conveyed Influenza-related Information Chen-Kai Wang, Onkar Singh, Zhao-Li Tang, Hong-Jie Dai Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) November 2017

Taipei, Taiwan

Association for Computational Linguistics 33–38 http://www.aclweb.org/anthology/W17-5805 Traditional disease surveillance systems depend on outpatient reporting and virological test results released by hospitals. These data contain valid and accurate information about emerging outbreaks, but they are often not timely. In recent years, the exponential growth in the number of users connected to social media has provided immense knowledge about epidemics through the sharing of related information. Social media can now flag more immediate concerns related to outbreaks in real time. In this paper we apply the long short-term memory recurrent neural network (RNN) architecture to classify tweets conveying influenza-related information and compare its performance with baseline algorithms including support vector machine (SVM), decision tree, naive Bayes, simple logistics, and naive Bayes multinomial. The developed RNN model achieved an F-score of 0.845 on the MedWeb task test set, which outperforms the F-score of SVM without applying the synthetic minority oversampling technique by 0.08. The F-score of the RNN model is within 1% of the highest score achieved by SVM with oversampling technique. inproceedings wang-EtAl:2017:DDDSM

ZikaHack 2016: A digital disease detection competition Dillon C Adam, Jitendra Jonnagaddala, Daniel Han-Chen, Sean Batongbacal, Luan Almeida, Jing Z Zhu, Jenny J Yang, Jumail M Mundekkat, Steven Badman, Abrar Chughtai, C Raina MacIntyre Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) November 2017

Taipei, Taiwan

Association for Computational Linguistics 39–46 http://www.aclweb.org/anthology/W17-5806 Effective response to infectious disease outbreaks relies on the rapid and early detection of those outbreaks. Unvalidated, yet timely and openly available digital information can be used for the early detection of outbreaks. Public health surveillance authorities can exploit these early warnings to plan and co-ordinate rapid surveillance and emergency response programs. In 2016, a digital disease detection competition named ZikaHack was launched. The objective of the competition was for multidisciplinary teams to design, develop and demonstrate innovative digital disease detection solutions to retrospectively detect the 2015-16 Brazilian Zika virus outbreak earlier than traditional surveillance methods. In this paper, an overview of the ZikaHack competition is provided. The challenges and lessons learned in organizing this competition are also discussed for use by other researchers interested in organizing similar competitions. inproceedings adam-EtAl:2017:DDDSM

A Method to Generate a Machine-Labeled Data for Biomedical Named Entity Recognition with Various Sub-Domains Juae Kim, Sunjae Kwon, Youngjoong Ko, Jungyun Seo Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) November 2017

Taipei, Taiwan

Association for Computational Linguistics 47–51 http://www.aclweb.org/anthology/W17-5807 Biomedical Named Entity (NE) recognition is a core technique for various works in the biomedical domain. In previous studies, machine learning algorithms have shown better performance than dictionary-based and rule-based approaches because there are too many terminological variations of biomedical NEs and new biomedical NEs are constantly generated. To achieve high performance with a machine-learning algorithm, good-quality corpora are required. However, it is difficult to obtain good-quality corpora because annotating a biomedical corpus for machine learning is extremely time-consuming and costly. In addition, most previous corpora are insufficient for high-level tasks because they cannot cover various domains. Therefore, we propose a method for generating a large amount of machine-labeled data that covers various domains. To generate a large amount of machine-labeled data, we first generate initial machine-labeled data by using a chunker and MetaMap. The chunker is developed to extract only biomedical NEs with manually annotated data. MetaMap is used to annotate the category of each biomedical NE. Then we apply the self-training approach to bootstrap the performance of the initial machine-labeled data. In our experiments, the biomedical NE recognition system that is trained with our proposed machine-labeled data achieves much higher performance. As a result, our system outperforms a biomedical NE recognition system that uses only MetaMap, with an improvement of 26.03 percentage points in F1-score. inproceedings kim-EtAl:2017:DDDSM

Enhancing Drug-Drug Interaction Classification with Corpus-level Feature and Classifier Ensemble Jing Cyun Tu, Po-Ting Lai, Richard Tzong-Han Tsai Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) November 2017

Taipei, Taiwan

Association for Computational Linguistics 52–56 http://www.aclweb.org/anthology/W17-5808 The study of drug-drug interaction (DDI) is important in drug discovery. Both PubMed and DrugBank are rich resources from which to retrieve DDI information, which is usually represented in plain text. Automatically extracting DDI pairs from text improves the quality of drug discovery. In this paper, we present a study that focuses on DDI classification. We normalized the drug names, and developed both sentence-level and corpus-level features for DDI classification. A classifier ensemble approach is used to address the imbalanced DDI label problem. Our approach achieved an F-score of 65.4% on the SemEval 2013 DDI test set. The experimental results also show the effects of the proposed corpus-level features in the DDI task. inproceedings tu-lai-tsai:2017:DDDSM

Chemical-Induced Disease Detection Using Invariance-based Pattern Learning Model Neha Warikoo, Yung-Chun Chang, Wen-Lian Hsu Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) November 2017

Taipei, Taiwan

Association for Computational Linguistics 57–64 http://www.aclweb.org/anthology/W17-5809 In this work, we introduce a novel feature engineering approach named "algebraic invariance" to identify discriminative patterns for learning relation pair features for the chemical-disease relation (CDR) task of BioCreative V. Our method exploits the existing structural similarity of the key concepts of relation descriptions from the CDR corpus to generate robust linguistic patterns for SVM tree kernel-based learning. Preprocessing of the training data classifies the entity pairs as either related or unrelated to build instance types for both inter-sentential and intra-sentential scenarios. An invariant function is proposed to process and optimally cluster similar patterns for both positive and negative instances. The learning model for CDR pairs is based on the SVM tree kernel approach, which generates feature trees and vectors and is modeled on suitable invariance-based patterns, bringing brevity, precision and context to the identifier features. Results demonstrate that our method outperformed other compared approaches, achieved a high recall rate of 85.08%, and averaged an F1-score of 54.34% without the use of any additional knowledge bases. inproceedings warikoo-chang-hsu:2017:DDDSM