Identification of Drug-Related Medical Conditions in Social Media

François Morlane-Hondère, Cyril Grouin, Pierre Zweigenbaum


Abstract
Monitoring social media has been shown to be a promising approach for the early detection of adverse drug effects. In this paper, we describe a system that extracts medical entities from French drug reviews written by users. We focus on the identification of medical conditions, which is based on the concept of post-coordination: we first extract minimal medical entities (pain, stomach), then combine them to identify complex ones (It was the worst [pain I ever felt in my stomach]). These two steps are performed by two classifiers, the first based on Conditional Random Fields and the second on Support Vector Machines. The overall results of the minimal entity classifier are: P=0.926; R=0.849; F1=0.886. A thorough analysis of the feature set shows that, when combined with word lemmas, clusters generated by word2vec are the most valuable features. When trained on the output of the first classifier, the second classifier achieves: P=0.683; R=0.956; F1=0.797. Adding post-processing rules did not yield any significant overall improvement, but it did shift the balance between precision and recall.
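The reported F1 values can be checked against the precision and recall figures in the abstract. The sketch below assumes the standard balanced F1 (harmonic mean of precision and recall); the function name `f1_score` and the variable names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract.
minimal_f1 = f1_score(0.926, 0.849)   # minimal entity classifier (CRF)
complex_f1 = f1_score(0.683, 0.956)   # complex entity classifier (SVM)

print(round(minimal_f1, 3))  # 0.886
print(round(complex_f1, 3))  # 0.797
```

Both values agree with the abstract to three decimal places. The second classifier's high recall and lower precision show how far the balance can lean toward recall at a given F1.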
Anthology ID:
L16-1320
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2022–2028
URL:
https://aclanthology.org/L16-1320
Cite (ACL):
François Morlane-Hondère, Cyril Grouin, and Pierre Zweigenbaum. 2016. Identification of Drug-Related Medical Conditions in Social Media. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2022–2028, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Identification of Drug-Related Medical Conditions in Social Media (Morlane-Hondère et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1320.pdf