2024
Overview of the 9th Social Media Mining for Health Applications (#SMM4H) Shared Tasks at ACL 2024 – Large Language Models and Generalizability for Social Media NLP
Dongfang Xu | Guillermo Garcia | Lisa Raithel | Philippe Thomas | Roland Roller | Eiji Aramaki | Shoko Wakamiya | Shuntaro Yada | Pierre Zweigenbaum | Karen O’Connor | Sai Samineni | Sophia Hernandez | Yao Ge | Swati Rajwal | Sudeshna Das | Abeed Sarker | Ari Klein | Ana Schmidt | Vishakha Sharma | Raul Rodriguez-Esteban | Juan Banda | Ivan Amaro | Davy Weissenbacher | Graciela Gonzalez-Hernandez
Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks
For the past nine years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in publicly available user-generated content. This year, #SMM4H included seven shared tasks in English, Japanese, German, French, and Spanish from Twitter, Reddit, and health forums. A total of 84 teams from 22 countries registered for #SMM4H, and 45 teams participated in at least one task. This represents a growth of 180% and 160% in registration and participation, respectively, compared to the last iteration. This paper provides an overview of the tasks and participating systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.
2022
Overview of the Seventh Social Media Mining for Health Applications (#SMM4H) Shared Tasks at COLING 2022
Davy Weissenbacher | Juan Banda | Vera Davydova | Darryl Estrada Zavala | Luis Gasco Sánchez | Yao Ge | Yuting Guo | Ari Klein | Martin Krallinger | Mathias Leddin | Arjun Magge | Raul Rodriguez-Esteban | Abeed Sarker | Lucia Schmidt | Elena Tutubalina | Graciela Gonzalez-Hernandez
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
For the past seven years, the Social Media Mining for Health Applications (#SMM4H) shared tasks have promoted the community-driven development and evaluation of advanced natural language processing systems to detect, extract, and normalize health-related information in public, user-generated content. This seventh iteration consists of ten tasks that include English and Spanish posts on Twitter, Reddit, and WebMD. Interest in the #SMM4H shared tasks continues to grow, with 117 teams that registered and 54 teams that participated in at least one task—a 17.5% and 35% increase in registration and participation, respectively, over the last iteration. This paper provides an overview of the tasks and participants’ systems. The data sets remain available upon request, and new systems can be evaluated through the post-evaluation phase on CodaLab.
2017
One model per entity: using hundreds of machine learning models to recognize and normalize biomedical names in text
Victor Bellon | Raul Rodriguez-Esteban
Proceedings of the Biomedical NLP Workshop associated with RANLP 2017
We explored a new approach to named entity recognition based on hundreds of machine learning models, each trained to distinguish a single entity, and showed its application to gene name identification (GNI). The rationale for our approach, which we named “one model per entity” (OMPE), was that increasing the number of models would make the learning task easier for each individual model. Our training strategy leveraged freely available database annotations instead of manually annotated corpora. While its performance in our proof-of-concept was disappointing, we believe that there is enough room for improvement that such approaches could reach competitive performance while eliminating the cost of creating manually annotated training corpora.
2016
Author Name Disambiguation in MEDLINE Based on Journal Descriptors and Semantic Types
Dina Vishnyakova | Raul Rodriguez-Esteban | Khan Ozol | Fabio Rinaldi
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)
Author name disambiguation (AND) in publication and citation resources is a well-known problem. Often, information about email addresses and other details in the affiliation is missing. In cases where such information is not available, identifying the authorship of publications becomes very challenging. Consequently, there have been attempts to resolve such cases by utilizing external resources as references. However, such external resources are heterogeneous and are not always reliable regarding the correctness of information. To solve the AND task, especially when information about an author is incomplete, we suggest the use of new features such as journal descriptors (JD) and semantic types (ST). The evaluation of different feature models shows that their inclusion has an impact equivalent to that of other important features such as email address. Using such features, we show that our system outperforms the state of the art.