Word Embeddings, Cosine Similarity and Deep Learning for Identification of Professions & Occupations in Health-related Social Media

Sergio Santamaría Carrasco, Roberto Cuervo Rosillo


Abstract
ProfNER-ST focuses on the recognition of professions and occupations in Spanish-language Twitter data. Our system combines word-level embeddings, including pre-trained Spanish BERT, with cosine-similarity features computed over a subset of entities; together these serve as input to an encoder-decoder architecture with an attention mechanism. Our best run achieved an F1-measure of 0.823 on the official test set.
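The abstract does not spell out how the cosine-similarity feature is computed, so the following is only a minimal sketch of one plausible reading: each token embedding is compared against the embeddings of a small set of reference entities, and the highest similarity is kept as an extra input feature. The function names, the max-pooling choice, and the toy 3-dimensional vectors are assumptions made for illustration and are not taken from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def max_similarity_feature(token_vec, entity_vecs):
    """Hypothetical feature: highest cosine similarity between a token
    embedding and any embedding in a reference set of entities."""
    return max(cosine_similarity(token_vec, e) for e in entity_vecs)

# Toy vectors invented for this example; real embeddings would come
# from a model such as a pre-trained Spanish BERT.
entity_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
token_vec = [2.0, 0.0, 0.0]

feature = max_similarity_feature(token_vec, entity_vecs)
print(feature)
```

In a setup like this, the resulting scalar could be concatenated to the token's embedding before it is passed to the sequence model; whether the authors do exactly this is not stated in the abstract.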
Anthology ID:
2021.smm4h-1.12
Volume:
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
Month:
June
Year:
2021
Address:
Mexico City, Mexico
Editors:
Arjun Magge, Ari Klein, Antonio Miranda-Escalada, Mohammed Ali Al-garadi, Ilseyar Alimova, Zulfat Miftahutdinov, Eulalia Farre-Maduell, Salvador Lima Lopez, Ivan Flores, Karen O'Connor, Davy Weissenbacher, Elena Tutubalina, Abeed Sarker, Juan M Banda, Martin Krallinger, Graciela Gonzalez-Hernandez
Venue:
SMM4H
Publisher:
Association for Computational Linguistics
Pages:
74–76
URL:
https://aclanthology.org/2021.smm4h-1.12
DOI:
10.18653/v1/2021.smm4h-1.12
Cite (ACL):
Sergio Santamaría Carrasco and Roberto Cuervo Rosillo. 2021. Word Embeddings, Cosine Similarity and Deep Learning for Identification of Professions & Occupations in Health-related Social Media. In Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task, pages 74–76, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Word Embeddings, Cosine Similarity and Deep Learning for Identification of Professions & Occupations in Health-related Social Media (Santamaría Carrasco & Cuervo Rosillo, SMM4H 2021)
PDF:
https://aclanthology.org/2021.smm4h-1.12.pdf