Silvio Amir


2021

On the Impact of Random Seeds on the Fairness of Clinical Classifiers
Silvio Amir | Jan-Willem van de Meent | Byron Wallace
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent work has shown that fine-tuning large networks is surprisingly sensitive to changes in random seed(s). We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records (EHR) in MIMIC-III, the standard dataset in clinical NLP research. Apparent subgroup performance varies substantially for seeds that yield similar overall performance, although there is no evidence of a trade-off between overall and subgroup performance. However, we also find that the small sample sizes inherent to looking at intersections of minority groups and somewhat rare conditions limit our ability to accurately estimate disparities. Further, we find that jointly optimizing for high overall performance and low disparities does not yield statistically significant improvements. Our results suggest that fairness work using MIMIC-III should carefully account for variations in apparent differences that may arise from stochasticity and small sample sizes.

2019

Mental Health Surveillance over Social Media with Digital Cohorts
Silvio Amir | Mark Dredze | John W. Ayers
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

The ability to track mental health conditions via social media opened the doors for large-scale, automated mental health surveillance. However, inferring accurate population-level trends requires representative samples of the underlying population, which can be challenging given the biases inherent in social media data. While previous work has adjusted samples based on demographic estimates, the populations were selected based on specific outcomes, e.g., specific mental health conditions. We depart from these methods by conducting analyses over demographically representative digital cohorts of social media users. To validate this approach, we constructed a cohort of US-based Twitter users to measure the prevalence of depression and PTSD, and investigate how these illnesses manifest across demographic subpopulations. The analysis demonstrates that cohort-based studies can help control for sampling biases, contextualize outcomes, and provide deeper insights into the data.

2016

INESC-ID at SemEval-2016 Task 4-A: Reducing the Problem of Out-of-Embedding Words
Silvio Amir | Ramon F. Astudillo | Wang Ling | Mário J. Silva | Isabel Trancoso
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Modelling Context with User Embeddings for Sarcasm Detection in Social Media
Silvio Amir | Byron C. Wallace | Hao Lyu | Paula Carvalho | Mário J. Silva
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

2015

Not All Contexts Are Created Equal: Better Word Representations with Variable Attention
Wang Ling | Yulia Tsvetkov | Silvio Amir | Ramón Fermandez | Chris Dyer | Alan W Black | Isabel Trancoso | Chu-Cheng Lin
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
Wang Ling | Chris Dyer | Alan W Black | Isabel Trancoso | Ramón Fermandez | Silvio Amir | Luís Marujo | Tiago Luís
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

INESC-ID: A Regression Model for Large Scale Twitter Sentiment Lexicon Induction
Silvio Amir | Ramon F. Astudillo | Wang Ling | Bruno Martins | Mario J. Silva | Isabel Trancoso
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

INESC-ID: Sentiment Analysis without Hand-Coded Features or Linguistic Resources using Embedding Subspaces
Ramon F. Astudillo | Silvio Amir | Wang Ling | Bruno Martins | Mario J. Silva | Isabel Trancoso
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Learning Word Representations from Scarce and Noisy Data with Embedding Subspaces
Ramon F. Astudillo | Silvio Amir | Wang Ling | Mário Silva | Isabel Trancoso
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

TUGAS: Exploiting unlabelled data for Twitter sentiment analysis
Silvio Amir | Miguel B. Almeida | Bruno Martins | João Filgueiras | Mário J. Silva
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)