Parsa Bagherzadeh


2022

Integration of Heterogeneous Knowledge Sources for Biomedical Text Processing
Parsa Bagherzadeh | Sabine Bergler
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)

Recently, research into bringing outside knowledge sources into current neural NLP models has been increasing. Most approaches that leverage external knowledge sources require laborious and non-trivial designs, as well as tailoring the system through intensive ablation of different knowledge sources, an effort that discourages users from using quality ontological resources. In this paper, we show that multiple large heterogeneous knowledge sources (KSs) can be easily integrated using a decoupled approach, allowing for an automatic ablation of irrelevant KSs, while keeping the overall parameter space tractable. We experiment with BERT and pre-trained graph embeddings, and show that they interoperate well without performance degradation, even when some do not contribute to the task.

CLaCLab at SocialDisNER: Using Medical Gazetteers for Named-Entity Recognition of Disease Mentions in Spanish Tweets
Harsh Verma | Parsa Bagherzadeh | Sabine Bergler
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper summarizes the CLaC submission for SMM4H 2022 Task 10, which concerns the recognition of diseases mentioned in Spanish tweets. Before classifying each token, we encode it with a transformer encoder using features from Multilingual RoBERTa Large, a UMLS gazetteer, and a DISTEMIST gazetteer, among others. We obtain a strict F1 score of 0.869, against a competition mean of 0.675, standard deviation of 0.245, and median of 0.761.

2021

CLaC-BP at SemEval-2021 Task 8: SciBERT Plus Rules for MeasEval
Benjamin Therien | Parsa Bagherzadeh | Sabine Bergler
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper explains the design of a heterogeneous system that ranked eighth in the SemEval-2021 Task 8 competition. We analyze ablation experiments and demonstrate how the system components, namely tokenizer, unit identifier, modifier classifier, and language model, affect the overall score. We compare our results to similar experiments from the literature and introduce a grouping algorithm developed in the post-evaluation phase that increased our system’s overall score, hypothetically elevating our competition rank from eighth to sixth.

Multi-input Recurrent Independent Mechanisms for leveraging knowledge sources: Case studies on sentiment analysis and health text mining
Parsa Bagherzadeh | Sabine Bergler
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

This paper presents a way to inject and leverage existing knowledge from external sources in a Deep Learning environment, extending the recently proposed Recurrent Independent Mechanisms (RIMs) architecture, which comprises a set of interacting yet independent modules. We show that this extension of the RIMs architecture is an effective framework with lower parameter implications compared to purely fine-tuned systems.

Interacting Knowledge Sources, Inspection and Analysis: Case-studies on Biomedical text processing
Parsa Bagherzadeh | Sabine Bergler
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

In this paper, we investigate the recently proposed multi-input RIM for inspectability. This framework follows an encapsulation paradigm, where external knowledge sources are encoded as largely independent modules, enabling transparency for model inspection.

Leveraging knowledge sources for detecting self-reports of particular health issues on social media
Parsa Bagherzadeh | Sabine Bergler
Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis

This paper investigates incorporating quality knowledge sources developed by experts for the medical domain, as well as syntactic information, for classification of tweets into four different health-oriented categories. We claim that resources such as the MeSH hierarchy and currently available parse information are effective extensions of moderately sized training datasets for various fine-grained tweet classification tasks of self-reported health issues.

Competing Independent Modules for Knowledge Integration and Optimization
Parsa Bagherzadeh | Sabine Bergler
Findings of the Association for Computational Linguistics: EMNLP 2021

This paper presents a neural framework of untied independent modules, used here for integrating off-the-shelf knowledge sources such as language models, lexica, POS information, and dependency relations. Each knowledge source is implemented as an independent component that can interact and share information with other knowledge sources. We report proof-of-concept experiments for several standard sentiment analysis tasks and show that the knowledge sources interoperate effectively without interference. As a second use case, we show that the proposed framework is suitable for optimizing BERT-like language models even without the help of external knowledge sources. We cast each Transformer layer as a separate module and demonstrate performance improvements from this explicit integration of the different information encoded at the different Transformer layers.

2020

CLaC at SemEval-2020 Task 5: Muli-task Stacked Bi-LSTMs
MinGyou Sung | Parsa Bagherzadeh | Sabine Bergler
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We consider detection of the span of antecedents and consequents in argumentative prose a structural, grammatical task. Our system comprises a set of stacked Bi-LSTMs trained on two complementary linguistic annotations. We explore the effectiveness of grammatical features (POS and clause type) through ablation. The reported experiments suggest that a multi-task learning approach using this external, grammatical knowledge is useful for detecting the extent of antecedents and consequents and performs nearly as well without the use of word embeddings.

CLaC at SMM4H 2020: Birth Defect Mention Detection
Parsa Bagherzadeh | Sabine Bergler
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

For the detection of personal tweets, where a parent speaks of a child’s birth defect, CLaC combines ELMo word embeddings and gazetteer lists from external resources with a GCNN (for encoding dependencies), in a multi-layer, transformer-inspired architecture. To address the task, we compile several gazetteer lists from resources such as MeSH and GI. The proposed system obtains a μF1 score of .69 in SMM4H 2020 Task 5, where the competition average is .65.

2019

Adverse Drug Effect and Personalized Health Mentions, CLaC at SMM4H 2019, Tasks 1 and 4
Parsa Bagherzadeh | Nadia Sheikh | Sabine Bergler
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

CLaC Labs participated in Tasks 1 and 4 of SMM4H 2019. We pursued two main objectives in our submission. First, we tried to use some textual features in a deep net framework; second, we tested the potential use of more than one word embedding. The results seem positively affected by the proposed architectures.

2018

CLaC at SMM4H Task 1, 2, and 4
Parsa Bagherzadeh | Nadia Sheikh | Sabine Bergler
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

CLaC Labs participated in Tasks 1, 2, and 4 using the same base architecture for all tasks with various parameter variations. This was our first exploration of this data and the SMM4H Tasks, so a unified system was useful for comparing the behavior of our architecture across the different datasets and their interaction with different linguistic features.