Multilingual Protest News Detection - Shared Task 1, CASE 2021
Ali Hürriyetoğlu | Osman Mutlu | Erdem Yörük | Farhana Ferdousi Liza | Ritesh Kumar | Shyam Ratan
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

The shared task Socio-political and Crisis Events Detection at the workshop CASE @ ACL-IJCNLP 2021 benchmarks state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection. Socio-political event data is used for national and international policy- and decision-making. Therefore, the reliability and validity of these datasets are of the utmost importance. We split the shared task into three parts to address the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3). Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks: document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks had English, Portuguese, and Spanish for both training and evaluation data. Hindi data was available only for the evaluation of subtask 1. The majority of the 238 submissions were created using multi- and cross-lingual approaches. Best scores are above 77.27 F1-macro for subtask 1, above 85.32 F1-macro for subtask 2, above 84.23 CoNLL 2012 average score for subtask 3, and above 66.20 F1-macro for subtask 4 in all evaluation settings. The best system for subtask 4 scores above 66.20 F1 for all available languages. Although there is still significant room for improvement in cross-lingual and zero-shot settings, the best submissions for each evaluation scenario yield remarkable results. Monolingual models outperformed the multilingual models in a few evaluation scenarios.


Sentence Classification with Imbalanced Data for Health Applications
Farhana Ferdousi Liza
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

Identifying and extracting reports of medications, their abuse, or their adverse effects from social media is a challenging task. In social media, relevant reports are very infrequent, which causes an imbalanced class distribution for machine learning algorithms. Learning algorithms are typically designed to optimize overall accuracy without considering the relative distribution of each class. Thus, an imbalanced class distribution is problematic, as learning algorithms have low predictive accuracy for the infrequent class. Moreover, social media represents natural linguistic variation in creative language expressions. In this paper, we have used a combination of data balancing and neural language representation techniques to address these challenges. Specifically, we participated in shared tasks 1, 2 (all languages), 3 (only the span detection; no normalization was attempted), and 4 of Social Media Mining for Health Applications (SMM4H) 2020 (Klein et al., 2020). The results show that with the proposed methodology, recall scores are better than precision scores for the shared tasks. The recall score is also better than the mean score of all submissions. However, the F1-score is worse than the mean score except for task 2 (French).


Relating RNN Layers with the Spectral WFA Ranks in Sequence Modelling
Farhana Ferdousi Liza | Marek Grześ
Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges

We analyse Recurrent Neural Networks (RNNs) to understand the significance of multiple LSTM layers. We argue that Weighted Finite-state Automata (WFA) trained using a spectral learning algorithm are helpful for analysing RNNs. Our results suggest that multiple LSTM layers in RNNs help in learning distributed hidden states, but have a smaller impact on the ability to learn long-term dependencies. The analysis is based on empirical results; however, relevant theory is discussed wherever possible to justify and support our conclusions.

Room to Glo: A Systematic Comparison of Semantic Change Detection Approaches with Word Embeddings
Philippa Shoemark | Farhana Ferdousi Liza | Dong Nguyen | Scott Hale | Barbara McGillivray
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Word embeddings are increasingly used for the automatic detection of semantic change; yet, a robust evaluation and systematic comparison of the choices involved has been lacking. We propose a new evaluation framework for semantic change detection and find that (i) using the whole time series is preferable to only comparing the first and last time points; (ii) independently trained and aligned embeddings perform better than continuously trained embeddings for long time periods; and (iii) the reference point for comparison matters. We also present an analysis of the changes detected on a large Twitter dataset spanning 5.5 years.


An Improved Crowdsourcing Based Evaluation Technique for Word Embedding Methods
Farhana Ferdousi Liza | Marek Grześ
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP