Sudhanshu Mishra
2020
Multilingual Joint Fine-tuning of Transformer models for identifying Trolling, Aggression and Cyberbullying at TRAC 2020
Sudhanshu Mishra | Shivangi Prasad | Shubhanshu Mishra
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying
We present the approach of our team ‘3Idiots’ (referred to as ‘sdhanshu’ in the official rankings) for the Trolling, Aggression and Cyberbullying (TRAC) 2020 shared tasks. Our approach relies on fine-tuning various Transformer models on the different datasets. We also investigated the utility of task label marginalization, joint label classification, and joint training on multilingual datasets as possible improvements to our models. Our team came second in English sub-task A, a close fourth in English sub-task B, and third in the remaining four sub-tasks. We find the multilingual joint training approach to be the best trade-off between the computational efficiency of model deployment and the model’s evaluation performance. We open source our approach at https://github.com/socialmediaie/TRAC2020.
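As a rough illustration of what task label marginalization could look like, the sketch below assumes a model that outputs probabilities over joint (sub-task A, sub-task B) label pairs and sums them into per-task probabilities. The label names, the probability values, and the `marginalize` helper are illustrative assumptions, not taken from the abstract or from the team's released code.

```python
from collections import defaultdict


def marginalize(joint_probs):
    """Sum joint (task A, task B) label probabilities into per-task marginals."""
    task_a = defaultdict(float)
    task_b = defaultdict(float)
    for (label_a, label_b), p in joint_probs.items():
        task_a[label_a] += p
        task_b[label_b] += p
    return dict(task_a), dict(task_b)


# Hypothetical joint-label probabilities for a single post (made-up values).
joint = {
    ("NAG", "NGEN"): 0.50,
    ("NAG", "GEN"): 0.05,
    ("CAG", "NGEN"): 0.15,
    ("CAG", "GEN"): 0.05,
    ("OAG", "NGEN"): 0.10,
    ("OAG", "GEN"): 0.15,
}

probs_a, probs_b = marginalize(joint)

# Pick the most probable label for each sub-task from its marginal distribution.
pred_a = max(probs_a, key=probs_a.get)
pred_b = max(probs_b, key=probs_b.get)
print(pred_a, pred_b)
```

Under these assumptions, a single joint-label model can serve both sub-tasks, which is one way a joint approach could reduce the number of models to deploy.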
Scubed at 3C task A - A simple baseline for citation context purpose classification
Shubhanshu Mishra | Sudhanshu Mishra
Proceedings of the 8th International Workshop on Mining Scientific Publications
We present our team Scubed’s approach to the ‘3C’ Citation Context Classification Task, Subtask A, citation context purpose classification. Our approach relies on text-based features transformed via tf-idf, followed by training a variety of models that are capable of capturing non-linear features. Our best model on the leaderboard is a multi-layer perceptron, which also performs best during our rerun. Our submission code for replicating experiments is at: https://github.com/napsternxg/Citation_Context_Classification.
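To make the tf-idf feature step concrete, here is a minimal standard-library sketch. It assumes whitespace tokenization, one common smoothed idf variant, and L2 normalization; the `tfidf_vectors` helper and the example citation contexts are hypothetical and are not the team's actual pipeline. In a setup like the one described, vectors of this kind would then be passed to a classifier such as a multi-layer perceptron.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized tf-idf vectors) for a list of texts."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    # Smoothed idf (one common variant): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


# Made-up citation contexts; "#CITE" stands in for the cited work.
contexts = [
    "we extend the method of #CITE",
    "see #CITE for background",
    "our results differ from #CITE",
]
vocab, vectors = tfidf_vectors(contexts)
print(len(vocab), len(vectors))
```

Terms that appear in every context, such as the citation placeholder, receive the lowest idf weight, so rarer words contribute more to each vector.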
Scubed at 3C task B - A simple baseline for citation context influence classification
Shubhanshu Mishra | Sudhanshu Mishra
Proceedings of the 8th International Workshop on Mining Scientific Publications
We present our team Scubed’s approach to the ‘3C’ Citation Context Classification Task, Subtask B, citation context influence classification. Our approach relies on text-based features transformed via tf-idf, followed by training a variety of simple models, resulting in a strong baseline. Our best model on the leaderboard is a random forest classifier using only the citation context text. A replication of our analysis finds logistic regression and gradient boosted tree classifiers to be the best-performing models. Our submission code can be found at: https://github.com/napsternxg/Citation_Context_Classification.