Mandhara Jayaram


2020

pdf bib
A Representation Learning Approach to Animal Biodiversity Conservation
Meet Mukadam | Mandhara Jayaram | Yongfeng Zhang
Proceedings of the 28th International Conference on Computational Linguistics

Generating knowledge from natural language data has aided in solving many artificial intelligence problems. Vector representations of words have been the driving force behind the majority of natural language processing tasks. This paper develops a novel approach for predicting the conservation status of animal species using custom generated scientific name embeddings. We use two different vector embeddings generated using representation learning on Wikipedia text and animal taxonomy data. We generate name embeddings for all species in the animal kingdom using unsupervised learning and build a model on the IUCN Red List dataset to classify species into endangered or least-concern. To our knowledge, this is the first work that makes use of learnt features instead of handcrafted features for this task and achieves competitive results. Based on the high confidence results of our model, we also predict the conservation status of data deficient species whose conservation status is still unknown and thus steering more focus towards them for protection. These embeddings have also been made publicly available here. We believe this will greatly help in solving various downstream tasks and further advance research in the cross-domain involving natural language processing, conservation biology, and life sciences.