John Mariano


2022

pdf bib
Towards Classification of Legal Pharmaceutical Text using GAN-BERT
Tapan Auti | Rajdeep Sarkar | Bernardo Stearns | Atul Kr. Ojha | Arindam Paul | Michaela Comerford | Jay Megaro | John Mariano | Vall Herard | John P. McCrae
Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference

Pharmaceutical text classification is an important area of research for commercial and research institutions working in the pharmaceutical domain. Addressing this task is challenging due to the need of expert verified labelled data which can be expensive and time consuming to obtain. Towards this end, we leverage predictive coding methods for the task as they have been shown to generalise well for sentence classification. Specifically, we utilise GAN-BERT architecture to classify pharmaceutical texts. To capture the domain specificity, we propose to utilise the BioBERT model as our BERT model in the GAN-BERT framework. We conduct extensive evaluation to show the efficacy of our approach over baselines on multiple metrics.

2021

pdf bib
Few-shot and Zero-shot Approaches to Legal Text Classification: A Case Study in the Financial Sector
Rajdeep Sarkar | Atul Kr. Ojha | Jay Megaro | John Mariano | Vall Herard | John P. McCrae
Proceedings of the Natural Legal Language Processing Workshop 2021

The application of predictive coding techniques to legal texts has the potential to greatly reduce the cost of legal review of documents, however, there is such a wide array of legal tasks and continuously evolving legislation that it is hard to construct sufficient training data to cover all cases. In this paper, we investigate few-shot and zero-shot approaches that require substantially less training data and introduce a triplet architecture, which for promissory statements produces performance close to that of a supervised system. This method allows predictive coding methods to be rapidly developed for new regulations and markets.