AbstractIn drug development, protocols define how clinical trials are conducted, and are therefore of paramount importance. They contain key patient-, investigator-, medication-, and study-related information, often elaborated in different sections in the protocol texts. Granular-level parsing on large quantity of existing protocols can accelerate clinical trial design and provide actionable insights into trial optimization. Here, we report our progresses in using deep learning NLP algorithms to enable automated protocol analytics. In particular, we combined a pre-trained BERT transformer model with joint-learning strategies to simultaneously identify clinically relevant entities (i.e. Named Entity Recognition) and extract the syntactic relations between these entities (i.e. Relation Extraction) from the eligibility criteria section in protocol texts. When comparing to standalone NER and RE models, our joint-learning strategy can effectively improve the performance of RE task while retaining similarly high NER performance, likely due to the synergy of optimizing toward both tasks’ objectives via shared parameters. The derived NLP model provides an end-to-end solution to convert unstructured protocol texts into structured data source, which will be embedded into a comprehensive clinical analytics workflow for downstream trial design missions such like patient population extraction, patient enrollment rate estimation, and protocol amendment prediction.