This study examines the relationship between the linguistic characteristics of a test item and the complexity of the response process required to answer it correctly. Using data from a large-scale medical licensing exam, clustering methods identified items that were similar with respect to their relative difficulty and relative response-time intensiveness to create low response process complexity and high response process complexity item classes. Interpretable models were used to investigate the linguistic features that best differentiated between these classes from a descriptive and predictive framework. Results suggest that nuanced features such as the number of ambiguous medical terms help explain response process complexity beyond superficial item characteristics such as word count. Yet, although linguistic features carry signal relevant to response process complexity, the classification of individual items remains challenging.