Studying Expert-ese: Profiling and Classification of Domain-Specific Language Variation in Architecture with Traditional Machine Learning and LLMs

Carmen Schacht, Renate Delucchi Danhier


Abstract
This study investigates how domain expertise shapes spontaneous oral language production, with a focus on architecture. Building on the ExpLay Corpus, which contains image descriptions by speakers with and without architectural training, we analyze linguistic variation by combining Profiling-UD and the DECAF framework. We extract a broad range of syntactic and morpho-syntactic features to build linguistic profiles for both groups and train classifiers to distinguish expert from non-expert productions. Two traditional machine learning models (logistic regression and SVM) are compared with a lightweight BiLSTM and two large language models (GliClass and LLaMA 2). While the expert and non-expert corpora diverge only subtly (pairwise Jensen–Shannon divergence, JSD = 0.25), the BiLSTM using fastText embeddings achieves the highest F1-score (0.88), outperforming both the traditional models and the LLMs. This indicates that semantic representations are more predictive of domain variation than purely structural features and that smaller neural architectures generalize better on limited data. Overall, the findings provide empirical evidence that architectural expertise leaves measurable linguistic traces in spontaneous speech, supporting the Grammar of Space hypothesis.
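To make the divergence figure in the abstract more concrete, the following is a minimal sketch of how a Jensen–Shannon divergence between two feature distributions can be computed. It is not the authors' pipeline: the toy part-of-speech sequences, the helper names (`distribution`, `kl_divergence`, `jensen_shannon_divergence`), and the choice of base-2 logarithms are all assumptions made for illustration, and the paper may use a different feature set, logarithm base, or the square-root (distance) variant.

```python
import math
from collections import Counter


def distribution(tokens, vocab):
    """Relative frequency of each vocab item in tokens (hypothetical helper)."""
    counts = Counter(tokens)
    total = sum(counts[v] for v in vocab)
    return [counts[v] / total for v in vocab]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m the midpoint of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy, invented tag sequences standing in for two groups of speakers.
expert_tags = ["NOUN", "ADJ", "NOUN", "ADP", "NOUN", "VERB"]
lay_tags = ["VERB", "PRON", "NOUN", "VERB", "ADV", "NOUN"]

vocab = sorted(set(expert_tags) | set(lay_tags))
p = distribution(expert_tags, vocab)
q = distribution(lay_tags, vocab)

print(f"JSD = {jensen_shannon_divergence(p, q):.3f}")
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (disjoint support), so under that assumption a value of 0.25 would sit toward the similar end of the scale, in line with the abstract's description of a subtle divergence.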
Anthology ID:
2026.latechclfl-1.3
Volume:
Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Diego Alves, Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Janis Pagel, Stan Szpakowicz
Venues:
LaTeCH-CLfL | WS
SIG:
SIGHUM
Publisher:
Association for Computational Linguistics
Pages:
16–29
URL:
https://aclanthology.org/2026.latechclfl-1.3/
Cite (ACL):
Carmen Schacht and Renate Delucchi Danhier. 2026. Studying Expert-ese: Profiling and Classification of Domain-Specific Language Variation in Architecture with Traditional Machine Learning and LLMs. In Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026, pages 16–29, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Studying Expert-ese: Profiling and Classification of Domain-Specific Language Variation in Architecture with Traditional Machine Learning and LLMs (Schacht & Delucchi Danhier, LaTeCH-CLfL 2026)
PDF:
https://aclanthology.org/2026.latechclfl-1.3.pdf
Supplementary material:
2026.latechclfl-1.3.SupplementaryMaterial.txt