Beyond Single Words: MWE Identification in Bioinformatics Research Articles and Dispersion Profiling Across IMRaD

Jurgi Giraud, Andrew Gargett


Abstract
Multiword Expressions (MWEs) are pervasive in scientific writing, and in specialized domains they include both multiword terminology (e.g., noun compounds) and recurrent academic phrasing. This study profiles MWEs in a large corpus of bioinformatics research articles segmented by IMRaD sections. Building on recent multi-method approaches to scientific MWE identification, we extract MWEs using complementary automated strategies (semantic matching, dependency parsing, controlled vocabularies, and academic formula lists) and compare the resulting inventories by size, form, and IMRaD section distribution. We further quantify cross-document dispersion using document frequency and Gries’ DP to distinguish widely reused expressions from items concentrated in a small subset of articles. Results show that bioinformatics MWEs are predominantly short and nominal, but that extraction methods differ in the extent to which they recover discourse and reporting phraseology. Dispersion is strongly long-tailed across sections with most MWEs being document-specific, while a smaller recurrent core aligns with section function and is enriched for conventional templates and standardized multiword terms. Overall, the findings argue for combining complementary identification methods with dispersion profiling to characterize domain "multiwordness" in a principled and section-sensitive way.
Anthology ID:
2026.mwe-1.10
Volume:
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
Month:
March
Year:
2026
Address:
Rabat, Marocco
Editors:
Atul Kr. Ojha, Verginica Barbu Mititelu, Mathieu Constant, Ivelina Stoyanova, A. Seza Doğruöz, Alexandre Rademaker
Venues:
MWE | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
86–95
Language:
URL:
https://aclanthology.org/2026.mwe-1.10/
DOI:
Bibkey:
Cite (ACL):
Jurgi Giraud and Andrew Gargett. 2026. Beyond Single Words: MWE Identification in Bioinformatics Research Articles and Dispersion Profiling Across IMRaD. In Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026), pages 86–95, Rabat, Marocco. Association for Computational Linguistics.
Cite (Informal):
Beyond Single Words: MWE Identification in Bioinformatics Research Articles and Dispersion Profiling Across IMRaD (Giraud & Gargett, MWE 2026)
Copy Citation:
PDF:
https://aclanthology.org/2026.mwe-1.10.pdf