Cross-Domain Semantic Fidelity Evaluation for Meaning-to-Text Generation

Davan Harrison; Marilyn Walker


Abstract

Slot Error Rate (SER) is the standard metric for evaluating semantic accuracy in meaning-to-text generation, but computing it has historically required domain-specific scripts that do not generalize across datasets. We present a cross-domain SER evaluation framework that replaces hand-crafted rules with a learned slot extraction model. We adapt Llama-3.2-3B-Instruct with LoRA, updating only 0.34% of its parameters, and show that this small adapted model outperforms prompted frontier LLMs by a wide margin on structured extraction across 23 dialogue domains. We further apply overgenerate-and-rank to the extraction task itself, generating multiple candidate meaning representations and selecting the best one with a trained ranker, which improves SER-Accuracy from 75% to 88%. We combine the extraction model with a Natural Language Inference (NLI) verification baseline through learned per-example routing, achieving 90.0% accuracy on held-out evaluation pairs without any domain-specific rule engineering. We compare our framework against published rule-based SER tools and show that our learned approach matches or outperforms hand-crafted scripts on all six comparable domains.
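The abstract does not give the exact SER formula or the ranker interface. The following is a minimal sketch under stated assumptions: a meaning representation (MR) is a flat slot-to-value dictionary, and SER follows a common formulation that counts missing, added (hallucinated), and wrong-value slots and divides by the number of reference slots. It also shows overgenerate-and-rank applied to extraction: several candidate MRs are generated, and the one a ranker scores highest is kept. The names `slot_errors`, `slot_error_rate`, and `overgenerate_and_rank`, the dictionary MR format, and the `score` callable standing in for the trained ranker are all hypothetical illustrations, not the authors' code.

```python
from typing import Callable, Dict, Sequence

# Hypothetical MR format: slot name -> value, e.g. {"name": "Aromi", "food": "Italian"}.
MR = Dict[str, str]


def slot_errors(reference: MR, extracted: MR) -> Dict[str, int]:
    """Count missing, added (hallucinated), and wrong-value slots."""
    missing = sum(1 for slot in reference if slot not in extracted)
    added = sum(1 for slot in extracted if slot not in reference)
    wrong = sum(
        1
        for slot, value in reference.items()
        if slot in extracted and extracted[slot] != value
    )
    return {"missing": missing, "added": added, "wrong": wrong}


def slot_error_rate(reference: MR, extracted: MR) -> float:
    """Assumed formulation: SER = (missing + added + wrong) / |reference slots|."""
    if not reference:
        raise ValueError("reference MR must contain at least one slot")
    return sum(slot_errors(reference, extracted).values()) / len(reference)


def overgenerate_and_rank(candidates: Sequence[MR], score: Callable[[MR], float]) -> MR:
    """Keep the candidate MR that the (hypothetical) ranker scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate MR")
    return max(candidates, key=score)
```

For example, with reference `{"name": "Aromi", "food": "Italian", "area": "riverside"}` and extracted MR `{"name": "Aromi", "food": "French"}`, one slot is missing (`area`) and one has the wrong value (`food`), so the SER under this formulation is 2/3. In the learned framework described in the abstract, the extracted MR would come from the adapted extraction model rather than from hand-written rules; the SER arithmetic itself stays domain-independent.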

Anthology ID: 2026.gem-main.41
Volume: Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues: GEM | WS
Publisher: Association for Computational Linguistics
Pages: 443–455
URL: https://aclanthology.org/2026.gem-main.41/
Cite (ACL): Davan Harrison and Marilyn Walker. 2026. Cross-Domain Semantic Fidelity Evaluation for Meaning-to-Text Generation. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 443–455, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Cross-Domain Semantic Fidelity Evaluation for Meaning-to-Text Generation (Harrison & Walker, GEM 2026)
PDF: https://aclanthology.org/2026.gem-main.41.pdf
