ReproHum #0033-05: Human Evaluation Report on "Generating Scientific Definitions with Controllable Complexity"

Ines Arous; Jackie Chi Kit Cheung

Abstract

Human evaluation remains a central component of assessing NLG systems, especially for open-ended or creative generation tasks. Yet, the field still lacks standardized practices for designing and reporting such evaluations. In this paper, we present a reproduction study of the human evaluation conducted by August et al. for their method of generating scientific definitions with controllable complexity. By closely replicating their experimental setup, we find that our results partially align with the original findings, suggesting a moderate level of reproducibility.

Anthology ID: 2026.gem-main.89
Volume: Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues: GEM | WS
Publisher: Association for Computational Linguistics
Pages: 1117–1126
URL: https://aclanthology.org/2026.gem-main.89/
Bibkey: arous-cheung-2026-reprohum
Cite (ACL): Ines Arous and Jackie Chi Kit Cheung. 2026. ReproHum #0033-05: Human Evaluation Report on "Generating Scientific Definitions with Controllable Complexity". In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 1117–1126, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): ReproHum #0033-05: Human Evaluation Report on “Generating Scientific Definitions with Controllable Complexity” (Arous & Cheung, GEM 2026)
PDF: https://aclanthology.org/2026.gem-main.89.pdf

Export citation

BibTeX

@inproceedings{arous-cheung-2026-reprohum,
    title = "{R}epro{H}um {\#}0033-05: Human Evaluation Report on ``Generating Scientific Definitions with Controllable Complexity''",
    author = "Arous, Ines  and
      Cheung, Jackie Chi Kit",
    editor = "Mille, Simon  and
      Gehrmann, Sebastian  and
      Schmidtov{\'a}, Patr{\'i}cia  and
      Du{\v{s}}ek, Ond{\v{r}}ej  and
      Fadaee, Marzieh  and
      Lo, Kyle  and
      Santus, Enrico  and
      Stanovsky, Gabriel",
    booktitle = "Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics ({GEM})",
    month = jul,
    year = "2026",
    address = "San Diego, California, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.gem-main.89/",
    pages = "1117--1126",
    ISBN = "979-8-89176-423-1",
    abstract = "Human evaluation remains a central component of assessing NLG systems, especially for open-ended or creative generation tasks. Yet, the field still lacks standardized practices for designing and reporting such evaluations. In this paper, we present a reproduction study of the human evaluation conducted by August et al. for their method of generating scientific definitions with controllable complexity. By closely replicating their experimental setup, we find that our results partially align with the original findings, suggesting a moderate level of reproducibility."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="arous-cheung-2026-reprohum">
    <titleInfo>
        <title>ReproHum #0033-05: Human Evaluation Report on “Generating Scientific Definitions with Controllable Complexity”</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Ines</namePart>
        <namePart type="family">Arous</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Jackie</namePart>
        <namePart type="given">Chi</namePart>
        <namePart type="given">Kit</namePart>
        <namePart type="family">Cheung</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2026-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Simon</namePart>
            <namePart type="family">Mille</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Sebastian</namePart>
            <namePart type="family">Gehrmann</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Patrícia</namePart>
            <namePart type="family">Schmidtová</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ondřej</namePart>
            <namePart type="family">Dušek</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Marzieh</namePart>
            <namePart type="family">Fadaee</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Kyle</namePart>
            <namePart type="family">Lo</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Enrico</namePart>
            <namePart type="family">Santus</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Gabriel</namePart>
            <namePart type="family">Stanovsky</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">San Diego, California, USA</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
        <identifier type="isbn">979-8-89176-423-1</identifier>
    </relatedItem>
    <abstract>Human evaluation remains a central component of assessing NLG systems, especially for open-ended or creative generation tasks. Yet, the field still lacks standardized practices for designing and reporting such evaluations. In this paper, we present a reproduction study of the human evaluation conducted by August et al. for their method of generating scientific definitions with controllable complexity. By closely replicating their experimental setup, we find that our results partially align with the original findings, suggesting a moderate level of reproducibility.</abstract>
    <identifier type="citekey">arous-cheung-2026-reprohum</identifier>
    <location>
        <url>https://aclanthology.org/2026.gem-main.89/</url>
    </location>
    <part>
        <date>2026-07</date>
        <extent unit="page">
            <start>1117</start>
            <end>1126</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T ReproHum #0033-05: Human Evaluation Report on “Generating Scientific Definitions with Controllable Complexity”
%A Arous, Ines
%A Cheung, Jackie Chi Kit
%Y Mille, Simon
%Y Gehrmann, Sebastian
%Y Schmidtová, Patrícia
%Y Dušek, Ondřej
%Y Fadaee, Marzieh
%Y Lo, Kyle
%Y Santus, Enrico
%Y Stanovsky, Gabriel
%S Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
%D 2026
%8 July
%I Association for Computational Linguistics
%C San Diego, California, USA
%@ 979-8-89176-423-1
%F arous-cheung-2026-reprohum
%X Human evaluation remains a central component of assessing NLG systems, especially for open-ended or creative generation tasks. Yet, the field still lacks standardized practices for designing and reporting such evaluations. In this paper, we present a reproduction study of the human evaluation conducted by August et al. for their method of generating scientific definitions with controllable complexity. By closely replicating their experimental setup, we find that our results partially align with the original findings, suggesting a moderate level of reproducibility.
%U https://aclanthology.org/2026.gem-main.89/
%P 1117-1126

Markdown (Informal)

[ReproHum #0033-05: Human Evaluation Report on "Generating Scientific Definitions with Controllable Complexity"](https://aclanthology.org/2026.gem-main.89/) (Arous & Cheung, GEM 2026)
