Yvette Graham

2025

TransLaTeX: Exposing the Last-Mile Execution Gap in LLM-Agent for Scientific Formatting
Jiawen Lyn | Yvette Graham
Proceedings of The First Workshop on Human–LLM Collaboration for Ethical and Responsible Science Production (SciProdLLM)

Large Language Models (LLMs) have achieved remarkable progress in tasks such as survey writing and language polishing, yet the final stage of LaTeX formatting and template adaptation remains a neglected and error-prone bottleneck. We identify an execution illusion, where LLMs produce linguistically fluent but unexecutable LaTeX code. To address this, we introduce TransLaTeX, the first reasoning-and-control framework that converts documents between scholarly templates with compiler-level verifiability. TransLaTeX achieves three key innovations: (1) structure–content separation via placeholder masking, which preserves privacy and reduces token consumption; (2) SafeFormatBench, the first benchmark dedicated to executable LaTeX generation and template conversion; and (3) execution-grounded verification across compilation, policy compliance, and visual consistency. TransLaTeX outperforms Pandoc and full-text LLM baselines on SafeFormatBench in compilation rate, ACL policy compliance, and layout fidelity, effectively mitigating the execution illusion.

Audio-Based Crowd-Sourced Evaluation of Machine Translation Quality
Sami Haq | Sheila Castilho | Yvette Graham
Proceedings of the Tenth Conference on Machine Translation

Machine Translation (MT) has achieved remarkable performance, with growing interest in speech translation and multimodal approaches. However, despite these advancements, MT quality assessment remains largely text-centric, typically relying on human experts who read and compare texts. Since many real-world MT applications (e.g., Google Translate Voice Mode, iFLYTEK Translator) involve translation being spoken rather than printed or read, a more natural way to assess translation quality would be through speech as opposed to text-only evaluations. This study compares text-only and audio-based evaluations of 10 MT systems from the WMT General MT Shared Task, using crowd-sourced judgments collected via Amazon Mechanical Turk. We additionally performed statistical significance testing and self-replication experiments to test the reliability and consistency of the audio-based approach. Crowd-sourced assessments based on audio yield rankings largely consistent with text-only evaluations but, in some cases, identify significant differences between translation systems. We attribute this to speech’s richer, more natural modality and propose incorporating speech-based assessments into future MT evaluation frameworks.

An Appraisal Theoretic Approach to Modelling Affect Flow in Conversation Corpora
Alok Debnath | Yvette Graham | Owen Conlan
Proceedings of the 29th Conference on Computational Natural Language Learning

This paper presents a model of affect in conversations by leveraging Appraisal Theory as a generalizable framework. We propose that the multidimensional cognitive model of Appraisal Theory offers significant advantages for analyzing emotions in conversational contexts, addressing the current challenges of inconsistent annotation methodologies across corpora. To demonstrate this, we present AppraisePLM, a regression and classification model trained on the crowd-EnVent corpus that outperforms existing models in predicting 21 appraisal dimensions including pleasantness, self-control, and alignment with social norms. We apply AppraisePLM to diverse conversation datasets spanning task-oriented dialogues, general-domain chit-chat, affect-specific conversations, and domain-specific affect analysis. Our analysis reveals that AppraisePLM successfully extrapolates emotion labels across datasets, while capturing domain-specific patterns in affect flow – change in conversational emotion over the conversation. This work highlights the entangled nature of affective phenomena in conversation and positions affect flow as a promising model for holistic emotion analysis, offering a standardized approach to evaluate and benchmark affective capabilities in conversational agents.

2024

Advancing Open-Domain Conversational Agents - Designing an Engaging System for Natural Multi-Turn Dialogue
Islam A. Hassan | Yvette Graham
Proceedings of the 1st Workshop on Simulating Conversational Intelligence in Chat (SCI-CHAT 2024)

This system paper describes our conversational AI agent developed for the SCI-CHAT competition. The goal is to build automated dialogue agents that can have natural, coherent conversations with humans over multiple turns. Our model is based on fine-tuning the Snorkel-Mistral-PairRM-DPO language model on podcast conversation transcripts. This allows the model to leverage Snorkel-Mistral-PairRM-DPO’s linguistic knowledge while adapting it for multi-turn dialogue modeling using LoRA. During evaluation, human judges will converse with the agent on specified topics and provide ratings on response quality. Our system aims to demonstrate how large pretrained language models, when properly adapted and evaluated, can effectively converse on open-ended topics spanning multiple turns.
