LaTeXMT: Machine Translation for LaTeX Documents

Calvin Hoy, Samuel Frontull, Georg Moser


Abstract
While machine translation has made great strides in recent years, thanks in large part to transformer language models, existing tools are designed primarily for plain text and are thus not equipped to deal with complex markup documents. Not even large language models can reliably handle LaTeX source files, as non-standard structures are not captured by any available training data. Previous attempts to create translation engines for LaTeX either work on compiled documents, rely on document pre-processors that may lose critical semantic elements, or cannot distinguish between text and non-text content. In this paper, we present LaTeXMT, a software solution for structure-preserving, source-to-source translation of LaTeX documents. The full source code of LaTeXMT is provided under the LGPL-3.0 open-source licence, and a web version is publicly available.
Anthology ID:
2025.emnlp-demos.56
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Ivan Habernal, Peter Schulam, Jörg Tiedemann
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
739–748
URL:
https://aclanthology.org/2025.emnlp-demos.56/
Cite (ACL):
Calvin Hoy, Samuel Frontull, and Georg Moser. 2025. LaTeXMT: Machine Translation for LaTeX Documents. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 739–748, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
LaTeXMT: Machine Translation for LaTeX Documents (Hoy et al., EMNLP 2025)
PDF:
https://aclanthology.org/2025.emnlp-demos.56.pdf