Amin Mantrach
2026
Multilingual Self-Taught Faithfulness Evaluators
Carlo Alfano | Aymen Al Marjani | Zeno Jonke | Amin Mantrach | Saab Mansour | Marcello Federico
Findings of the Association for Computational Linguistics: EACL 2026
The growing use of large language models (LLMs) has increased the need for automatic evaluation systems, particularly to address the challenge of information hallucination. Although existing faithfulness evaluation approaches have shown promise, they are predominantly English-focused and often require expensive human-labeled training data for fine-tuning specialized models. As LLMs see increased adoption in multilingual contexts, there is a need for accurate faithfulness evaluators that can operate across languages without extensive labeled data. This paper presents STEMF (Self-Taught Evaluators for Multilingual Faithfulness), a framework that learns exclusively from synthetic multilingual data while leveraging cross-lingual transfer learning. Through experiments comparing language-specific and mixed-language fine-tuning approaches, we demonstrate a consistent relationship between an LLM’s general language capabilities and its performance in language-specific evaluation tasks. Our framework shows improvements over existing baselines, including state-of-the-art English evaluators and machine translation-based approaches.
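As a rough illustration of the self-taught idea described above (not the paper's exact recipe), the sketch below builds a mixed-language training pool with no human labels by prompting a generator LLM for one faithful and one deliberately corrupted summary per document; the generator.generate interface and the prompts are hypothetical assumptions.

def build_synthetic_pool(generator, documents, languages):
    """Assemble synthetic (context, summary, label) triples across
    languages; an evaluator model would then be fine-tuned on this pool."""
    pool = []
    for doc in documents:
        for lang in languages:
            # Faithful summary: grounded in the source document.
            faithful = generator.generate(
                f"Summarize the following text faithfully in {lang}:\n{doc}")
            # Unfaithful summary: instructed to inject an unsupported fact.
            corrupted = generator.generate(
                f"Summarize the following text in {lang}, but introduce "
                f"one fact not supported by it:\n{doc}")
            pool.append({"context": doc, "summary": faithful, "label": "faithful"})
            pool.append({"context": doc, "summary": corrupted, "label": "unfaithful"})
    return pool

Mixing all target languages into a single pool, rather than fine-tuning one evaluator per language, is what lets cross-lingual transfer help the lower-resource languages.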
2025
Monte Carlo Temperature: a robust sampling strategy for LLM’s uncertainty quantification methods
Nicola Cecere | Andrea Bacciu | Ignacio Fernández-Tobías | Amin Mantrach
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)
Uncertainty quantification (UQ) in Large Language Models (LLMs) is essential for their safe and reliable deployment, particularly in critical applications where incorrect outputs can have serious consequences. Current UQ methods typically rely on querying the model multiple times using non-zero temperature sampling to generate diverse outputs for uncertainty estimation. However, the impact of selecting a given temperature parameter is understudied, and our analysis reveals that temperature plays a fundamental role in the quality of uncertainty estimates. The conventional approach of identifying optimal temperature values requires expensive hyperparameter optimization (HPO) that must be repeated for each new model-dataset combination. We propose Monte Carlo Temperature (MCT), a robust sampling strategy that eliminates the need for temperature calibration. Our analysis reveals that: 1) MCT provides more robust uncertainty estimates across a wide range of temperatures, 2) MCT improves the performance of UQ methods by replacing fixed-temperature strategies that do not rely on HPO, and 3) MCT achieves statistical parity with oracle temperatures, which represent the ideal outcome of a well-tuned but computationally expensive HPO process. These findings demonstrate that effective UQ can be achieved without the computational burden of temperature parameter calibration.
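A minimal sketch of the sampling strategy the abstract describes, under the assumption that each query draws a fresh temperature uniformly from a range and that uncertainty is estimated from the empirical answer distribution; the model.generate interface, the uniform prior, and the entropy aggregation are illustrative choices, not the paper's stated implementation.

import math
import random
from collections import Counter

def mct_uncertainty(model, prompt, n_samples=10, t_range=(0.3, 1.5)):
    """Monte Carlo Temperature: instead of one fixed, HPO-tuned
    temperature, draw a temperature per sample and aggregate."""
    answers = []
    for _ in range(n_samples):
        t = random.uniform(*t_range)  # fresh temperature for every query
        answers.append(model.generate(prompt, temperature=t))
    # Entropy of the empirical answer distribution as the uncertainty score.
    counts = Counter(answers)
    probs = [c / n_samples for c in counts.values()]
    return -sum(p * math.log(p) for p in probs)

The point of the construction is that no single temperature value needs to be calibrated per model-dataset pair: marginalizing over temperatures stands in for the expensive HPO step.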
2015
HEADS: Headline Generation as Sequence Prediction Using an Abstract Feature-Rich Space
Carlos A. Colmenares | Marina Litvak | Amin Mantrach | Fabrizio Silvestri
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies