2024
Generating subject-matter expertise assessment questions with GPT-4: a medical translation use-case
Diana Silveira, Marina Torrón, Helena Moniz
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
This paper examines the suitability of a large language model (LLM), GPT-4, for generating multiple-choice questions (MCQs) that assess subject-matter expertise (SME) in the domain of medical translation. The main objective of these questions is to model the skills of potential subject-matter experts in a human-in-the-loop machine translation (MT) flow, so that tasks are matched to individuals with the right skill profile. The investigation was conducted at Unbabel, an artificial intelligence-powered human translation platform. Two medical translation experts evaluated the GPT-4-generated questions and answers, one focusing on English–European Portuguese and the other on English–German. We present a methodology for creating prompts to elicit high-quality GPT-4 outputs for this use case, as well as for designing evaluation scorecards for human review of such output. Our findings suggest that GPT-4 has the potential to generate suitable items for subject-matter expertise tests, providing a more efficient approach than relying solely on humans. Furthermore, we propose recommendations for future research to build on our approach and refine the quality of the outputs generated by LLMs.