Jia Zhang

2025

Prompt engineering in translation: How do student translators leverage GenAI tools for translation tasks
Jia Zhang | Xiaoyu Zhao | Stephen Doherty
Proceedings of Machine Translation Summit XX: Volume 1

GenAI, though not developed specifically for translation, has shown the potential to produce translations as good as, if not better than, contemporary neural machine translation systems. In the context of tertiary-level translator education, the integration of GenAI has renewed debate in curricula and pedagogy. Despite divergent opinions among educators, it is evident that translation students, like many other students, are using GenAI tools to facilitate translation tasks, just as they use MT tools. We thus argue for the benefits of guiding students in using GenAI in an informed, critical, and ethical manner. To inform tailored curriculum and pedagogy, it is valuable to investigate what students use GenAI for and how they use it. This study is among the first to investigate translation students’ prompting behaviours. We collected the prompts that a representative sample of postgraduate student participants entered into GenAI tools over eight months and subjected them to thematic and discourse analysis. The findings revealed that students had indeed used GenAI in various translation tasks, but their prompting behaviours were intuitive and uninformed. Our findings suggest an urgent need for translation educators to consider students’ agency and critical engagement with GenAI tools.

Socioeconomic status (SES) reflects an individual’s standing in society, from a holistic set of factors including income, education level, and occupation. Identifying individuals in low-SES groups is crucial to ensuring they receive necessary support. However, many individuals may be hesitant to disclose their SES directly. This study introduces a federated learning-powered framework capable of verifying individuals’ SES levels through the analysis of their communications described in natural language. We propose to study language usage patterns among individuals from different SES groups using clustering and topic modeling techniques. An empirical study leveraging life narrative interviews demonstrates the effectiveness of our proposed approach.

Bridging the Socioeconomic Gap in Education: A Hybrid AI and Human Annotation Approach
Nahed Abdelgaber | Labiba Jahan | Arham Vinit Doshi | Rishi Suri | Hamza Reza Pavel | Jia Zhang
Proceedings of the 29th Conference on Computational Natural Language Learning

Students’ academic performance is influenced by various demographic factors, with socioeconomic class being a prominently researched and debated factor. Computer Science research traditionally prioritizes computationally definable problems, yet challenges such as the scarcity of high-quality labeled data and ethical concerns surrounding the mining of personal information can pose barriers to exploring topics like the impact of SES on students’ education. Overcoming these barriers may involve automating the collection and annotation of high-quality language data from diverse social groups through human collaboration. Therefore, our focus is on gathering unstructured narratives from Internet forums written by students with low socioeconomic status (SES) using machine learning models and human insights. We developed a hybrid data collection model that semi-automatically retrieved narratives from the Reddit website and created a dataset five times larger than the seed dataset. Additionally, we compared the performance of traditional ML models with recent large language models (LLMs) in classifying narratives written by low-SES students, and analyzed the collected data to extract valuable insights into the socioeconomic challenges these students encounter and the solutions they pursue.

2023

The impact of machine translation on the translation quality of undergraduate translation students
Jia Zhang | Hong Qian
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track

Post-editing (PE) refers to checking, proofreading, and revising the translation output of any automated translation (Gouadec, 2007, p. 25). It is needed because the meaning of a text cannot yet be accurately and fluently conveyed by machine translation (MT). The importance of PE and, accordingly, PE training has been widely acknowledged, and specialised courses have recently been introduced across universities and other organisations worldwide. However, scant consideration is given to when PE skills should be introduced in translation training. PE courses are usually offered to advanced translation learners, i.e., those at the postgraduate level or in the last year of an undergraduate programme. Also, existing empirical studies most often investigate the impact of MT on postgraduate students or undergraduate students in the last year of their study. This paper reports on a study that aims to determine the possible effects of MT and PE on the translation quality of students at the early stage of translator training, i.e., undergraduate translation students with only basic translation knowledge. Methodologically, an experiment was conducted to compare students’ (n=10) PEMT-based translations and from-scratch translations without the assistance of machine translation. Second-year students of an undergraduate translation programme were invited to translate two English texts of similar difficulty into Chinese. One of the texts was translated directly, while the other was translated with reference to machine-generated translation. Translation quality can be dynamic. When examined from different perspectives using different methods, the quality of a translation can vary. Several methods of translation quality assessment were adopted in this project, including rubrics-based scoring, error analysis and fixed-point translation analysis. It was found that the quality of students’ PE translations was compromised compared with that of from-scratch translations. In addition, errors were more homogenised in the PEMT-based translations. It is hoped that this study can shed some light on the role of PEMT in translator training and contribute to the curricula and course design of post-editing for translator education. Reference: Gouadec, D. (2007). Translation as a Profession. John Benjamins Publishing. Keywords: machine translation, post-editing, translator training, translation quality assessment, error analysis, undergraduate students

Exploring undergraduate translation students’ perceptions towards machine translation: A qualitative questionnaire survey
Jia Zhang
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track

Machine translation (MT) has relatively recently been introduced in higher education institutions, with specialised courses provided for students. However, such courses are often offered at the postgraduate level or towards the last year of an undergraduate programme (e.g., Arenas & Moorkens, 2019; Doherty et al., 2012). Most previous studies have focussed on postgraduate students or undergraduate students in the last year of their programme and surveyed their perceptions or attitudes towards MT with quantitative questionnaires (e.g., Liu et al., 2022; Yang et al., 2021), yet undergraduate students earlier in their translation education remain overlooked. As such, not much is known about how they perceive and use MT and what their training needs may be. This study investigates the perceptions towards MT of undergraduate students at the early stage of translator training via qualitative questionnaires. Year-two translation students with little or no MT knowledge and no real-life translation experience (n=20) were asked to fill out a questionnaire with open-ended questions. Their answers were manually analysed by the researcher using NVivo to identify themes and arguments. It was revealed that even without proper training, the participants recognised MT’s potential advantages and disadvantages to a certain degree. MT was more often used as an instrument for learning language and translation than as a translation tool in its own right. None of the students reported post-editing machine-generated translation in their translation assignments. Instead, they referenced MT output to understand terms, slang, fixed combinations and complicated sentences and to produce accurate, authentic and diversified phrases and sentences. They held a positive attitude towards MT quality and agreed that MT increased their translation quality, and they felt more confident with the tasks. While they were willing to experiment with MT as a translation tool and perform post-editing in future tasks, they were doubtful that MT could be introduced in the classroom at their current stage of translation learning. They feared that MT would impact their independent and critical thinking. Students did not mention any potential negative impacts of MT on the development of their language proficiency or translation competency. It is hoped that the findings will make an evidence-based contribution to the design of MT curricula and teaching pedagogies. Keywords: machine translation, post-editing, translator training, perception, attitudes, teaching pedagogy References: Arenas, A. G., & Moorkens, J. (2019). Machine translation and post-editing training as part of a master’s programme. Journal of Specialised Translation, 31, 217–238. Doherty, S., Kenny, D., & Way, A. (2012). Taking statistical machine translation to the student translator. Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Commercial MT User Program. Liu, K., Kwok, H. L., Liu, J., & Cheung, A. K. (2022). Sustainability and influence of machine translation: Perceptions and attitudes of translation instructors and learners in Hong Kong. Sustainability, 14(11), 6399. Yang, Y., Wang, X., & Yuan, Q. (2021). Measuring the usability of machine translation in the classroom context. Translation and Interpreting Studies, 16(1), 101–123.

2022

ValCAT: Variable-Length Contextualized Adversarial Transformations Using Encoder-Decoder Language Model
Chuyun Deng | Mingxuan Liu | Yue Qin | Jia Zhang | Hai-Xin Duan | Donghong Sun
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Adversarial texts help explore vulnerabilities in language models, improve model robustness, and explain their working mechanisms. However, existing word-level attack methods are trapped in a one-to-one attack pattern, i.e., only a single word can be modified in one transformation round, and they ignore the interactions between several consecutive words. In this paper, we propose ValCAT, a black-box attack framework that misleads the language model by applying variable-length contextualized transformations to the original text. Compared to word-level methods, ValCAT expands the basic units of perturbation from single words to spans composed of multiple consecutive words, enhancing the perturbation capability. Experiments show that our method outperforms state-of-the-art methods in terms of attack success rate, perplexity, and semantic similarity on several classification tasks and inference tasks. The comprehensive human evaluation demonstrates that ValCAT has a significant advantage in ensuring the fluency of the adversarial examples and achieves better semantic consistency. We release the code at https://github.com/linerxliner/ValCAT.
