Candidate Profile Summarization: A RAG Approach with Synthetic Data Generation for Tech Jobs

Anum Afzal; Ishwor Subedi; Florian Matthes

Abstract

As Large Language Models (LLMs) are increasingly applied to resume evaluation and candidate selection, this study investigates the effectiveness of using in-context example resumes to generate synthetic data. We compare a Retrieval-Augmented Generation (RAG) system to a Named Entity Recognition (NER)-based baseline for job-resume matching, generating diverse synthetic resumes with models like Mixtral-8x22B-Instruct-v0.1. Our results show that combining BERT, ROUGE, and Jaccard similarity metrics effectively assesses synthetic resume quality, ensuring minimal lexical overlap together with high similarity and diversity. Our experiments show that RAG notably outperforms NER for retrieval tasks, though generation-based summarization remains challenged by role differentiation. Human evaluation further highlights issues of factual accuracy and completeness, emphasizing the importance of in-context examples, prompt engineering, and improvements in summary generation for robust, automated candidate selection.
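To make the lexical-overlap part of the abstract concrete, here is a minimal sketch of Jaccard similarity between two resume texts. It is an illustration only, not the authors' implementation: the tokenizer, the function names `tokenize` and `jaccard_similarity`, and the example strings are all assumptions introduced here.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of tokens.

    Assumption: tokens are runs of letters, digits, '+' and '#', so that
    skill names such as "c++" or "c#" survive as single tokens.
    """
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Return |A ∩ B| / |A ∪ B| over the token sets of the two texts."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        # Two empty texts are treated as identical by convention.
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    # Hypothetical resume snippets, used only to exercise the function.
    resume_1 = "Backend engineer with Python and Go experience"
    resume_2 = "Frontend engineer with TypeScript and React experience"
    print(round(jaccard_similarity(resume_1, resume_2), 3))
```

A lower score between a synthetic resume and its in-context example indicates less copied wording. Since the abstract pairs Jaccard with BERT and ROUGE metrics, a score like this would presumably be read alongside a semantic similarity measure rather than on its own.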

Anthology ID: 2025.ranlp-1.3
Volume: Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue: RANLP
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 22–31
URL: https://aclanthology.org/2025.ranlp-1.3/
Cite (ACL): Anum Afzal, Ishwor Subedi, and Florian Matthes. 2025. Candidate Profile Summarization: A RAG Approach with Synthetic Data Generation for Tech Jobs. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 22–31, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): Candidate Profile Summarization: A RAG Approach with Synthetic Data Generation for Tech Jobs (Afzal et al., RANLP 2025)
PDF: https://aclanthology.org/2025.ranlp-1.3.pdf
