AKCIT at SemEval-2026 Task 13: A Lightweight LightGBM Baseline for Cross-Language Detection of LLM-Generated Code

Rone Brandao Filho; Walcy Santos Rezende Rios; Lucas Neves; Jose Ricardo Fleury Oliveira; Diogo Fernandes Costa Silva; Arlindo Galvão Filho

Abstract

The widespread use of LLMs in software development has made the detection of machine-generated code a pressing challenge, particularly when models must generalize across programming languages and domains. We present a lightweight, LLM-free pipeline that combines stylometric feature extraction with a LightGBM classifier and explicitly prioritizes structural generalization over deep semantic modeling. Despite its simplicity, the method achieves a Macro F1 of 0.70–0.72, more than doubling the CodeBERT baseline (0.30) in SemEval-2026 Task 13 Subtask A, while operating without GPUs or any fine-tuning.
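The abstract does not list the features used, so the following is only a minimal sketch of what language-agnostic stylometric feature extraction could look like. The feature set and the names `stylometric_features` and `macro_f1` are hypothetical, not the authors' implementation. In a pipeline like the one described, such feature vectors would be passed to a LightGBM classifier (for example `lightgbm.LGBMClassifier`); that step is omitted here to keep the sketch dependency-free. The second function shows how Macro F1, the reported metric, averages per-class F1 scores.

```python
import re
import statistics


def stylometric_features(code: str) -> dict:
    """Surface statistics for one snippet (hypothetical feature set, not from the paper)."""
    lines = code.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    lengths = [len(ln) for ln in non_empty] or [0]
    n_chars = max(len(code), 1)
    # Crude comment detection by common line prefixes across languages.
    comment_lines = sum(
        1 for ln in non_empty if ln.lstrip().startswith(("#", "//", "/*", "*"))
    )
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)
    return {
        "n_lines": len(lines),
        "blank_line_ratio": (len(lines) - len(non_empty)) / max(len(lines), 1),
        "mean_line_len": statistics.fmean(lengths),
        "max_line_len": max(lengths),
        "comment_line_ratio": comment_lines / max(len(non_empty), 1),
        "whitespace_ratio": sum(ch.isspace() for ch in code) / n_chars,
        "punct_ratio": sum(ch in "{}()[];:,." for ch in code) / n_chars,
        "mean_identifier_len": statistics.fmean([len(t) for t in identifiers] or [0]),
    }


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    sample = "# add\ndef add(a, b):\n    return a + b\n\n"
    print(stylometric_features(sample))
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```

Because every feature is a ratio or a count over raw characters and lines, the same extractor applies unchanged to any programming language, which is the property a cross-language detector of this kind would rely on.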

Anthology ID: 2026.semeval-1.417
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 3357–3362
URL: https://aclanthology.org/2026.semeval-1.417/
Cite (ACL): Rone Brandao Filho, Walcy Santos Rezende Rios, Lucas Neves, Jose Ricardo Fleury Oliveira, Diogo Fernandes, and Arlindo Galvão Filho. 2026. AKCIT at SemEval-2026 Task 13: A Lightweight LightGBM Baseline for Cross-Language Detection of LLM-Generated Code. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 3357–3362, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): AKCIT at SemEval-2026 Task 13: A Lightweight LightGBM Baseline for Cross-Language Detection of LLM-Generated Code (Brandao Filho et al., SemEval 2026)
PDF: https://aclanthology.org/2026.semeval-1.417.pdf
