Automatic Error Detection: Comparing AI vs. Human Performance on L2 Italian Texts

Irene Fioravanti, Luciana Forti, Stefania Spina


Abstract
This paper reports on a study comparing AI and human performance in detecting and categorising errors in L2 Italian texts. Four LLMs were considered: ChatGPT, Copilot, Gemini and Llama3. Two groups of human annotators were involved: L1 and L2 speakers of Italian. A gold standard set of annotations was developed, and a fine-grained annotation scheme was adopted to reflect the specific traits of Italian morphosyntax and the related potential learner errors. Overall, we found that human annotation outperforms AI, with some degree of variation with respect to specific error types. Increased attention to languages other than English in NLP may significantly improve AI performance in this task, which is pivotal for many language-related disciplines.
Anthology ID:
2024.clicit-1.44
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
366–372
URL:
https://aclanthology.org/2024.clicit-1.44/
Cite (ACL):
Irene Fioravanti, Luciana Forti, and Stefania Spina. 2024. Automatic Error Detection: Comparing AI vs. Human Performance on L2 Italian Texts. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 366–372, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Automatic Error Detection: Comparing AI vs. Human Performance on L2 Italian Texts (Fioravanti et al., CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.44.pdf