Investigating the Ability of Machine Learning Models to Classify LLM-Generated Software Artifacts from Student-Written Software Artifacts
2025 (English) Independent thesis, Basic level (degree of Bachelor), 20 credits / 30 HE credits
Student thesis
Abstract [en]
This study investigates the use of machine learning (ML) models to distinguish between Large Language Model-generated (LLM-generated) and student-written programming assignments. It compares the CSEDM dataset with datasets generated by ChatGPT, DeepSeek, and Qwen to identify distinguishing features. The LLM dataset was generated using several prompt engineering techniques to ensure a diverse and representative set of outputs. Several ML models, including XGBoost, SVM, and LightGBM, are evaluated for classification performance. The study shows that DeepSeek is the most difficult LLM to classify, and that the ML models excelled in different areas, with no single model outperforming the others overall. The study also explores prompt engineering techniques for generating LLM code that resembles student submissions. Results include an analysis of which ML models perform best and why, together with a visualization of how LLM-generated software artifacts differ from student-written ones. Furthermore, the results highlight and visualize feature importance for the different models, based on the extracted features. The findings are promising and offer insights for educational policy and for the development of stronger automated detection tools.
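As a rough illustration of the pipeline the abstract describes (extract features from source code, then classify), the following is a minimal sketch. It is not the thesis's method: the four stylistic features are hypothetical choices made for this example, and a nearest-centroid rule stands in for the XGBoost, SVM, and LightGBM models that the study actually evaluates. The function names `extract_features`, `fit_centroids`, and `predict` are likewise assumptions.

```python
import re
from statistics import mean


def extract_features(source: str) -> list[float]:
    """Map one source file to a small vector of (hypothetical) stylistic features."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return [0.0, 0.0, 0.0, 0.0]
    # Count lines that start with a common single-line comment marker.
    comment_lines = sum(1 for ln in lines if ln.strip().startswith(("#", "//")))
    # Word-like tokens; this also picks up keywords and words inside comments.
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    return [
        float(len(lines)),                              # non-blank line count
        comment_lines / len(lines),                     # comment-line ratio
        float(mean(len(ln) for ln in lines)),           # mean line length
        float(mean(len(t) for t in tokens)) if tokens else 0.0,  # mean token length
    ]


def fit_centroids(samples: list[str], labels: list[str]) -> dict[str, list[float]]:
    """Average the feature vectors of each class (stand-in for a real classifier)."""
    grouped: dict[str, list[list[float]]] = {}
    for source, label in zip(samples, labels):
        grouped.setdefault(label, []).append(extract_features(source))
    return {
        label: [float(mean(column)) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids: dict[str, list[float]], source: str) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vector = extract_features(source)

    def squared_distance(centroid: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))
```

In this sketch the features are left unscaled, so the line count can dominate the distance; a real evaluation would normalize the features and use a properly validated model, with far more than a handful of training samples.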
Place, publisher, year, edition, pages
2025. p. 130
Keywords [en]
Machine Learning, Large Language Models, Programming Education, Feature Extraction, DeepSeek, ChatGPT, Qwen, Prompt Engineering
National Category
Software Engineering; Computer Systems
Identifiers
URN: urn:nbn:se:his:diva-25548 OAI: oai:DiVA.org:his-25548 DiVA, id: diva2:1985257
Subject / course
Information Technology
Educational program
Computer Science - Specialization in Systems Development
Supervisors
Examiners
2025-07-23 2025-07-23 2025-09-29 Bibliographically approved