Evaluating large language models’ capability to generate algorithmic code using prompt engineering
2024 (English) Independent thesis Basic level (degree of Bachelor), 20 credits / 30 HE credits
Student thesis
Abstract [en]
The study evaluated the performance of large language models (LLMs) such as Gemini, ChatGPT-4, and GitHub Copilot in generating C++ algorithms for specific tasks using different prompting techniques. The central aim was to assess the effectiveness of these models in creating code solutions that are both functionally correct and complete, using a combination of automated unit tests and human evaluation. Across the two main tasks (Social Network and Huffman Encoding), the models showed different levels of success in generating functionally correct code. GitHub Copilot and ChatGPT-4 generally produced more syntactically accurate and functionally appropriate code than Gemini. There was also notable variation in completeness, that is, whether the code met all of the tasks' specified requirements: some models included all necessary functionality more consistently than others. Gemini, for instance, excelled at generating complete solutions for the Social Network task but had issues with the Huffman Encoding task, where its output often failed to integrate the provided code effectively or correctly.
Place, publisher, year, edition, pages
2024, p. 3, 42, xv
Keywords [en]
LLM, large language model, ChatGPT, Gemini, GitHub Copilot, prompt engineering, algorithm
National Category
Information Systems, Social aspects
Identifiers
URN: urn:nbn:se:his:diva-24285
OAI: oai:DiVA.org:his-24285
DiVA, id: diva2:1883138
Subject / course
Information Technology (Informationsteknologi)
Educational program
Computer Science - Specialization in Systems Development
2024-07-09. Bibliographically approved