Leveraging large language models for accurate Cypher query generation: Natural language query to Cypher statements
2024 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
The rise of Large Language Models (LLMs) has transformed various fields, including education, health, natural language processing, code generation, and content creation.
This study uses large language models to generate Cypher queries from natural language questions. Its main objective is to evaluate how well these models generate Cypher queries.
The study applies GPT-3.5 Turbo and Code Llama 2 to Cypher generation on datasets collected and annotated across three categories: movies, network management, and companies. GPT-3.5 Turbo is used with in-context learning, while Code Llama 2 is fine-tuned with QLoRA. The BLEU and ROUGE evaluations indicate that GPT-3.5 Turbo with in-context learning outperforms the QLoRA fine-tuned Code Llama 2.
The main challenges in this study were the lack of available datasets and limited computational resources, such as GPUs.
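As a rough illustration of the two ideas named in the abstract, the sketch below assembles an in-context (few-shot) prompt for Cypher generation and scores a generated query against a reference with a simplified unigram-overlap F1 in the spirit of ROUGE-1. Everything in it is an assumption made for illustration: the example questions and queries, the schema string, and the names `EXAMPLES`, `build_prompt`, and `unigram_f1` are hypothetical and are not taken from the thesis, its datasets, or its evaluation code.

```python
from collections import Counter

# Hypothetical few-shot examples; NOT taken from the thesis datasets.
EXAMPLES = [
    ("Which movies did Tom Hanks act in?",
     "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title"),
    ("How many movies are in the graph?",
     "MATCH (m:Movie) RETURN count(m)"),
]


def build_prompt(schema, examples, question):
    """Assemble an in-context (few-shot) prompt as plain text."""
    parts = ["Translate the question into a Cypher query.", f"Schema: {schema}", ""]
    for example_question, cypher in examples:
        parts += [f"Question: {example_question}", f"Cypher: {cypher}", ""]
    # The prompt ends with an open "Cypher:" slot for the model to complete.
    parts += [f"Question: {question}", "Cypher:"]
    return "\n".join(parts)


def unigram_f1(reference, candidate):
    """Simplified ROUGE-1-style F1: clipped unigram overlap on whitespace tokens."""
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    prompt = build_prompt("(:Person)-[:ACTED_IN]->(:Movie)", EXAMPLES,
                          "Who acted in Apollo 13?")
    print(prompt)
    print(unigram_f1("MATCH (m:Movie) RETURN m.title",
                     "MATCH (m:Movie) RETURN count(m)"))
```

The prompt string would be sent to a chat or completion model, and the returned query compared with an annotated reference. Whitespace tokenization is a deliberate simplification; published BLEU and ROUGE implementations use their own tokenization and n-gram settings, so scores from this sketch are not comparable to those reported in the thesis.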
Place, publisher, year, edition, pages
2024. p. v, 47
Keywords [en]
Large language models, natural language generation, Cypher query, text generation, deep learning
National Category
Information Systems, Social aspects
Identifiers
URN: urn:nbn:se:his:diva-24158
OAI: oai:DiVA.org:his-24158
DiVA, id: diva2:1881385
Subject / course
Informationsteknologi
Educational program
Data Science - Master’s Programme
Supervisors
Examiners
2024-07-03 2024-07-03 2024-07-03 Bibliographically approved