Utvärdering av autonom lager-navigering genom generella förstärknings- och imitationsinlärningsalgoritmer: En jämförande studie av PPO, BC och GAIL metoder för autonom lager-navigering
2025 (Swedish)
Independent thesis Basic level (degree of Bachelor), 20 credits / 30 HE credits
Student thesis
Alternative title
Evaluation of Autonomous Warehouse Navigation through General Reinforcement and Imitation Learning Algorithms : A Comparative Study of PPO, BC and GAIL Methods for Autonomous Warehouse Navigation (English)
Abstract [en]
As reinforcement learning (RL) algorithms advance and warehouse automation becomes increasingly important for efficient logistics operations, autonomous robot navigation has become a key area of interest. This study evaluates two machine-learning paradigms within a simulated warehouse environment. First, the RL algorithm Proximal Policy Optimization (PPO) is evaluated against combined methods that are pre-trained with imitation learning (IL) algorithms and subsequently fine-tuned with PPO (IL + RL). Second, two IL algorithms, Behavioral Cloning (BC) and Generative Adversarial Imitation Learning (GAIL), are evaluated against each other to assess their standalone and combined navigation performance. Together, these experiments show both the benefit of combining IL with RL fine-tuning compared with standalone RL, and the comparative value of the IL algorithms when used independently (BC versus GAIL) and in combination (BC + GAIL) for robot navigation.
The autonomous agent is controlled by a neural network, specifically a multilayer perceptron (MLP). Two performance metrics, mean reward and sample efficiency, are tracked at multiple training milestones. The results show that one method, combining BC + PPO (IL + RL), consistently outperforms the PPO-only (RL) method, even with a small amount of demonstration data. For the standalone IL evaluations, BC performs better overall than GAIL for this game-engine-based environment and MLP complexity. These findings give insight into the generalizability of the PPO, GAIL, and BC algorithms outside of domain-specific simulators, and into the advantages and limitations of both standalone and sequential training methods in autonomous warehouse navigation.
Place, publisher, year, edition, pages
2025, p. 44
Keywords [en]
Artificial Intelligence, Godot Agents, Reinforcement learning, Imitation learning, Autonomous navigation, Proximal Policy Optimization, Behavioral Cloning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:his:diva-25569
OAI: oai:DiVA.org:his-25569
DiVA, id: diva2:1985389
Subject / course
Information Technology
Educational program
Computer Science - Specialization in Systems Development
Supervisors
Examiners
2025-07-24 2025-07-24 2025-09-29 Bibliographically approved