Deep Reinforcement Learning (DRL) is a branch of Artificial Intelligence (AI) focused on developing decision-making systems that learn through interaction with their environment. A central challenge in DRL is generalization: the ability of a trained model to perform well in previously unseen environments. This study evaluates the impact of hyperparameter optimization (HPO) on the generalization capabilities of two widely used DRL algorithms, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). HPO is typically applied to improve task-specific performance by finding optimal hyperparameters (HPs); here it is investigated as a potential means of enhancing generalization. Experiments compared the performance of PPO and SAC, with and without HPO, across varied environments. Results indicate that SAC benefits from HPO, whereas PPO performs better with its default settings. These findings suggest that the effectiveness of HP tuning in DRL is highly context-dependent, shaped by both the choice of algorithm and the characteristics of the environment.