Deep learning models, while powerful, often handle uncertainty poorly when faced with noisy or out-of-distribution data, leading to unreliable predictions. Uncertainty in this context refers to how much confidence a model should place in its predictions, and it is commonly categorized as aleatoric (data-related) or epistemic (model-related). Accurately quantifying this uncertainty is crucial for deploying deep learning systems in critical applications, where overconfident or underconfident predictions can have severe consequences.
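As a point of reference, one standard formalization of this split decomposes the total predictive entropy over T stochastic forward passes (or ensemble members) with parameters \(\hat\theta_t\) into an aleatoric and an epistemic part; this is a common convention rather than necessarily the exact formulation adopted later in the thesis:

\[
\underbrace{\mathbb{H}\!\left[\frac{1}{T}\sum_{t=1}^{T} p\!\left(y \mid x, \hat\theta_t\right)\right]}_{\text{total uncertainty}}
\;=\;
\underbrace{\frac{1}{T}\sum_{t=1}^{T} \mathbb{H}\!\left[p\!\left(y \mid x, \hat\theta_t\right)\right]}_{\text{aleatoric}}
\;+\;
\underbrace{\mathcal{I}\!\left(y;\,\theta \mid x\right)}_{\text{epistemic}}
\]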
This thesis investigates how different uncertainty quantification (UQ) methods—specifically Monte Carlo Dropout, ensembles, and their combination, an Ensemble of Monte Carlo Dropout—perform across varying deep learning architectures. Using a synthetic dataset designed to challenge these models, the study examines the effects of network width and depth on uncertainty estimation. Results indicate that wider networks yield sharper, narrower uncertainty boundaries, while deeper networks tend to exhibit uncertainty patterns that better capture the complexity of the data. Calibration metrics, including the Expected Calibration Error and the Brier Score, are used to evaluate the reliability of these models, revealing ongoing challenges in achieving well-calibrated uncertainty estimates.
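The sketch below illustrates the general shape of this evaluation pipeline: Monte Carlo Dropout predictions followed by ECE and Brier Score computation. It is a minimal, hypothetical example using a small PyTorch classifier and random toy data; the function names and hyperparameters (e.g., mc_dropout_predict, n_samples=50) are illustrative and are not taken from the thesis itself.

```python
# Minimal sketch: MC Dropout uncertainty estimation plus calibration metrics.
# Assumptions: a PyTorch classifier containing Dropout layers; toy random data.
import numpy as np
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_samples=50):
    """Average softmax outputs over stochastic forward passes with dropout active."""
    model.train()  # keep dropout layers active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)  # predictive mean and spread

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-weighted gap between mean confidence and accuracy per confidence bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and one-hot labels."""
    onehot = np.eye(probs.shape[1])[labels]
    return np.mean(np.sum((probs - onehot) ** 2, axis=1))

# Toy usage: a small dropout MLP on random 2-D inputs with 3 classes.
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 3))
x, y = torch.randn(256, 2), np.random.randint(0, 3, size=256)
mean_probs, _ = mc_dropout_predict(model, x)
p = mean_probs.numpy()
conf, pred = p.max(axis=1), p.argmax(axis=1)
print("ECE:", expected_calibration_error(conf, (pred == y).astype(float)))
print("Brier:", brier_score(p, y))
```

An Ensemble of Monte Carlo Dropout would follow the same pattern, pooling the stochastic forward passes of several independently trained dropout networks before computing the metrics.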
The findings offer guidance on choosing UQ methods and network architectures so as to enhance the reliability and robustness of deep learning models, contributing to the development of more trustworthy AI systems.