Evaluating Gradient Boosting Machines Against Classical Machine Learning Algorithms for Effective Multiple Sclerosis Subtyping
2024 (English) Independent thesis Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Multiple Sclerosis (MS) is a chronic and complex neurological disorder characterized by significant variability in its subtypes, including Relapsing-Remitting MS, Primary Progressive MS and Secondary Progressive MS. Accurate classification of these subtypes is critical for effective treatment and disease management but remains challenging due to the inherent complexity of biological data and the limitations of traditional diagnostic methods. This study evaluates the performance of Gradient Boosting Machines (GBMs), including XGBoost, CatBoost, and LightGBM, compared to classical machine learning algorithms, such as Random Forest, Support Vector Machines, Naive Bayes, K-Nearest Neighbours and Logistic Regression, for MS subtype classification using transcriptomic data from cerebrospinal fluid and blood samples. Three publicly available datasets, E-MTAB-2374, GSE190847, and E-MTAB-5151, were analysed, with preprocessing steps including PCA for dimensionality reduction and SMOTE to address class imbalance. Model performance was assessed using metrics such as balanced accuracy, precision, recall, and F1-score. Results indicated that dataset characteristics, particularly class balance and biological source, significantly influenced model performance. While GBMs showed potential in handling complex, high-dimensional datasets, no single model consistently outperformed the others across all datasets or subtypes. This study highlights the importance of integrating advanced machine learning techniques with robust preprocessing strategies to improve MS subtype classification. Future research should focus on multi-omics integration and external validation to enhance diagnostic accuracy and clinical applicability.
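The abstract names balanced accuracy as an evaluation metric and class imbalance as a factor that shaped the results. The minimal sketch below illustrates why the two are connected: balanced accuracy is the mean of the per-class recalls, so a model that only predicts the majority subtype scores poorly on it even when plain accuracy looks high. The label counts and function names are hypothetical and chosen for illustration only; they are not taken from the thesis or its datasets.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions divided by true class size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical, deliberately imbalanced labels (not data from the thesis):
# 8 relapsing-remitting, 1 primary progressive, 1 secondary progressive.
y_true = ["RRMS"] * 8 + ["PPMS", "SPMS"]

# A degenerate classifier that always predicts the majority subtype.
y_pred = ["RRMS"] * 10

print("accuracy:", accuracy(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

In this toy case plain accuracy is 8 of 10 correct, while balanced accuracy is one third, because recall is 1 for the majority class and 0 for both minority classes.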
Place, publisher, year, edition, pages
2024. p. 33
Keywords [en]
Classification, Gradient Boosting Machine, Machine Learning, Multiple Sclerosis, Transcriptomics
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:his:diva-24891
OAI: oai:DiVA.org:his-24891
DiVA, id: diva2:1936451
Subject / course
Bioinformatics
Educational program
Bioinformatics - Master’s Programme
Supervisors
Examiners
2025-02-11 2025-02-11 2025-09-29 Bibliographically approved