Högskolan i Skövde

Evaluating Gradient Boosting Machines Against Classical Machine Learning Algorithms for Effective Multiple Sclerosis Subtyping
University of Skövde, School of Bioscience.
2024 (English). Independent thesis, Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Multiple Sclerosis (MS) is a chronic and complex neurological disorder characterized by significant variability in its subtypes, including Relapsing-Remitting MS, Primary Progressive MS and Secondary Progressive MS. Accurate classification of these subtypes is critical for effective treatment and disease management but remains challenging due to the inherent complexity of biological data and the limitations of traditional diagnostic methods. This study evaluates the performance of Gradient Boosting Machines (GBMs), including XGBoost, CatBoost, and LightGBM, compared to classical machine learning algorithms, such as Random Forest, Support Vector Machines, Naive Bayes, K-Nearest Neighbours and Logistic Regression, for MS subtype classification using transcriptomic data from cerebrospinal fluid and blood samples. Three publicly available datasets, E-MTAB-2374, GSE190847, and E-MTAB-5151, were analysed, with preprocessing steps including PCA for dimensionality reduction and SMOTE to address class imbalance. Model performance was assessed using metrics such as balanced accuracy, precision, recall, and F1-score. Results indicated that dataset characteristics, particularly class balance and biological source, significantly influenced model performance. While GBMs showed potential in handling complex, high-dimensional datasets, no single model consistently outperformed the others across all datasets or subtypes. This study highlights the importance of integrating advanced machine learning techniques with robust preprocessing strategies to improve MS subtype classification. Future research should focus on multi-omics integration and external validation to enhance diagnostic accuracy and clinical applicability.
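The abstract names balanced accuracy, precision, recall, and F1-score as evaluation metrics and stresses class imbalance. The sketch below is not taken from the thesis; it only illustrates, in plain Python, how these metrics are commonly defined and why balanced accuracy is informative when one subtype dominates. The function names, the toy labels, and the per-class reporting are assumptions made for illustration, and the labels are made-up values, not results from any of the three datasets.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label present in y_true."""
    true_count = Counter(y_true)   # samples that truly belong to each class
    pred_count = Counter(y_pred)   # samples predicted as each class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class

    metrics = {}
    for label in sorted(true_count):
        # Precision is set to 0.0 when the class is never predicted (a convention).
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label]
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    metrics = per_class_metrics(y_true, y_pred)
    return sum(recall for _, recall, _ in metrics.values()) / len(metrics)


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced toy labels (NOT data from the thesis).
y_true = ["RRMS"] * 6 + ["PPMS"] * 2 + ["SPMS"] * 2
# A degenerate classifier that always predicts the majority subtype.
y_pred = ["RRMS"] * 10

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
rrms_precision, rrms_recall, rrms_f1 = per_class_metrics(y_true, y_pred)["RRMS"]
```

On this toy input the majority-class predictor gets 6 of 10 samples right, yet its balanced accuracy is only one third, because recall is zero for the two minority subtypes. In practice a library implementation would normally be used instead of hand-written functions.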

Place, publisher, year, edition, pages
2024. p. 33
Keywords [en]
Classification, Gradient Boosting Machine, Machine Learning, Multiple Sclerosis, Transcriptomics
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:his:diva-24891
OAI: oai:DiVA.org:his-24891
DiVA, id: diva2:1936451
Subject / course
Bioinformatics
Educational program
Bioinformatics - Master’s Programme
Supervisors
Examiners
Available from: 2025-02-11. Created: 2025-02-11. Last updated: 2025-09-29. Bibliographically approved.