2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]
Cancer remains one of the leading causes of death and is a major burden on patients and healthcare systems. One difficulty in finding effective treatments and matching patients to the right treatment strategy is the complexity of tumor biology. Machine learning holds the potential to learn patterns from data generated by high-throughput technologies, such as RNA-sequencing, that can elucidate the mechanisms underlying cancers and make clinically relevant predictions. In this thesis, we investigate the modeling of cancer with machine learning approaches from different molecular perspectives. First, we review the literature on the fusion of biomedical modalities with multimodal deep neural networks. In this review, we provide a descriptive overview, propose a novel taxonomy, and identify relevant research gaps. Moreover, for models to be applicable to clinical practice, they must be robust to shifts in the distribution from which patients are sampled. Such shifts can stem from differences in the underlying biology or from technical variation introduced during the processing of the biological material. Therefore, in two studies, we investigate domain generalization of machine learning models trained with bulk RNA-sequencing data to predict cancer survival endpoints. In the first of these studies, we show that deep learning-based domain generalization methods developed on non-molecular data improve robustness to distributional shifts on molecular data. We test these methods by predicting overall and recurrence-free survival of breast cancer patients with subgroup shifts between source and target domains. In the second, we show that relative representations of normalized count values, such as binning or ranking of expression values within a single sample, can improve domain generalization. We test these approaches in three experiments on breast, brain, and ovarian cancer.
In a final study, we show that cancer stage can be predicted from circulating microRNA data with machine learning models, providing a proof of concept for this application. Overall, the work in this thesis supports making machine learning models more applicable to clinical practice by providing empirical evidence for methods that improve the modeling of cancer biology. Continued study of the domain generalization of models in clinical practice and the development of methods for robustness are highlighted as future work.
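The relative representations mentioned in the abstract (ranking or binning expression values within a single sample) can be illustrated with a minimal sketch. This is not the thesis's implementation; the function names `rank_within_sample` and `bin_within_sample`, the tie handling (average ranks), and the equal-frequency binning scheme are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def rank_within_sample(values: Sequence[float]) -> List[float]:
    """Return ranks (1 = lowest) of expression values within ONE sample.

    Tied values share the mean of the ranks they occupy (an assumed choice).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run of tied values starting at position `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def bin_within_sample(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each value to one of `n_bins` equal-frequency bins (0 = lowest),
    based on its rank within the sample (hypothetical binning scheme)."""
    n = len(values)
    ranks = rank_within_sample(values)
    return [min(int((r - 1) * n_bins / n), n_bins - 1) for r in ranks]


# Two hypothetical samples: the second is the first scaled by a factor of 10,
# mimicking a purely technical shift in measured expression levels.
sample_a = [5.0, 0.0, 20.0, 1.0]
sample_b = [10 * v for v in sample_a]

ranks_a = rank_within_sample(sample_a)
ranks_b = rank_within_sample(sample_b)
bins_a = bin_within_sample(sample_a, n_bins=2)
```

Because ranks and rank-based bins depend only on the ordering of genes within a sample, any strictly increasing transformation of a sample's values (such as the scaling above) leaves its representation unchanged, which is one intuition for why such representations may be less sensitive to technical variation.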
Place, publisher, year, edition, pages
Skövde: University of Skövde, 2025. p. xi, 147
Series
Dissertation Series ; 63
National Category
Cancer and Oncology; Bioinformatics (Computational Biology); Other Computer and Information Science
Research subject
Bioinformatics
Identifiers
urn:nbn:se:his:diva-25131 (URN); 978-91-987907-9-5 (ISBN); 978-91-989080-0-8 (ISBN)
Public defence
2025-06-04, G110, University of Skövde Building G, Skövde, 13:00 (English)
Opponent
Supervisors
Note
One of four constituent papers (for the others, see the heading Delarbeten/List of papers):
3. Stahlschmidt, Sören Richard, Synnergren, Jane, and Giovannucci, Andrea (2025). “Relative Representations of RNA-seq Data Improve Domain Generalization of Machine Learning Models for Cancer Prognosis”. In: Under Submission.
Publications with low relevance:
5. Johansson, Markus, Stahlschmidt, Sören Richard, Heydarkhan-Hagvall, Sepideh, Jeppsson, Anders, Holmgren, Gustav, Sartipy, Peter, and Synnergren, Jane (2025). “Uncovering the transcriptomic landscape of cardiac hypertrophy using single-cell RNA sequencing and machine learning”. In: Under Submission.
6. Lyubetskaya, Anna et al. (2025). “In situ multi-modal characterization of pancreatic cancer reveals tumor cell identity as a defining factor of the surrounding microenvironment”. In: Under Submission.
7. Marzec-Schmidt, Katarzyna, Ghosheh, Nidal, Stahlschmidt, Sören Richard, Küppers-Munther, Barbara, Synnergren, Jane, and Ulfenborg, Benjamin (2023). “Artificial Intelligence Supports Automated Characterization of Differentiated Human Pluripotent Stem Cells”. In: Stem Cells 41.9, pp. 850–861. DOI: 10.1093/stmcls/sxad049.
2025-05-12; 2025-05-09; 2025-05-21. Bibliographically approved