Högskolan i Skövde

Domain Generalization of Deep Learning Models Under Subgroup Shift in Breast Cancer Prognosis
University of Skövde, School of Bioscience. University of Skövde, Systems Biology Research Environment. (Translational Bioinformatics) ORCID iD: 0000-0003-4191-8435
University of Skövde, School of Bioscience. University of Skövde, Systems Biology Research Environment. (Translational Bioinformatics) ORCID iD: 0000-0001-9242-4852
University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment. (Skövde Artificial Intelligence Lab (SAIL)) ORCID iD: 0000-0001-8884-2154
University of Skövde, School of Bioscience. University of Skövde, Systems Biology Research Environment. Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Sweden. (Translational Bioinformatics) ORCID iD: 0000-0003-4697-0590
2024 (English). In: 2024 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), IEEE, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Predicting breast cancer prognosis from gene expression profiles of the primary tumor has become a promising application of deep learning. Yet, to be relevant to real-world applications in the clinic and for knowledge discovery, these models must be robust to common distribution shifts. In this study, we evaluate recently proposed methods for improving robustness to domain and subgroup shifts. We test the in-distribution and out-of-distribution generalization of multiple episode learning, stochastic weight averaging, group distributionally robust optimization, and a subsampling scheme on one training and four external breast cancer prognosis datasets. The evaluation found that the methods can, to varying degrees, improve generalization across domains, although generalization gaps remain, some of them large. Additionally, in-distribution and out-of-distribution generalization differ between clinical subtypes of breast cancer. Thus, we conclude that further research into methods specifically addressing challenges in breast cancer prognosis from gene expression data is warranted.

Place, publisher, year, edition, pages
IEEE, 2024.
Series
IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), ISSN 2994-9351, E-ISSN 2994-9408
Keywords [en]
breast cancer, domain generalization, gene expression, subgroup shift, survival analysis, Contrastive Learning, Diseases, Lung cancer, Stochastic systems, Breast cancer prognosis, Gene expression profiles, Generalisation, Genes expression, Learning models, Real-world
National Category
Cancer and Oncology; Bioinformatics and Computational Biology; Other Computer and Information Science
Research subject
Bioinformatics; Skövde Artificial Intelligence Lab (SAIL)
Identifiers
URN: urn:nbn:se:his:diva-24659; DOI: 10.1109/CIBCB58642.2024.10702166; ISI: 001546450400010; Scopus ID: 2-s2.0-85207504799; ISBN: 979-8-3503-5663-2 (electronic); ISBN: 979-8-3503-5664-9 (print); OAI: oai:DiVA.org:his-24659; DiVA, id: diva2:1911226
Conference
21st IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2024, 27-29 August 2024, Natal, Brazil
Funder
Knowledge Foundation, 20170302; Knowledge Foundation, 20200014; Swedish Research Council, 2022-06725
Note

© 2024 IEEE

Correspondence Address: S.R. Stahlschmidt; University of Skövde, Systems Biology Research Center, Skövde, Sweden; email: soren.richard.stahlschmidt@his.se

This work was supported by the University of Skövde, Sweden under grants from the Knowledge Foundation (20170302, 20200014). The computations were enabled by resources provided by Chalmers e-Commons at Chalmers and the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725.

Available from: 2024-11-07 Created: 2024-11-07 Last updated: 2025-10-17 Bibliographically approved
In thesis
1. Machine Learning for Predicting Cancer Endpoints from Bulk Omics Data: Generalizing Knowledge from Various Modalities Across Domains
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Cancer remains one of the leading causes of death and is a major burden on patients and healthcare systems. One difficulty in finding effective treatments and matching patients to the right treatment strategy is the complexity of tumor biology. Machine learning holds the potential to learn patterns from data generated by high-throughput technologies, such as RNA-sequencing, that can elucidate the mechanisms underlying cancers and make clinically relevant predictions. In this thesis, we investigate the modeling of cancer with machine learning approaches from different molecular perspectives. First, we review the literature on the fusion of biomedical modalities with multimodal deep neural networks. In this review, we provide a descriptive overview, propose a novel taxonomy, and identify relevant research gaps. Moreover, for models to be applicable to clinical practice, they must be robust to shifts in the distribution from which patients are sampled. Such shifts can stem from differences in the underlying biology or technical variation introduced during the processing of the biological material. Therefore, in two studies, we investigate domain generalization of machine learning models trained with bulk RNA-sequencing data to predict cancer survival endpoints. First, we show that deep learning-based domain generalization methods developed on non-molecular data improve robustness to distributional shifts on molecular data. We test these methods by predicting overall and recurrence-free survival of breast cancer patients with subgroup shifts between source and target domains. Next, we show that relative representations of normalized count values, such as binning or ranking of expression values within a single sample, can increase domain generalization. We test these approaches in three experiments on breast, brain, and ovarian cancer.
In a final study, we show that cancer stage can be predicted from circulating microRNA data with machine learning models, providing a proof of concept for this application. Overall, the work in this thesis supports making machine learning models more applicable to clinical practice by providing empirical evidence of methods improving the modeling of cancer biology. Continuing to study domain generalization of models in clinical practice and to develop methods for robustness are highlighted as future work.

Place, publisher, year, edition, pages
Skövde: University of Skövde, 2025. p. xi, 147
Series
Dissertation Series ; 63
National Category
Cancer and Oncology; Bioinformatics (Computational Biology); Other Computer and Information Science
Research subject
Bioinformatics
Identifiers
urn:nbn:se:his:diva-25131 (URN); 978-91-987907-9-5 (ISBN); 978-91-989080-0-8 (ISBN)
Public defence
2025-06-04, G110, University of Skövde Building G, Skövde, 13:00 (English)
Note

One of four constituent papers (for the others, see the heading Delarbeten/List of papers):

3. Stahlschmidt, Sören Richard, Synnergren, Jane, and Giovannucci, Andrea (2025). “Relative Representations of RNA-seq Data Improve Domain Generalization of Machine Learning Models for Cancer Prognosis”. In: Under Submission.

Publications with low relevance:

5. Johansson, Markus, Stahlschmidt, Sören Richard, Heydarkhan-Hagvall, Sepideh, Jeppsson, Anders, Holmgren, Gustav, Sartipy, Peter, and Synnergren, Jane (2025). “Uncovering the transcriptomic landscape of cardiac hypertrophy using single-cell RNA sequencing and machine learning”. In: Under Submission.

6. Lyubetskaya, Anna et al. (2025). “In situ multi-modal characterization of pancreatic cancer reveals tumor cell identity as a defining factor of the surrounding microenvironment”. In: Under Submission.

7. Marzec-Schmidt, Katarzyna, Ghosheh, Nidal, Stahlschmidt, Sören Richard, Küppers-Munther, Barbara, Synnergren, Jane, and Ulfenborg, Benjamin (2023). “Artificial Intelligence Supports Automated Characterization of Differentiated Human Pluripotent Stem Cells”. In: Stem Cells 41.9, pp. 850–861. DOI: 10.1093/stmcls/sxad049.

Available from: 2025-05-12 Created: 2025-05-09 Last updated: 2025-09-29 Bibliographically approved


Authority records

Stahlschmidt, Sören Richard; Ulfenborg, Benjamin; Falkman, Göran; Synnergren, Jane
