Högskolan i Skövde

Multimodal deep learning for biomedical data fusion: a review
University of Skövde, School of Bioscience. University of Skövde, Systems Biology Research Environment. (Translational bioinformatics) ORCID iD: 0000-0003-4191-8435
University of Skövde, Systems Biology Research Environment. University of Skövde, School of Bioscience. (Translational bioinformatics) ORCID iD: 0000-0001-9242-4852
University of Skövde, Systems Biology Research Environment. University of Skövde, School of Bioscience. (Translational bioinformatics) ORCID iD: 0000-0003-4697-0590
2022 (English). In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 23, no 2, article id bbab569. Article, review/survey (Refereed). Published
Abstract [en]

Biomedical data are becoming increasingly multimodal and thereby capture the underlying complex relationships among biological processes. Deep learning (DL)-based data fusion strategies are a popular approach for modeling these nonlinear relationships. Therefore, we review the current state-of-the-art of such methods and propose a detailed taxonomy that facilitates more informed choices of fusion strategies for biomedical applications, as well as research on novel methods. By doing so, we find that deep fusion strategies often outperform unimodal and shallow approaches. Additionally, the proposed subcategories of fusion strategies show different advantages and drawbacks. The review of current methods has shown that, especially for intermediate fusion strategies, joint representation learning is the preferred approach as it effectively models the complex interactions of different levels of biological organization. Finally, we note that gradual fusion, based on prior biological knowledge or on search strategies, is a promising future research path. Similarly, utilizing transfer learning might overcome sample size limitations of multimodal data sets. As these data sets become increasingly available, multimodal DL approaches present the opportunity to train holistic models that can learn the complex regulatory dynamics behind health and disease.
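As a rough illustration of the intermediate fusion with joint representation learning mentioned in the abstract, the sketch below encodes two modalities separately, concatenates the hidden representations, and passes them through a shared layer. It is not taken from the reviewed paper: the function names, the two-modality setup, and the fixed toy weights are all hypothetical, and a real model would learn its weights from data.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense_relu(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer followed by ReLU; one weight row per output unit."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def joint_representation(modality_a: Vector, modality_b: Vector) -> Vector:
    """Intermediate fusion sketch with hypothetical, hand-picked weights."""
    # Modality-specific encoders: each modality gets its own hidden layer.
    hidden_a = dense_relu(modality_a, [[0.5, -0.5, 1.0], [1.0, 0.0, 0.0]], [0.0, 0.0])
    hidden_b = dense_relu(modality_b, [[1.0, 1.0]], [-1.0])
    # Fusion step: concatenate the hidden representations of both modalities.
    fused = hidden_a + hidden_b
    # Shared layer: its units combine features from both modalities.
    return dense_relu(fused, [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]], [0.0, 0.0])


joint = joint_representation([2.0, 1.0, 0.5], [0.5, 1.5])
```

Fusing after the modality-specific encoders, rather than concatenating raw inputs, is what distinguishes this sketch from early fusion: each output unit of the shared layer depends on learned features of both modalities.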

Place, publisher, year, edition, pages
Oxford University Press, 2022. Vol. 23, no 2, article id bbab569
Keywords [en]
data integration, deep neural networks, fusion strategies, multi-omics, multimodal machine learning, representation learning
National Category
Bioinformatics (Computational Biology)
Research subject
Bioinformatics
Identifiers
URN: urn:nbn:se:his:diva-20873
DOI: 10.1093/bib/bbab569
ISI: 000804196500091
PubMedID: 35089332
Scopus ID: 2-s2.0-85127534700
OAI: oai:DiVA.org:his-20873
DiVA, id: diva2:1633465
Funder
Knowledge Foundation, 20170302; Knowledge Foundation, 20200014
Note

CC BY-NC 4.0

Corresponding author: Sören Richard Stahlschmidt. Systems Biology Research Center, University of Skövde, Skövde, Sweden. E-mail: soren.richard.stahlschmidt@his.se

Published: 28 January 2022

This work was supported by the University of Skövde, Sweden under grants from the Knowledge Foundation (20170302, 20200014).

Available from: 2022-01-31 Created: 2022-01-31 Last updated: 2025-05-09. Bibliographically approved
In thesis
1. Machine Learning for Predicting Cancer Endpoints from Bulk Omics Data: Generalizing Knowledge from Various Modalities Across Domains
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Cancer remains one of the leading causes of death and is a major burden on patients and healthcare systems. One difficulty in finding effective treatment and matching patients to the right treatment strategy is the complexity of tumor biology. Machine learning holds the potential to learn patterns from data generated by high-throughput technologies, such as RNA-sequencing, that can elucidate the mechanisms underlying cancers and make clinically relevant predictions. In this thesis, we investigate the modeling of cancer with machine learning approaches from different molecular perspectives. First, we review the literature on the fusion of biomedical modalities with multimodal deep neural networks. In this review, we provide a descriptive overview, propose a novel taxonomy, and identify relevant research gaps. Moreover, for models to be applicable to clinical practice, they must be robust to shifts in the distribution from which patients are sampled. Such shifts can stem from differences in the underlying biology or technical variation introduced during the processing of the biological material. Therefore, in two studies, we investigate domain generalization of machine learning models trained with bulk RNA-sequencing data to predict cancer survival endpoints. First, we show that deep learning-based domain generalization methods developed on non-molecular data improve robustness to distributional shifts on molecular data. We test these methods by predicting overall and recurrence-free survival of breast cancer patients with subgroup shifts between source and target domains. Next, we show that relative representations of normalized count values, such as binning or ranking of expression values within a single sample, can increase domain generalization. We test these approaches in three experiments on breast, brain, and ovarian cancer.
In a final study, we show that cancer stage can be predicted from circulating microRNA data with machine learning models, providing a proof of concept for this application. Overall, the work in this thesis supports making machine learning models more applicable to clinical practice by providing empirical evidence of methods improving the modeling of cancer biology. Continuing to study domain generalization of models in clinical practice and to develop methods for robustness are highlighted as future work.
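
The relative representations mentioned in the abstract (ranking or binning expression values within a single sample) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the average-rank tie handling, the scaling to [0, 1], and the equal-frequency binning rule are all choices made for this example.

```python
from typing import List


def rank_within_sample(values: List[float]) -> List[float]:
    """Replace each value with its within-sample rank, scaled to [0, 1].

    Tied values receive the average of the ranks they span.
    """
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return [r / (n - 1) for r in ranks]


def bin_within_sample(values: List[float], n_bins: int) -> List[int]:
    """Assign each value to one of n_bins bins based on its scaled rank."""
    scaled = rank_within_sample(values)
    return [min(int(s * n_bins), n_bins - 1) for s in scaled]


# Two hypothetical samples with the same gene ordering but a tenfold scale shift.
sample_a = [5.0, 0.0, 20.0, 1.0]
sample_b = [50.0, 0.0, 200.0, 10.0]
ranks_a = rank_within_sample(sample_a)
ranks_b = rank_within_sample(sample_b)
bins_a = bin_within_sample(sample_a, 2)
```

Because both representations depend only on the ordering of values inside one sample, `ranks_a` and `ranks_b` are identical despite the scale difference, which is the intuition for why such representations may be less sensitive to technical shifts between domains.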

Place, publisher, year, edition, pages
Skövde: University of Skövde, 2025. p. xi, 147
Series
Dissertation Series ; 63
National Category
Cancer and Oncology; Bioinformatics (Computational Biology); Other Computer and Information Science
Research subject
Bioinformatics
Identifiers
urn:nbn:se:his:diva-25131 (URN); 978-91-987907-9-5 (ISBN); 978-91-989080-0-8 (ISBN)
Public defence
2025-06-04, G110, University of Skövde Building G, Skövde, 13:00 (English)
Note

One of four constituent papers (for the others, see the heading Delarbeten/List of papers):

3. Stahlschmidt, Sören Richard, Synnergren, Jane, and Giovannucci, Andrea (2025). “Relative Representations of RNA-seq Data Improve Domain Generalization of Machine Learning Models for Cancer Prognosis”. In: Under Submission.

Publications with low relevance:

5. Johansson, Markus, Stahlschmidt, Sören Richard, Heydarkhan-Hagvall, Sepideh, Jeppsson, Anders, Holmgren, Gustav, Sartipy, Peter, and Synnergren, Jane (2025). “Uncovering the transcriptomic landscape of cardiac hypertrophy using single-cell RNA sequencing and machine learning”. In: Under Submission.

6. Lyubetskaya, Anna et al. (2025). “In situ multi-modal characterization of pancreatic cancer reveals tumor cell identity as a defining factor of the surrounding microenvironment”. In: Under Submission.

7. Marzec-Schmidt, Katarzyna, Ghosheh, Nidal, Stahlschmidt, Sören Richard, Küppers-Munther, Barbara, Synnergren, Jane, and Ulfenborg, Benjamin (2023). “Artificial Intelligence Supports Automated Characterization of Differentiated Human Pluripotent Stem Cells”. In: Stem Cells 41.9, pp. 850–861. DOI: 10.1093/stmcls/sxad049.

Available from: 2025-05-12 Created: 2025-05-09 Last updated: 2025-05-21. Bibliographically approved

Open Access in DiVA

fulltext (1198 kB), 646 downloads
File information
File name: FULLTEXT02.pdf
File size: 1198 kB
Checksum (SHA-512):
485999c6c50f2482bbcd2de3efd72f8dacb892601ba75b7ff9313d324bd209e34abc6dd7faf519edfe390cf33c942b12ef84355af791934e6b4f1037fb4929e6
Type: fulltext
Mimetype: application/pdf

Authority records

Stahlschmidt, Sören Richard; Ulfenborg, Benjamin; Synnergren, Jane

Total: 735 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1260 hits