Högskolan i Skövde

Influence of Preprocessing Steps for Molecular Data on Deep Neural Network Performance
University of Skövde, School of Bioscience.
2023 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 30 credits / 45 HE credits
Student thesis
Abstract [en]

The massive accumulation of omics data requires effective computational tools to analyze and interpret such data. Deep learning (DL), a branch of machine learning (ML) and artificial intelligence (AI), has helped address these challenges and has achieved great success in bioinformatics. However, the influence of preprocessing steps on the performance of DL models remains a critical aspect that requires thorough investigation. This study investigates the effects of different combinations of preprocessing techniques and feature selection methods on the predictive performance of deep neural networks (DNNs) in supervised tasks. For this purpose, four normalization methods, one transformation method, and two feature scaling methods were applied, in addition to two feature selection methods. This comprehensive analysis resulted in a total of 28 unique combinations, each representing a distinct classifier. The experiments were conducted using gene expression profiles from multiple cancer datasets. The results highlight the significance of the preprocessing step in achieving optimal DNN performance, with notable variations observed across datasets and preprocessing techniques. We identify a specific preprocessing workflow that improves DNN performance, as well as certain preprocessing choices that may lead to suboptimal model performance. In addition, we identify potential pitfalls and challenges associated with the data structure and class imbalance. This study contributes to the understanding of how preprocessing steps affect DNN performance and provides insights into which preprocessing steps work best, thereby helping to improve overall model performance and enabling the development of more robust and accurate models.
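The abstract describes building distinct preprocessing pipelines by combining a transformation, a feature scaling method and a feature selection method before training a DNN. The sketch below, using only the Python standard library, shows one way such a combination grid might be enumerated. The specific methods (a log2(x + 1) transformation, z-score and min-max scaling, and a top-variance filter) and all function names are hypothetical stand-ins chosen for illustration, not the methods used in the thesis; normalization methods are omitted, so this toy grid yields 8 combinations rather than the 28 reported.

```python
import itertools
import math
import statistics

# Hypothetical stand-in methods: illustrative choices only, not the thesis's methods.

def identity(matrix):
    """No-op step (no transformation / no feature selection)."""
    return matrix

def log2_transform(matrix):
    """log2(x + 1) transformation, a common choice for count-like expression values."""
    return [[math.log2(x + 1) for x in row] for row in matrix]

def top_variance(matrix, k=2):
    """Keep the k features (columns) with the highest variance, in original order."""
    cols = list(zip(*matrix))
    ranked = sorted(range(len(cols)), key=lambda j: statistics.pvariance(cols[j]), reverse=True)
    keep = sorted(ranked[:k])
    return [[row[j] for j in keep] for row in matrix]

def zscore_scale(matrix):
    """Standardize each feature to mean 0 and population SD 1 (constant features -> 0)."""
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in zip(*matrix)]
    return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)] for row in matrix]

def minmax_scale(matrix):
    """Rescale each feature to [0, 1] (constant features -> 0)."""
    bounds = [(min(c), max(c)) for c in zip(*matrix)]
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]
            for row in matrix]

TRANSFORMS = {"none": identity, "log2": log2_transform}
SELECTORS = {"all": identity, "top_variance": top_variance}
SCALERS = {"zscore": zscore_scale, "minmax": minmax_scale}

def preprocessing_grid(matrix):
    """Yield (name, matrix) for every transform x selector x scaler combination.

    Selection is applied before scaling, because a variance filter is
    uninformative once every feature has been standardized.
    """
    for (t, tf), (f, ff), (s, sf) in itertools.product(
            TRANSFORMS.items(), SELECTORS.items(), SCALERS.items()):
        yield f"{t}+{f}+{s}", sf(ff(tf(matrix)))

if __name__ == "__main__":
    # Toy matrix: 3 samples (rows) x 3 genes (columns); not real data.
    data = [[0, 10, 5], [3, 20, 5], [7, 40, 5]]
    for name, processed in preprocessing_grid(data):
        print(name, processed)
```

Each named combination would then feed a separate classifier. In a real experiment the scaling statistics and the selected features should be estimated on the training data only and then applied to the test data, to avoid information leaking from the test set into preprocessing.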

Place, publisher, year, edition, pages
2023, p. 24
National Category
Bioinformatics and Computational Biology
Identifiers
URN: urn:nbn:se:his:diva-23319
OAI: oai:DiVA.org:his-23319
DiVA, id: diva2:1806241
Subject / course
Systems Biology
Educational program
Systems Biology with specialization in Bioinformatics - Master's Programme
Available from: 2023-10-20
Created: 2023-10-20
Last updated: 2025-09-29
Bibliographically approved

Open Access in DiVA
Full text: FULLTEXT01.pdf (1043 kB, application/pdf)
SHA-512 checksum: e66054518650fea30f9db48fa53cd2e605348a337ee273fe0327c75bf3624fdc1a040afaa2dca83f0e35b59f46b8577c7e3713161f13c03b57c4abdfe641f5e6