Högskolan i Skövde

Comparative analysis of autoencoder and PCA for dimensionality reduction in gene expression data
University of Skövde, School of Bioscience.
2024 (English). Independent thesis, Basic level (degree of Bachelor), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Studying gene expression (GE) data is important for understanding genetic transcription, revealing disease mechanisms, improving diagnostics, and guiding targeted therapies. However, the high dimensionality of GE data presents challenges. Dimensionality reduction (DR) techniques address this by reducing computational complexity and simplifying data processing. This study aimed to develop an autoencoder (AE) model for DR of GE data and to compare its feature weighting with that of principal component analysis (PCA). A command-line interface (CLI) tool for processing high-dimensional data was also created. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were used to compare the biological functions of the genes identified by PCA and AE. A hypergeometric test evaluated the overlap between the features selected by PCA and AE, and network analysis provided a comprehensive comparison and identified hub genes. The study also assessed the predictive potential of the PCA- and AE-reduced datasets to determine which method better preserved the information in the original data. The findings showed that PCA and AE select and prioritise features differently, each capturing unique aspects of the data. Despite these differences, both methods consistently identified similar biological processes and functions, as evidenced by the GO term and KEGG pathway analyses. Interestingly, the PCA- and AE-reduced data retained almost identical amounts of information from the original data. The developed tool, named AutoGeneReducer, can reduce high-dimensional data in GE analysis. Future work will explore additional DR techniques such as t-SNE, UMAP, and variational autoencoders (VAEs) to enhance understanding of complex dataset structures and to further develop the AutoGeneReducer tool.
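
The hypergeometric overlap test mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each method yields a set of top-ranked genes drawn from the same gene universe; the thesis's actual selection rule, set sizes, and thresholds are not stated in this record. The function name and the toy gene identifiers below are hypothetical and are not taken from AutoGeneReducer.

```python
from math import comb


def hypergeom_overlap_pvalue(n_total, n_a, n_b, n_overlap):
    """One-sided p-value P(X >= n_overlap) for the overlap of two gene sets.

    X follows a hypergeometric distribution: n_b genes are drawn without
    replacement from a universe of n_total genes, of which n_a are "marked"
    (for example, the genes selected by the other method).
    """
    if not (0 <= n_a <= n_total and 0 <= n_b <= n_total):
        raise ValueError("set sizes must lie between 0 and n_total")
    lowest = max(n_overlap, 0)
    highest = min(n_a, n_b)
    # Number of ways to draw n_b genes with exactly k marked ones,
    # summed over all k at least as large as the observed overlap.
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(lowest, highest + 1)
    )
    return tail / comb(n_total, n_b)


# Toy example (hypothetical gene identifiers, not data from the thesis).
universe = {f"g{i}" for i in range(10)}
pca_top = {"g0", "g1", "g2", "g3", "g4"}
ae_top = {"g2", "g3", "g4", "g5", "g6"}

p_value = hypergeom_overlap_pvalue(
    len(universe), len(pca_top), len(ae_top), len(pca_top & ae_top)
)
print(f"overlap = {len(pca_top & ae_top)}, p = {p_value:.3f}")  # p = 0.500
```

A small p-value would indicate that the two methods share more selected genes than expected by chance. The sketch uses only the standard library; for large gene universes, `scipy.stats.hypergeom.sf` is a common alternative.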

Place, publisher, year, edition, pages
2024, p. v, 49
National Category
Bioinformatics and Computational Biology
Identifiers
URN: urn:nbn:se:his:diva-24281
OAI: oai:DiVA.org:his-24281
DiVA, id: diva2:1883117
External cooperation
Örebro University, School of Medical Sciences
Subject / course
Bioinformatics
Educational program
Molekylär bioinformatik
Supervisors
Examiners
Available from: 2024-07-09. Created: 2024-07-09. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

fulltext (2794 kB), 126 downloads
File information
File name: FULLTEXT02.pdf
File size: 2794 kB
Checksum (SHA-512):
c773fd734685eb0d9c58009f30b4ea7566033f2a0a6b0be6f0a77739fb5b95ec5bf040c2eabe858bceff1eff8adf3ec79329d6bf2dbf7028831c95af3eb8f6b2
Type: fulltext
Mimetype: application/pdf

Total: 251 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 856 hits