Högskolan i Skövde

High dimensional data clustering; A comparative study on gene expressions: Experiment on clustering algorithms on RNA-sequence from tumors with evaluation on internal validation
University of Skövde, School of Informatics.
2019 (English). Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis
Abstract [en]

In cancer research, class discovery is the first step in investigating a new dataset: finding hidden groups of objects that share similar attributes. However, gene expression datasets, whether from RNA microarray or RNA-sequence, are high-dimensional, which makes it hard to perform cluster analysis and to obtain well-separated clusters. Well-separated clusters are desirable because they indicate that objects have most likely not been placed in the wrong clusters. This report investigates experimentally whether K-Means and hierarchical clustering are suitable for clustering gene expressions in RNA-sequence data from various tumors. Dimensionality reduction methods are also applied to see whether they help create well-separated clusters. The results show that well-separated clusters are achieved only when using PCA for dimensionality reduction and K-Means on correlation. The main contribution of this paper is determining that K-Means or hierarchical clustering on the full natural dimensionality of RNA-sequence data returns an undesirable silhouette average width, below 0.4.
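
The silhouette average width used above as the internal validation measure can be illustrated with a small sketch. This is not the thesis code: the function names, the Euclidean distance, and the toy points are assumptions made for illustration only, and the 0.4 value is simply the threshold quoted in the abstract. For each object, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the silhouette value is (b - a) / max(a, b); the average width is the mean over all objects.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_average_width(
    points: Sequence[Point],
    labels: Sequence[int],
    dist: Callable[[Point, Point], float] = euclidean,
) -> float:
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a singleton cluster contributes s(i) = 0.
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Hypothetical toy data: two tight groups that lie far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_average_width(toy_points, [0, 0, 1, 1])  # matches the groups
bad = silhouette_average_width(toy_points, [0, 1, 0, 1])   # cuts across them

THRESHOLD = 0.4  # value quoted in the abstract
print(f"good labelling: {good:.3f}, above threshold: {good > THRESHOLD}")
print(f"bad labelling:  {bad:.3f}, above threshold: {bad > THRESHOLD}")
```

On real RNA-sequence data the same function would be applied to the cluster labels produced by K-Means or hierarchical clustering, with or without a preceding dimensionality reduction step; passing a different `dist` callable is one way to try a correlation-based dissimilarity instead of Euclidean distance.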

Place, publisher, year, edition, pages
2019. p. 28
Keywords [en]
Cluster analysis, cluster validation, RNA-sequence, tumors, high-dimensional data, dimensionality reduction
National Category
Information Systems
Identifiers
URN: urn:nbn:se:his:diva-17492; OAI: oai:DiVA.org:his-17492; DiVA, id: diva2:1340291
Subject / course
Computer Science
Educational program
Data Science - Master’s Programme
Supervisors
Examiners
Available from: 2020-11-20. Created: 2019-08-04. Last updated: 2020-11-20. Bibliographically approved.

Open Access in DiVA

fulltext (508 kB), 1428 downloads
File information
File name: FULLTEXT01.pdf; File size: 508 kB; Checksum (SHA-512):
9db8703aa117a326dda5dec668871a2b67505041a44e2dd9eb5d5b06d701fdfe7a09e19728e96e3289eefd739c3501c3afa4c8e0c15cd23205009e8711b61c92
Type: fulltext; Mimetype: application/pdf

Search in DiVA

By author/editor
Henriksson, William
By organisation
School of Informatics
Information Systems
