Data Mining with Decision Trees in the Gene Logic Database: A Breast Cancer Study
Högskolan i Skövde (University of Skövde), Department of Computer Science.
2002 (English). Independent thesis, Advanced level (Master's degree, one year). Student thesis
Abstract [en]

Data mining approaches have been used increasingly in recent years to find patterns and regularities in large databases. In this study, the C4.5 decision tree approach was used to mine the Gene Logic database, which contains biological data. The aim was to identify the genes and risk factors most relevant to breast cancer, so as to separate healthy patients from breast cancer patients in the data sets used. Four different tests were performed for this purpose. For each of the four tests, cross validation was performed to evaluate how well the decision tree approach classifies ‘new’ samples. In the first test, the expression levels of 108 breast-related genes (listed in Appendix A) in 75 patients were used as input to the C4.5 algorithm. This test produced a decision tree containing only four genes, which were considered the most relevant for correctly classifying patients. Cross validation indicates an average accuracy of 89% in classifying ‘new’ samples. In the second test, risk factor data were used as input, and cross validation shows an average accuracy of 87%. In the third test, gene expression data and risk factor data were combined into a single input, and cross validation again indicates an average accuracy of 87%. In the final test, the C4.5 algorithm was used to suggest possible signalling pathways involving the four genes identified by the tree built from gene expression data alone. In some cases, the C4.5 algorithm found trees suggesting pathways that are supported by the breast cancer literature. Since not all pathways involving these four putative breast cancer genes are known yet, the other suggested pathways should be analyzed further to increase their credibility.

In summary, this study demonstrates how decision tree approaches can be applied to identify genes and risk factors relevant to the classification of breast cancer patients.
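The abstract relies on two techniques. First, C4.5 chooses the attributes (here, genes or risk factors) and the thresholds that best separate the classes, using the gain-ratio criterion. Second, cross validation estimates accuracy on ‘new’ samples. The Python sketch below illustrates both under simplifying assumptions. It builds only a one-level tree (a "decision stump") rather than a full, pruned C4.5 tree. It scores midpoint thresholds on continuous attributes by gain ratio. It uses interleaved, non-stratified k-fold splits with k = 5. The abstract does not state the fold count or the C4.5 settings used in the thesis. All function names and the toy data are hypothetical and do not come from the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio_threshold(values, labels):
    """Best binary split 'value <= t' on one continuous attribute, scored by gain ratio.

    Candidate thresholds are midpoints between adjacent distinct sorted values
    (a simplification of C4.5). Returns (threshold, gain_ratio), or (None, 0.0)
    if no split has a positive gain ratio.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
        split_info = entropy(["L"] * len(left) + ["R"] * len(right))
        ratio = gain / split_info  # split_info > 0 because both sides are non-empty
        if ratio > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, ratio)
    return best


def fit_stump(X, y):
    """Hypothetical one-level tree: the (feature, threshold) with the highest
    gain ratio, plus the majority class on each side of the threshold."""
    best_f, best_t, best_r = None, None, 0.0
    for f in range(len(X[0])):
        t, r = gain_ratio_threshold([row[f] for row in X], y)
        if t is not None and r > best_r:
            best_f, best_t, best_r = f, t, r
    majority = Counter(y).most_common(1)[0][0]
    if best_f is None:
        return (None, None, majority, majority)  # no useful split: predict the majority class
    left = [lab for row, lab in zip(X, y) if row[best_f] <= best_t]
    right = [lab for row, lab in zip(X, y) if row[best_f] > best_t]
    return (best_f, best_t, Counter(left).most_common(1)[0][0], Counter(right).most_common(1)[0][0])


def predict_stump(model, row):
    f, t, left_class, right_class = model
    if f is None or row[f] <= t:
        return left_class
    return right_class


def cross_val_accuracy(X, y, k=5):
    """Mean accuracy over k folds; fold j tests on samples j, j+k, j+2k, ...
    (interleaved, not stratified). Assumes len(X) >= k."""
    accuracies = []
    for j in range(k):
        test_idx = set(range(j, len(X), k))
        train = [i for i in range(len(X)) if i not in test_idx]
        model = fit_stump([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict_stump(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k


# Hypothetical toy data: column 0 is uninformative, column 1 separates the classes.
X = [[i % 3, 1.0 + 0.1 * i] for i in range(5)] + [[i % 3, 5.0 + 0.1 * i] for i in range(5)]
y = ["healthy"] * 5 + ["cancer"] * 5
model = fit_stump(X, y)
print(model[0], cross_val_accuracy(X, y, k=5))  # → 1 1.0
```

A full C4.5 implementation would recurse on each side of the chosen split, prune the resulting tree, and handle missing values. The cross-validation loop would stay the same, because it only needs a fit function and a predict function.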

Place, publisher, year, edition, pages
Skövde: Institutionen för datavetenskap (Department of Computer Science), 2002, 87 p.
Keywords [en]
Data mining, Decision trees, C4.5, Breast cancer
National subject category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:his:diva-710
OAI: oai:DiVA.org:his-710
DiVA, id: diva2:3111
Presentation (English)
Uppsök
Physics, Chemistry, Mathematics
Available from: 2008-02-04 Created: 2008-02-04 Last updated: 2018-01-12

Open Access in DiVA

fulltext (1991 kB), 447 downloads
File information
File name: FULLTEXT01.ps  File size: 1991 kB  Checksum: SHA-1
54f9968595a1753f803c9e02135b02e6fb696bbef3ce60918476fdd03d7b1cdbfdaa4897
Type: fulltext  MIME type: application/postscript
fulltext (503 kB), 1691 downloads
File information
File name: FULLTEXT02.pdf  File size: 503 kB  Checksum: SHA-512
ca468313d8c3fc6e38a49e2febf27b1707f7f5dc3fe637dd3d85503781a5d1071a0563eaace324fd5bb9c9ee5148d9f2601fb7ef7e7245cd1f205e4cd060c90d
Type: fulltext  MIME type: application/pdf

By organisation
Department of Computer Science
Bioinformatics (Computational Biology)

Total: 2138 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, previous versions that are no longer available.

Total: 558 hits