Högskolan i Skövde

In silico modeling for uncertain biochemical data
University of Skövde, School of Life Sciences.
2009 (English). Independent thesis, Advanced level (degree of Master (One Year)), 15 credits / 22.5 HE credits. Student thesis
Abstract [en]

Analyzing and modeling data is a well-established research area, and a wide variety of methods has been developed over the last decades. Most of these methods assume fixed positions of data points; only recently has uncertainty in data attracted attention as a potentially useful source of information. To provide deeper insight into this subject, this thesis addresses the following question: can information on the uncertainty of feature values be exploited to improve in silico modeling? To this end, a state-of-the-art random forest algorithm is implemented in MATLAB®. In addition, three techniques for handling uncertain numeric features are presented and incorporated into modified versions of the random forest. To test the hypothesis, six real-world data sets were provided by AstraZeneca. The data describe biochemical features of chemical compounds, including the results of an Ames test, a widely used technique for determining the mutagenicity of chemical substances. Each of the data sets contains a single uncertain numeric feature, represented as an expected value and an error estimate. The modified algorithms are then applied to the six data sets to obtain classifiers that predict the outcome of an Ames test. The hypothesis is tested using a paired t-test, and the results reveal that information on uncertainty can indeed improve the performance of in silico models.
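
The paired t-test mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the test was set up, so this is an assumption: each data set contributes one performance score for a baseline classifier and one for a modified classifier, and the test is run on the per-data-set differences. The function name `paired_t_statistic` and its parameters are hypothetical and are not taken from the thesis.

```python
import math
import statistics


def paired_t_statistic(baseline, modified):
    """Return (t, degrees_of_freedom) for two paired score lists.

    Hypothetical helper: baseline[i] and modified[i] are assumed to be
    scores of two classifiers on the same data set i.
    """
    if len(baseline) != len(modified):
        raise ValueError("paired samples must have equal length")
    n = len(baseline)
    if n < 2:
        raise ValueError("need at least two pairs")

    # Per-pair differences: positive values favour the modified classifier.
    diffs = [m - b for b, m in zip(baseline, modified)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    # t = mean difference divided by its standard error.
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1
```

Under this assumption, six data sets give six pairs and therefore five degrees of freedom; the resulting statistic would be compared against a t distribution with that many degrees of freedom.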

Place, publisher, year, edition, pages
2009. p. 47
Keywords [en]
Uncertain Data, Random Forest, Ames Test
National Category
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:his:diva-3099
OAI: oai:DiVA.org:his-3099
DiVA, id: diva2:223115
Presentation
(English)
Uppsok
Life/earth sciences
Available from: 2009-07-01. Created: 2009-06-10. Last updated: 2009-07-01. Bibliographically approved.

Open Access in DiVA

fulltext (964 kB), 421 downloads
File information
File name: FULLTEXT02.pdf
File size: 964 kB
Checksum (SHA-512): 8f67bbf4618d4f01af55a9528b5d396f0b8588cc9cf1f29816cb593304f246f8d11c6cfa0ee374861a50b7da693e26fced1f7c9f47ea06012f5d3a2ef766fa41
Type: fulltext
Mimetype: application/pdf

Author
Gusenleitner, Daniel

Total: 421 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 436 hits