Utilizing Information on Uncertainty for In Silico Modeling using Random Forests
University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre. Stockholm University, Sweden. (Skövde Artificial Intelligence Lab (SAIL)) ORCID iD: 0000-0001-8382-0300
AstraZeneca R&D, Södertälje, Sweden.
2009 (English). In: Proceedings of the 3rd Skövde Workshop on Information Fusion Topics (SWIFT 2009), Skövde: University of Skövde, 2009, p. 59-62. Conference paper, Published paper (Refereed)
Abstract [en]

Information on the uncertainty of measurements or estimates of molecular properties is rarely utilized by in silico predictive models. In this study, different approaches to handling uncertain numerical features are explored when using the state-of-the-art random forest algorithm for generating predictive models. Two main approaches are considered: i) sampling from probability distributions prior to tree generation, which does not require any change to the underlying tree learning algorithm, and ii) adjusting the algorithm to handle probability distributions, similar to how missing values are typically handled, i.e., partitions may include fractions of examples. An experiment with six datasets concerning the prediction of various chemical properties is presented, where 95% confidence intervals are included for one of the 92 numerical features. In total, five approaches to handling uncertain numeric features are compared: ignoring the uncertainty, sampling from distributions assumed to be uniform or normal, and adjusting tree learning to handle probability distributions assumed to be uniform or normal. The experimental results show that all approaches that utilize information on uncertainty outperform the single approach that ignores it, with respect to both accuracy and area under the ROC curve. A decomposition of the squared error of the constituent classification trees shows that the highest variance is obtained by ignoring the information on uncertainty, but that this also results in the highest mean squared error of the constituent trees.
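The two approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the record does not include. All function and parameter names below are hypothetical. It assumes that, in the uniform case, the value lies uniformly inside the reported interval, and that, in the normal case, the 95% interval equals the mean plus or minus about 1.96 standard deviations. Approach i) redraws the uncertain feature before each tree is grown; approach ii) is represented only by the fraction of an example that would be sent to the left branch of a split.

```python
import math
import random

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def interval_to_normal(lower, upper):
    """Assumption: the 95% interval is mean +/- Z_95 standard deviations."""
    return (lower + upper) / 2.0, (upper - lower) / (2.0 * Z_95)


def sample_uncertain_value(lower, upper, dist="normal", rng=random):
    """Approach i): draw one concrete value for an interval-valued feature."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if dist == "uniform":
        return rng.uniform(lower, upper)
    if dist == "normal":
        mean, std = interval_to_normal(lower, upper)
        return rng.gauss(mean, std)
    raise ValueError(f"unknown distribution: {dist}")


def sampled_training_sets(rows, uncertain_index, intervals, n_trees,
                          dist="normal", seed=0):
    """Yield one resampled copy of the data per tree; the tree learner
    itself is left unchanged and is not shown here."""
    rng = random.Random(seed)
    for _ in range(n_trees):
        sampled = []
        for row, (lower, upper) in zip(rows, intervals):
            new_row = list(row)
            new_row[uncertain_index] = sample_uncertain_value(
                lower, upper, dist, rng)
            sampled.append(new_row)
        yield sampled


def left_fraction(lower, upper, threshold, dist="normal"):
    """Approach ii): fraction of an example sent to the branch
    'feature <= threshold', i.e. P(X <= threshold) under the assumed
    distribution. The remainder goes to the other branch."""
    if lower == upper:  # no uncertainty: behave like an ordinary split
        return 1.0 if lower <= threshold else 0.0
    if dist == "uniform":
        frac = (threshold - lower) / (upper - lower)
        return min(1.0, max(0.0, frac))
    if dist == "normal":
        mean, std = interval_to_normal(lower, upper)
        return 0.5 * (1.0 + math.erf((threshold - mean) / (std * math.sqrt(2.0))))
    raise ValueError(f"unknown distribution: {dist}")
```

Under these assumptions, a split at the midpoint of an interval sends half of the example to each branch for both distributions, which mirrors the fractional handling of missing values mentioned in the abstract.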

Place, publisher, year, edition, pages
Skövde: University of Skövde, 2009. p. 59-62
Series
SUSI, ISSN 1653-2325; 2009:3
National Category
Computer and Information Sciences
Research subject
Technology; Skövde Artificial Intelligence Lab (SAIL)
Identifiers
URN: urn:nbn:se:his:diva-3542; ISBN: 978-91-978513-2-9 (electronic); OAI: oai:DiVA.org:his-3542; DiVA, id: diva2:284576
Conference
The 3rd Annual Skövde Workshop on Information Fusion Topics (SWIFT 2009), 12-13 Oct 2009, Skövde, Sweden
Note

[CD-ROM]

Available from: 2010-01-07. Created: 2010-01-07. Last updated: 2020-11-30. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Boström, Henrik
