Accuracy on a Hold-out Set: The Red Herring of Data Mining
University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
2006 (English). In: 23rd Annual Workshop of the Swedish Artificial Intelligence Society, Swedish Artificial Intelligence Society - SAIS, Umeå universitet, 2006. Conference paper (Other academic)
Abstract [en]

When performing predictive modeling, the overall goal is to generate models likely to have high accuracy when applied to novel data. A technique commonly used to maximize generalization accuracy is to create ensembles of models, e.g., by averaging the output from a number of individual models. Several more or less sophisticated techniques, aimed at either directly creating ensembles or selecting ensemble members from a pool of available models, have been suggested. Many of these techniques use a part of the available data not used for training the models (a hold-out set) to rank and select either ensembles or ensemble members based on accuracy on that set. The obvious underlying assumption is that increased accuracy on the hold-out set is a good indicator of increased generalization capability on novel data or, put another way, that there is a high correlation between accuracy on the hold-out set and accuracy on novel data. The experiments in this study, however, show that this is generally not the case; i.e., there is little to gain from selecting ensembles using hold-out set accuracy. The experiments also show that this low correlation holds for individual neural networks as well, making the entire use of hold-out sets to compare predictive models questionable.
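The quantity the abstract refers to, the correlation between accuracy on a hold-out set and accuracy on novel data across a pool of models, can be illustrated with a minimal sketch. This is not the authors' experimental setup: the models are simulated stand-ins, and every name and parameter below (`simulate_model`, `holdout_vs_novel`, the set sizes, the 0.78 to 0.82 skill range) is a hypothetical choice made only for illustration.

```python
import random
from statistics import mean


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def simulate_model(labels, skill, rng):
    """Hypothetical stand-in for a trained binary classifier:
    each prediction is correct with probability `skill`."""
    return [y if rng.random() < skill else 1 - y for y in labels]


def holdout_vs_novel(n_models=50, n_holdout=100, n_novel=1000, seed=0):
    """Score a pool of simulated models on a hold-out set and on novel data,
    then return both accuracy lists and their Pearson correlation."""
    rng = random.Random(seed)
    holdout_labels = [rng.randint(0, 1) for _ in range(n_holdout)]
    novel_labels = [rng.randint(0, 1) for _ in range(n_novel)]
    holdout_acc, novel_acc = [], []
    for _ in range(n_models):
        # Assumption: models in the pool have similar underlying skill.
        skill = rng.uniform(0.78, 0.82)
        holdout_acc.append(
            accuracy(simulate_model(holdout_labels, skill, rng), holdout_labels))
        novel_acc.append(
            accuracy(simulate_model(novel_labels, skill, rng), novel_labels))
    return holdout_acc, novel_acc, pearson(holdout_acc, novel_acc)
```

Calling `holdout_vs_novel()` returns the two accuracy lists and their correlation coefficient. With real models, the same comparison would replace `simulate_model` with predictions from trained classifiers on genuinely disjoint hold-out and test partitions.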

Place, publisher, year, edition, pages
Swedish Artificial Intelligence Society - SAIS, Umeå universitet, 2006.
Series
UMINF, ISSN 0348-0542
Identifiers
URN: urn:nbn:se:his:diva-2020
OAI: oai:DiVA.org:his-2020
DiVA: diva2:32296
Conference
The 23rd Annual Workshop of the Swedish Artificial Intelligence Society, Umeå, Sweden, May 10-12
Available from: 2007-03-22 Created: 2007-03-22 Last updated: 2013-03-20

Author/editor
Niklasson, Lars