Accuracy on a Hold-out Set: The Red Herring of Data Mining
2006 (English) In: Proceedings of SAIS 2006: The 23rd Annual Workshop of the Swedish Artificial Intelligence Society / [ed] Michael Minock; Patrik Eklund; Helena Lindgren, Umeå: Swedish Artificial Intelligence Society - SAIS, Umeå University, 2006, p. 137-146. Conference paper, Published paper (Refereed)
Abstract [en]
When performing predictive modeling, the overall goal is to generate models likely to have high accuracy when applied to novel data. A technique commonly used to maximize generalization accuracy is to create ensembles of models, e.g., by averaging the output from a number of individual models. Several more or less sophisticated techniques have been suggested, aimed either at directly creating ensembles or at selecting ensemble members from a pool of available models. Many of these techniques use a part of the available data not used for training the models (a hold-out set) to rank and select either ensembles or ensemble members based on accuracy on that set. The obvious underlying assumption is that increased accuracy on the hold-out set is a good indicator of increased generalization capability on novel data or, put another way, that there is a high correlation between accuracy on the hold-out set and accuracy on novel data. The experiments in this study, however, show that this is generally not the case; i.e., there is little to gain from selecting ensembles using hold-out set accuracy. The experiments also show that this low correlation holds for individual neural networks as well, making the entire use of hold-out sets to compare predictive models questionable.
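The check described in the abstract, whether hold-out accuracy tracks accuracy on novel data, can be sketched as follows. This is a minimal illustration and not the authors' experimental setup: the function names (`accuracy`, `pearson`, `select_by_holdout`) and the toy accuracy values are assumptions made for this example only, and the use of Pearson correlation is one possible way to measure the relationship.

```python
from math import sqrt


def accuracy(predictions, targets):
    """Fraction of predictions that match the targets."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def select_by_holdout(holdout_acc, test_acc):
    """Pick the model with the highest hold-out accuracy.

    Returns (chosen index, test accuracy of the chosen model,
    best test accuracy available in the pool).
    """
    chosen = max(range(len(holdout_acc)), key=holdout_acc.__getitem__)
    return chosen, test_acc[chosen], max(test_acc)


# Hypothetical accuracies for a pool of four models (made-up numbers,
# not results from the paper).
toy_holdout = [0.80, 0.85, 0.82, 0.78]
toy_test = [0.81, 0.79, 0.80, 0.82]

if __name__ == "__main__":
    print("correlation:", pearson(toy_holdout, toy_test))
    print("selection:", select_by_holdout(toy_holdout, toy_test))
```

With real models, `toy_holdout` and `toy_test` would be replaced by accuracies measured on a hold-out set and on a separate test set. A correlation near zero, or a chosen model whose test accuracy falls well below the best in the pool, would correspond to the situation the abstract describes.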
Place, publisher, year, edition, pages
Umeå: Swedish Artificial Intelligence Society - SAIS, Umeå University, 2006. p. 137-146
Series
Report / UMINF - Umeå University, Department of Computing Science, ISSN 0348-0542 ; 06.19
National Category
Information Systems; Computer Sciences
Identifiers
URN: urn:nbn:se:his:diva-2020; OAI: oai:DiVA.org:his-2020; DiVA, id: diva2:32296
Conference
The 23rd Annual Workshop of the Swedish Artificial Intelligence Society, SAIS 2006, Umeå, Sweden, May 10-12
2007-03-22; 2007-03-22; 2021-06-28. Bibliographically approved