Högskolan i Skövde

Probabilistic Prediction in Scikit-Learn
University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment. ORCID iD: 0000-0001-5378-0862
Dept. of Computing, Jönköping University, Sweden.
2021 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Adding confidence measures to predictive models should increase their trustworthiness, but only if the models are well calibrated. Historically, some algorithms, such as logistic regression and neural networks, have been considered to produce well-calibrated probability estimates off the shelf. Other techniques, such as decision trees and naive Bayes, are infamous for being significantly overconfident in their probabilistic predictions. In this paper, a large experimental study investigates how well calibrated the models produced by a number of algorithms in the scikit-learn library are out of the box, and whether the built-in calibration techniques Platt scaling and isotonic regression, or Venn-Abers, can be used to improve the calibration. The results show that, of the seven algorithms evaluated, the only one obtaining well-calibrated models without external calibration is logistic regression. All other algorithms, i.e., decision trees, AdaBoost, gradient boosting, kNN, naive Bayes and random forest, benefit from using any of the calibration techniques. In particular, decision trees, naive Bayes and the boosted models are substantially improved by external calibration. From a practitioner’s perspective, the obvious recommendation is to incorporate calibration when using probabilistic prediction. Comparing the different calibration techniques, Platt scaling and Venn-Abers generally outperform isotonic regression on these rather small datasets. Finally, the unique ability of Venn-Abers to output not only well-calibrated probability estimates but also the confidence in these estimates is demonstrated.
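
The comparison described in the abstract can be sketched with scikit-learn's built-in CalibratedClassifierCV, where method="sigmoid" corresponds to Platt scaling and method="isotonic" to isotonic regression. This is a minimal illustration only, not the paper's experimental protocol: the synthetic dataset, the 5-fold setting, the decision tree as the uncalibrated model, and the Brier score as the metric are all assumptions made for this example. Venn-Abers is not part of scikit-learn and is left out.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data stands in for the paper's datasets (an assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def positive_class_probabilities(model):
    """Fit on the training split and return P(class 1) for the test split."""
    model.fit(X_train, y_train)
    return model.predict_proba(X_test)[:, 1]


# One uncalibrated tree and two externally calibrated versions of it.
models = {
    "uncalibrated": DecisionTreeClassifier(random_state=0),
    "platt": CalibratedClassifierCV(
        DecisionTreeClassifier(random_state=0), method="sigmoid", cv=5
    ),
    "isotonic": CalibratedClassifierCV(
        DecisionTreeClassifier(random_state=0), method="isotonic", cv=5
    ),
}

probabilities = {
    name: positive_class_probabilities(model) for name, model in models.items()
}

# Brier score: mean squared error of the probability estimates (lower is better).
brier = {name: brier_score_loss(y_test, p) for name, p in probabilities.items()}
for name, score in brier.items():
    print(f"{name}: Brier score = {score:.3f}")
```

The scores depend on the data and the random seed, so this sketch should not be read as reproducing the paper's results.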

Place, publisher, year, edition, pages
2021.
National Category
Information Systems
Identifiers
URN: urn:nbn:se:his:diva-23135
OAI: oai:DiVA.org:his-23135
DiVA, id: diva2:1791506
Conference
The 18th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2021), September 27-30, 2021 - Umeå, Sweden
Funder
Knowledge Foundation
Note

This research is partly funded by the Swedish Knowledge Foundation through the industrial graduate school INSiDR.

Available from: 2023-08-25 Created: 2023-08-25 Last updated: 2023-10-03 Bibliographically approved
In thesis
1. Data-driven decision support in digital retailing
2023 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In the digital era and with the advent of artificial intelligence, digital retailing has emerged as a notable shift in commerce. It empowers e-tailers with data-driven insights and predictive models to navigate a variety of challenges, driving informed decision-making and strategy formulation. While predictive models in general are fundamental for making data-driven decisions, this thesis focuses on binary classifiers. These classifiers reveal the complexities of two real-world problems, marked by their particular properties. Specifically, when binary decisions are made based on predictions, relying solely on predicted class labels is insufficient because classification accuracy varies. Furthermore, different prediction mistakes carry different costs, which affects the utility.
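
The point about unequal costs can be made concrete with a standard expected-utility calculation. The sketch below is an illustration under stated assumptions, not the thesis's method: the function names and the gain and cost figures are hypothetical, and the probability is assumed to be well calibrated.

```python
def expected_utilities(p_positive, gain_tp, cost_fp, gain_tn, cost_fn):
    """Expected utility of predicting positive vs. negative.

    p_positive is the (assumed calibrated) probability of the positive class.
    Gains and costs are hypothetical, non-negative business figures.
    """
    act_positive = p_positive * gain_tp - (1.0 - p_positive) * cost_fp
    act_negative = (1.0 - p_positive) * gain_tn - p_positive * cost_fn
    return act_positive, act_negative


def best_action(p_positive, gain_tp, cost_fp, gain_tn, cost_fn):
    """Return 1 if predicting positive has the higher expected utility, else 0."""
    act_positive, act_negative = expected_utilities(
        p_positive, gain_tp, cost_fp, gain_tn, cost_fn
    )
    return 1 if act_positive > act_negative else 0


# Hypothetical figures: missing a positive case is expensive, a false alarm is cheap.
figures = dict(gain_tp=10.0, cost_fp=2.0, gain_tn=0.0, cost_fn=10.0)

# With p = 0.3 the class label alone would say "negative",
# yet acting as if positive has the higher expected utility.
utility_positive, utility_negative = expected_utilities(0.3, **figures)
action_at_030 = best_action(0.3, **figures)
action_at_005 = best_action(0.05, **figures)
print(utility_positive, utility_negative, action_at_030, action_at_005)
```

With these hypothetical figures, the utility-maximising decision differs from the plain 0.5 label threshold, which is why the probability, and not only the label, matters.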

To confront these challenges, probabilistic predictions, often unexplored or uncalibrated, are a promising alternative to class labels. Therefore, machine learning modelling and calibration techniques are explored, employing benchmark data sets alongside empirical studies grounded in industrial contexts. These studies analyse predictions and their associated probabilities across diverse data segments and settings. The thesis found, as a proof of concept, that specific algorithms are inherently well calibrated, while others become reliable once their probabilities are calibrated. In both cases, the thesis concludes that utilising the top predictions, those with the highest probabilities, increases precision and minimises false positives. In addition, adopting well-calibrated probabilities is a powerful alternative to mere class labels. Consequently, by transforming probabilities into reliable confidence values through classification with a rejection option, a pathway emerges wherein confident and reliable predictions take centre stage in decision-making. This enables e-tailers to form distinct strategies based on these predictions and optimise their utility.
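
Classification with a rejection option can be sketched as follows: a prediction is kept only when the probability of the predicted class reaches a confidence threshold, and is otherwise rejected. This is a minimal illustration, not the thesis's implementation; the function name, the threshold of 0.8, the use of -1 as the rejection marker, and the example probabilities are all assumptions.

```python
import numpy as np


def predict_with_rejection(positive_probabilities, confidence_threshold=0.8):
    """Return 1 or 0 for confident predictions and -1 for rejected ones.

    positive_probabilities: (assumed calibrated) probabilities of class 1.
    confidence_threshold: hypothetical minimum probability of the predicted class.
    """
    p = np.asarray(positive_probabilities, dtype=float)
    labels = (p >= 0.5).astype(int)
    # Confidence is the probability assigned to the predicted class.
    confidence = np.maximum(p, 1.0 - p)
    accepted = confidence >= confidence_threshold
    return np.where(accepted, labels, -1)


# Made-up probabilities for five instances.
example_probabilities = [0.95, 0.60, 0.10, 0.45, 0.85]
decisions = predict_with_rejection(example_probabilities, confidence_threshold=0.8)
print(decisions.tolist())  # [1, -1, 0, -1, 1]
```

Raising the threshold trades coverage for precision: fewer instances receive a prediction, but those that do are the ones with the highest probabilities.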

This thesis highlights the value of calibrated models and probabilistic prediction and emphasises their significance in enhancing decision-making. The findings have practical implications for e-tailers leveraging data-driven decision support. Future research should focus on producing an automated system that prioritises high and well-calibrated probability predictions while discarding others and optimising utilities based on the costs and gains associated with the different prediction outcomes to enhance decision support for e-tailers.

Place, publisher, year, edition, pages
Skövde: University of Skövde, 2023. p. xiii, 108
Series
Dissertation Series ; 53
Keywords
Digital Retailing, Decision Support, Probabilistic Prediction, Calibration, Product Returns, Customer Churn, Binary Classification, Scikit-Learn
National Category
Other Computer and Information Science; Computer Sciences; Computer Systems; Software Engineering; Business Administration
Identifiers
urn:nbn:se:his:diva-23279 (URN)
978-91-987906-7-2 (ISBN)
Presentation
2023-10-31, G111, Högskolan i Skövde, Skövde, 13:15 (English)
Funder
Knowledge Foundation
Note

The current thesis is a part of the industrial graduate school in digital retailing (INSiDR) at the University of Borås and funded by the Swedish Knowledge Foundation.

Available from: 2023-10-03 Created: 2023-10-03 Last updated: 2023-10-03 Bibliographically approved

Open Access in DiVA

fulltext (467 kB), 49 downloads
File information
File name: FULLTEXT01.pdf
File size: 467 kB
Checksum (SHA-512):
87c0ff3d6f90ce72e5c7aba0f68cb7dcf2a3367e70cf49b1d8f05a6f5695eddac1e0c3afd3710ab2f0b1a6e9cabb0d3d935dac9120c51b011eaebdf2e87536a1
Type: fulltext
Mimetype: application/pdf

Authority records

Sweidan, Dirar
