Högskolan i Skövde

Data-driven decision support in digital retailing
University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment. Högskolan i Borås. ORCID iD: 0000-0001-5378-0862
2023 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In the digital era and with the advent of artificial intelligence, digital retailing has emerged as a notable shift in commerce. It empowers e-tailers with data-driven insights and predictive models to navigate a variety of challenges, driving informed decision-making and strategic formulation. While predictive models in general are fundamental for making data-driven decisions, this thesis focuses on binary classifiers. These classifiers are applied to two real-world problems, each marked by its particular properties. Specifically, when binary decisions are made based on predictions, relying solely on predicted class labels is insufficient because classification accuracy varies. Furthermore, different prediction errors carry different costs, which affects the utility.

To confront these challenges, probabilistic predictions, which are often unexplored or uncalibrated, are a promising alternative to class labels. Therefore, machine learning modelling and calibration techniques are explored, employing benchmark data sets alongside empirical studies grounded in industrial contexts. These studies analyse predictions and their associated probabilities across diverse data segments and settings. The thesis found, as a proof of concept, that some algorithms are inherently well calibrated, while others become reliable once their probabilities are calibrated. In both cases, the thesis concludes that utilising the top predictions, i.e., those with the highest probabilities, increases precision and minimises false positives. In addition, adopting well-calibrated probabilities is a powerful alternative to mere class labels. Consequently, by transforming probabilities into reliable confidence values through classification with a rejection option, a pathway emerges wherein confident and reliable predictions take centre stage in decision-making. This enables e-tailers to form distinct strategies based on these predictions and optimise their utility.

This thesis highlights the value of calibrated models and probabilistic prediction and emphasises their significance in enhancing decision-making. The findings have practical implications for e-tailers leveraging data-driven decision support. Future research should focus on producing an automated system that prioritises high and well-calibrated probability predictions while discarding others and optimising utilities based on the costs and gains associated with the different prediction outcomes to enhance decision support for e-tailers.
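The idea of optimising utility from the costs and gains of different prediction outcomes can be illustrated with a small sketch. It is not taken from the thesis: the function names, the assumption that not acting has zero utility, and the gain and cost figures are all hypothetical. It shows only the standard break-even reasoning that, with a calibrated probability p, acting pays off when p * gain exceeds (1 - p) * cost.

```python
def break_even_probability(gain_tp: float, cost_fp: float) -> float:
    """Probability above which acting has positive expected utility.

    Assumption of this sketch: acting on a true positive yields gain_tp,
    acting on a false positive costs cost_fp, and not acting yields 0.
    """
    if gain_tp <= 0 or cost_fp < 0:
        raise ValueError("gain_tp must be positive and cost_fp non-negative")
    return cost_fp / (gain_tp + cost_fp)


def expected_utility(p: float, gain_tp: float, cost_fp: float) -> float:
    """Expected utility of acting on one instance with calibrated probability p."""
    return p * gain_tp - (1.0 - p) * cost_fp


def select_actions(probabilities, gain_tp: float, cost_fp: float):
    """Act only on instances whose probability exceeds the break-even point."""
    threshold = break_even_probability(gain_tp, cost_fp)
    return [p > threshold for p in probabilities]


# Hypothetical figures: a correct intervention gains 10, a mistaken one costs 30.
probs = [0.95, 0.80, 0.55, 0.20]
decisions = select_actions(probs, gain_tp=10.0, cost_fp=30.0)
print(break_even_probability(10.0, 30.0), decisions)
```

The threshold is only meaningful if the probabilities are well calibrated, which is why calibration and utility optimisation are discussed together.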

Place, publisher, year, edition, pages
Skövde: University of Skövde, 2023, p. xiii, 108
Series
Dissertation Series ; 53
Keywords [en]
Digital Retailing, Decision Support, Probabilistic Prediction, Calibration, Product Returns, Customer Churn, Binary Classification, Scikit-Learn
National Category
Other Computer and Information Science; Computer Sciences; Computer Systems; Software Engineering; Business Administration
Identifiers
URN: urn:nbn:se:his:diva-23279; ISBN: 978-91-987906-7-2 (print); OAI: oai:DiVA.org:his-23279; DiVA, id: diva2:1802036
Presentation
2023-10-31, G111, Högskolan i Skövde, Skövde, 13:15 (English)
Opponent
Supervisors
Funder
Knowledge Foundation
Note

The current thesis is a part of the industrial graduate school in digital retailing (INSiDR) at the University of Borås and funded by the Swedish Knowledge Foundation.

Available from: 2023-10-03. Created: 2023-10-03. Last updated: 2023-10-03. Bibliographically approved
List of papers
1. Probabilistic Prediction in Scikit-Learn
2021 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Adding confidence measures to predictive models should increase the trustworthiness, but only if the models are well-calibrated. Historically, some algorithms like logistic regression, but also neural networks, have been considered to produce well-calibrated probability estimates off-the-shelf. Other techniques, like decision trees and Naive Bayes, on the other hand, are infamous for being significantly overconfident in their probabilistic predictions. In this paper, a large experimental study is conducted to investigate how well-calibrated the models produced by a number of algorithms in the scikit-learn library are out-of-the-box, and also whether either the built-in calibration techniques, Platt scaling and isotonic regression, or Venn-Abers can be used to improve the calibration. The results show that of the seven algorithms evaluated, the only one obtaining well-calibrated models without external calibration is logistic regression. All other algorithms, i.e., decision trees, AdaBoost, gradient boosting, kNN, Naive Bayes and random forest, benefit from using any of the calibration techniques. In particular, decision trees, Naive Bayes and the boosted models are substantially improved using external calibration. From a practitioner’s perspective, the obvious recommendation is to incorporate calibration when using probabilistic prediction. Comparing the different calibration techniques, Platt scaling and Venn-Abers generally outperform isotonic regression on these rather small datasets. Finally, the unique ability of Venn-Abers to output not only well-calibrated probability estimates, but also the confidence in these estimates, is demonstrated.
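To make "well-calibrated" concrete, the sketch below computes a binned expected calibration error (ECE) for the positive class in plain Python. The abstract does not say which calibration metric the paper uses, so the choice of ECE, the function name, and the toy data are assumptions made for illustration only.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned ECE: weighted mean gap between observed and predicted positive rate.

    y_true holds 0/1 labels, y_prob the predicted probability of class 1.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equally long")
    bins = [[] for _ in range(n_bins)]
    for label, p in zip(y_true, y_prob):
        # Equal-width bins over [0, 1]; p == 1.0 goes into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((label, p))
    total = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(label for label, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / total * abs(observed - predicted)
    return ece


# Toy data: three of four instances are positive.
labels = [1, 1, 1, 0]
calibrated = expected_calibration_error(labels, [0.75] * 4)     # matches the 75% rate
overconfident = expected_calibration_error(labels, [0.99] * 4)  # claims 99%
print(calibrated, overconfident)
```

In scikit-learn itself, Platt scaling and isotonic regression are available through `CalibratedClassifierCV` with `method="sigmoid"` or `method="isotonic"`; Venn-Abers is not part of scikit-learn and requires a separate implementation.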

National Category
Information Systems
Identifiers
urn:nbn:se:his:diva-23135 (URN)
Conference
The 18th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2021), September 27-30, 2021 - Umeå, Sweden
Funder
Knowledge Foundation
Note

This research is partly funded by the Swedish Knowledge Foundation through the industrial graduate school INSIDR.

Available from: 2023-08-25. Created: 2023-08-25. Last updated: 2023-10-03. Bibliographically approved
2. Predicting Customer Churn in Retailing
2022 (English). In: Proceedings 21st IEEE International Conference on Machine Learning and Applications ICMLA 2022: 12–14 December 2022 Nassau, The Bahamas / [ed] M. Arif Wani; Mehmed Kantardzic; Vasile Palade; Daniel Neagu; Longzhi Yang; Kit-Yan Chan, IEEE, 2022, p. 635-640. Conference paper, Published paper (Refereed)
Abstract [en]

Customer churn is one of the most challenging problems for digital retailers. With significantly higher costs for acquiring new customers than retaining existing ones, knowledge about which customers are likely to churn becomes essential. This paper reports a case study where a data-driven approach to churn prediction is used for predicting churners and gaining insights about the problem domain. The real-world data set used contains approximately 200 000 customers, describing each customer using more than 50 features. In the pre-processing, exploration, modeling and analysis, attributes related to recency, frequency, and monetary concepts are identified and utilized. In addition, correlations and feature importance are used to discover and understand churn indicators. One important finding is that the churn rate highly depends on the number of previous purchases. In the segment consisting of customers with only one previous purchase, more than 75% will churn, i.e., not make another purchase in the coming year. For customers with at least four previous purchases, the corresponding churn rate is around 25%. Further analysis shows that churning customers in general, and as expected, make smaller purchases and visit the online store less often. In the experimentation, three modeling techniques are evaluated, and the results show that, in particular, Gradient Boosting models can predict churners with relatively high accuracy while obtaining a good balance between precision and recall. 

Place, publisher, year, edition, pages
IEEE, 2022
Keywords
Sales, Case-studies, Churn rates, Correlation, Customer churn prediction, Customer churns, Digital retailing, Feature importance, High costs, RFM analysis, Top probability, Forecasting, correlations, top probabilities
National Category
Software Engineering; Business Administration; Computer Sciences
Research subject
Interaction Lab (ILAB)
Identifiers
urn:nbn:se:his:diva-22430 (URN); 10.1109/ICMLA55696.2022.00105 (DOI); 000980994900094 (); 2-s2.0-85152214345 (Scopus ID); 978-1-6654-6283-9 (ISBN); 978-1-6654-6284-6 (ISBN)
Conference
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 12-14 December 2022, Nassau, Bahamas
Funder
Knowledge Foundation, 20160035, 20170215
Note

© 2022 IEEE.

The current research is a part of the Industrial Graduate School in Digital Retailing (INSiDR) at the University of Borås, funded by the Swedish Knowledge Foundation, grants nr. 20160035, 20170215.

Available from: 2023-04-20. Created: 2023-04-20. Last updated: 2023-10-03. Bibliographically approved
3. Predicting returns in men's fashion
2020 (English). In: Developments of Artificial Intelligence Technologies in Computation and Robotics: Proceedings of the 14th International FLINS Conference (FLINS 2020) / [ed] Li Zhong; Chunrong Yuan; Jie Lu; Etienne E. Kerre, World Scientific, 2020, p. 1506-1513. Conference paper, Published paper (Refereed)
Abstract [en]

While consumers value a free and easy return process, the costs to e-tailers associated with returns are substantial and increasing. Consequently, merchants are now tempted to implement stricter policies, but must balance this against the risk of losing valuable customers. With this in mind, data-driven and algorithmic approaches have been introduced to predict if a certain order is likely to result in a return. In this application paper, a novel approach, combining information about the customer and the order, is suggested and evaluated on a real-world data set from a Swedish e-tailer in men's fashion. The results show that while the predictive accuracy is rather low, a system utilizing the suggested approach could still be useful. Specifically, it is reasonable to assume that an e-tailer would only act on predicted returns where the confidence is very high, e.g., the top 1-5%. For such predictions, the obtained precision is 0.918-0.969, with an acceptable detection rate.
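Acting only on the most confident predictions can be sketched as follows: rank orders by predicted return probability, keep the top fraction, and measure precision and detection rate (recall) on that subset. The function name, the toy labels and the probabilities are hypothetical and are not the paper's data or code.

```python
import math


def precision_recall_at_top(y_true, y_prob, fraction):
    """Precision and recall when acting only on the top `fraction` of predictions.

    y_true holds 0/1 labels (1 = returned), y_prob the predicted return probability.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equally long")
    n_top = max(1, math.ceil(fraction * len(y_true)))
    # Sort by probability, highest first, and keep the n_top most confident.
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])
    positives = sum(y_true)
    precision = hits / n_top
    recall = hits / positives if positives else 0.0
    return precision, recall


# Toy data: eight orders, four of which were returned.
returned = [1, 1, 0, 1, 0, 0, 1, 0]
prob_return = [0.97, 0.91, 0.80, 0.62, 0.45, 0.40, 0.30, 0.10]
top_quarter = precision_recall_at_top(returned, prob_return, 0.25)
top_half = precision_recall_at_top(returned, prob_return, 0.50)
print(top_quarter, top_half)
```

Narrowing the fraction trades detection rate for precision, which is the trade-off the abstract describes for the top 1-5% of predictions.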

Place, publisher, year, edition, pages
World Scientific, 2020
Series
World Scientific Proceedings Series on Computer Engineering and Information Science, ISSN 1793-7868 ; 12
Keywords
Return prediction, Predictive modeling, Random Forests
National Category
Business Administration
Identifiers
urn:nbn:se:his:diva-19974 (URN); 10.1142/9789811223334_0180 (DOI); 000656123200180 (); 978-981-122-333-4 (ISBN); 978-981-122-334-1 (ISBN)
Conference
15th Symposium of Intelligent Systems and Knowledge Engineering (ISKE) held jointly with 14th International FLINS Conference (FLINS 2020), Cologne, Germany, 18 – 21 August 2020
Available from: 2021-06-24. Created: 2021-06-24. Last updated: 2023-10-03. Bibliographically approved
4. Improved Decision Support for Product Returns using Probabilistic Prediction
2023 (English). In: Proceedings 2023 Congress in Computer Science, Computer Engineering, & Applied Computing, CSCE 2023: Las Vegas, USA, 24-27 July 2023, IEEE, 2023, p. 1567-1573. Conference paper, Published paper (Refereed)
Abstract [en]

Product returns are not only costly for e-tailers, but the unnecessary transports also impact the environment. Consequently, online retailers have started to formulate policies to reduce the number of returns. Determining when and how to act is, however, a delicate matter, since too harsh an approach may lead not only to the order being cancelled, but also to the customer leaving the business. Being able to accurately predict which orders will lead to a return would be a strong tool, guiding which actions to take. This paper addresses the problem of data-driven product return prediction by conducting a case study using a large real-world data set. The main result is that well-calibrated probabilistic predictors are essential for providing predictions with high precision and reasonable recall. This implies that utilizing calibrated models to predict some instances, while declining to predict others, can be recommended. In practice, this would make it possible for a decision-maker to act only upon a subset of all predicted returns, where the risk of a return is very high.
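Predicting some instances while rejecting others can be sketched with two probability thresholds: commit to a label only when the calibrated probability is clearly high or clearly low, and abstain in between. The function names and the thresholds of 0.1 and 0.9 are hypothetical choices for this illustration, not values from the paper.

```python
def predict_with_reject(probabilities, lower=0.1, upper=0.9):
    """Return 1 (return), 0 (no return) or None (rejected) for each probability."""
    if not 0 <= lower < upper <= 1:
        raise ValueError("thresholds must satisfy 0 <= lower < upper <= 1")
    decisions = []
    for p in probabilities:
        if p >= upper:
            decisions.append(1)      # confident positive: act on it
        elif p <= lower:
            decisions.append(0)      # confident negative
        else:
            decisions.append(None)   # too uncertain: decline to predict
    return decisions


def coverage(decisions):
    """Share of instances for which a prediction was actually made."""
    return sum(d is not None for d in decisions) / len(decisions)


# Toy calibrated probabilities of a return for five orders.
probs = [0.97, 0.55, 0.03, 0.91, 0.40]
decisions = predict_with_reject(probs)
print(decisions, coverage(decisions))
```

Tightening the thresholds lowers coverage but leaves a subset of predictions on which a decision-maker can act with higher confidence.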

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
Product Returns, Decision Support, Probabilistic Predictions, Calibration, Predict with Reject Option.
National Category
Computer and Information Sciences; Information Systems; Probability Theory and Statistics; Business Administration
Research subject
INF301 Data Science; Interaction Lab (ILAB)
Identifiers
urn:nbn:se:his:diva-23269 (URN); 10.1109/CSCE60160.2023.00258 (DOI); 2-s2.0-85191148521 (Scopus ID); 979-8-3503-2760-1 (ISBN); 979-8-3503-2759-5 (ISBN); 979-8-3503-2758-8 (ISBN)
Conference
The 19th International Conference on Data Science (ICDATA’23), July 24-27, 2023 - Las Vegas, Nevada, USA
Projects
INSiDR
Funder
Knowledge Foundation, 20160035; Knowledge Foundation, 20170215
Note

©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

This research is a part of the industrial graduate research school in digital retailing (INSiDR) at the University of Borås, funded by The Swedish Knowledge Foundation, grants nr. 20160035, 20170215.

Available from: 2023-09-29. Created: 2023-09-29. Last updated: 2024-07-05. Bibliographically approved

Open Access in DiVA

fulltext (10981 kB), 788 downloads
File information
File name: FULLTEXT01.pdf; File size: 10981 kB; Checksum (SHA-512):
012a9fab8f91ce39e550936eb9f8649c93b200c79c8d26312abfe04d10ee05f5958a2f8c088c11b1470addb4501160d7cab25ac6218bcc3b9102915789213542
Type: fulltext; Mimetype: application/pdf

Authority records

Sweidan, Dirar
