Högskolan i Skövde

his.se Publications
Imbalanced data oversampling through subspace optimization with Bayesian reinforcement
Kumbhar, Mahesh. University of Skövde, School of Engineering Science. University of Skövde, Virtual Engineering Research Environment. (Virtual Production Development (VPD)) ORCID iD: 0000-0003-4647-9363
Bandaru, Sunith. University of Skövde, School of Engineering Science. University of Skövde, Virtual Engineering Research Environment. (Virtual Production Development (VPD)) ORCID iD: 0000-0001-5436-2128
Karlsson, Alexander. University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment. (Skövde Artificial Intelligence Lab (SAIL)) ORCID iD: 0000-0003-2973-3112
2026 (English). In: Artificial Intelligence Review, ISSN 0269-2821, E-ISSN 1573-7462, Vol. 59, no 1, article id 1. Article in journal (Refereed). Published
Abstract [en]

Many real-world machine learning classification problems suffer from imbalanced training data, where the least frequent label has high relevance and significance for the end user, such as equipment breakdowns or various types of process anomalies. This imbalance can negatively impact the learning algorithm and lead to misclassification of minority labels, resulting in erroneous actions and potentially high unexpected costs. Most previous oversampling methods rely only on the minority samples, often ignoring their overall density and distribution in relation to the other classes. In addition, most of them lack explainability. In contrast, this paper proposes a novel oversampling method that considers a subspace of the feature set for the creation of synthetic minority samples using nonlinear optimization of a class-sensitive objective function. Suitable subspaces for oversampling are identified through a Bayesian reinforcement strategy based on Dirichlet smoothing, which may be useful for explainable AI. The proposed method is compared empirically with 10 existing techniques on 18 real-world datasets using two traditional machine learning classifiers and four evaluation metrics. Statistical analysis of cross-validated runs over the 18 datasets and four metrics (i.e. 72 experiments) reveals that the proposed approach is among the best performing methods in 6 and 2 instances when using the random forest classifier and the support vector machine classifier, respectively, thus placing it at the top. The study also reveals that some feature combinations are more important than others for minority oversampling, and the proposed approach offers a way to identify such features.
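
The abstract mentions selecting feature subspaces through a Bayesian reinforcement strategy based on Dirichlet smoothing. The following is only a generic sketch of that idea and not the authors' algorithm: it keeps one pseudo-count per feature, samples selection weights from the corresponding Dirichlet posterior, and reinforces the features of subspaces that earn a reward. All function names, the symmetric prior, the subspace size, and the toy reward rule are assumptions made for illustration; in the paper the reward would presumably come from the quality of the synthetic minority samples.

```python
import random


def dirichlet_sample(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def smoothed_feature_probs(counts, prior=1.0):
    """Posterior-mean feature probabilities under a symmetric Dirichlet prior
    (additive smoothing of the pseudo-counts)."""
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]


def pick_subspace(counts, k, rng, prior=1.0):
    """Sample weights from the Dirichlet posterior and keep the k
    highest-weighted feature indices (returned in ascending order)."""
    weights = dirichlet_sample([c + prior for c in counts], rng)
    ranked = sorted(range(len(counts)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])


def reinforce(counts, subspace, reward):
    """Add the reward to the pseudo-count of every feature in the subspace."""
    return [c + reward if i in subspace else c for i, c in enumerate(counts)]


def toy_reward(subspace):
    """Hypothetical reward: 1.0 only when features 0 and 2 are both selected."""
    return 1.0 if 0 in subspace and 2 in subspace else 0.0


def run_demo(n_features=4, k=2, rounds=200, seed=0):
    """Run the select-and-reinforce loop and return the final pseudo-counts."""
    rng = random.Random(seed)
    counts = [0.0] * n_features
    for _ in range(rounds):
        subspace = pick_subspace(counts, k, rng)
        counts = reinforce(counts, subspace, toy_reward(subspace))
    return counts


if __name__ == "__main__":
    final_counts = run_demo()
    print("pseudo-counts:", final_counts)
    print("smoothed probabilities:", smoothed_feature_probs(final_counts))
```

In this toy setting only features 0 and 2 can accumulate pseudo-counts, so their smoothed probabilities grow relative to the others, which is one way such a scheme could expose which feature combinations matter for oversampling.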

Place, publisher, year, edition, pages
Springer Nature, 2026. Vol. 59, no 1, article id 1
Keywords [en]
Imbalanced data, Oversampling, Nonlinear optimization, Dirichlet distribution, Bayesian reinforcement, Density-based, Features subspace, Feature importance, Explainable-AI
National Category
Computer Sciences; Computer Systems
Research subject
Virtual Production Development (VPD); Skövde Artificial Intelligence Lab (SAIL)
Identifiers
URN: urn:nbn:se:his:diva-25994
DOI: 10.1007/s10462-025-11417-1
ISI: 001610765900001
Scopus ID: 2-s2.0-105021344491
OAI: oai:DiVA.org:his-25994
DiVA, id: diva2:2012898
Projects
TOPAZ - Towards Prescriptive Analytics in Virtual Factories through Structured Data Mining and Optimization
Integrated Manufacturing Analytics Platform for Predictive Maintenance with IoT
Funder
University of Skövde; Knowledge Foundation, 20200011; Vinnova, 2021-02537
Note

CC BY 4.0

Published online: 10 November 2025

Mahesh Kumbhar, mahesh.kumbar@his.se

The authors acknowledge the financial support received from KK-stiftelsen (The Knowledge Foundation, Stockholm, Sweden) and VINNOVA (Sweden Innovation Agency, Stockholm, Sweden) for the research projects ‘TOPAZ - Towards Prescriptive Analytics in Virtual Factories through Structured Data Mining and Optimization’ under grant 20200011 and ‘Integrated Manufacturing Analytics Platform for Predictive Maintenance with IoT’ under grant 2021-02537.

 Open access funding provided by University of Skövde.

Available from: 2025-11-11. Created: 2025-11-11. Last updated: 2025-11-20. Bibliographically approved

Open Access in DiVA

fulltext (4514 kB), 31 downloads
File information
File name: FULLTEXT01.pdf; File size: 4514 kB; Checksum (SHA-512):
887077f1fc0fd56f56806d811d3cf9700bef1c43fc3c7fe2715ccacb8d1d8c14a57e4846069d712044544260f89013b895356880858115d3a2dee54ba8b484fc
Type: fulltext; Mimetype: application/pdf

Authority records

Kumbhar, Mahesh; Bandaru, Sunith; Karlsson, Alexander
