Högskolan i Skövde

A comparative study for classification algorithms on imbalanced datasets: An investigation into the performance of RF, GBDT and MLP
University of Skövde, School of Informatics.
2020 (English). Independent thesis, Basic level (degree of Bachelor), 20 credits / 30 HE credits. Student thesis
Abstract [en]

In machine learning, classification is one of the most commonly deployed types of task, with a wide range of possible applications. However, a well-known problem in classification is that of imbalanced datasets, where many algorithms tend to favor the majority class and in some cases completely ignore the minority class. In many cases the minority class is the most valuable one, which leads to underperforming and undeployable implementations.

There are many proposed solutions to this problem, ranging from different algorithms and modifications of existing algorithms to data-manipulation methods. This study tries to contribute to the field by benchmarking three commonly applied algorithms (random forest (RF), gradient boosted decision trees (GBDT) and multi-layer perceptron (MLP)) in combination with three data-manipulation methods (oversampling, undersampling and no data manipulation). This was done through experiments on three differently shaped datasets.

The results point towards random forest being the best-performing algorithm overall. For data with many categorical dimensions, however, the multi-layer perceptron was the top performer. Regarding data manipulation, undersampling was the best approach for all datasets and algorithms.
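
The two data-manipulation methods named in the abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling or undersampling variants the thesis used, so this assumes the simplest ones: random undersampling (discard majority rows) and random oversampling (duplicate minority rows). The function names, the toy data and the 95/5 class split are all hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(rows, labels):
    """Map each class label to the list of rows carrying it."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of each class until it matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 95 majority rows (label 0), 5 minority rows (label 1).
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print(Counter(y_under))  # class counts after undersampling
print(Counter(y_over))   # class counts after oversampling
```

On this toy data, undersampling leaves 5 rows per class and oversampling leaves 95 rows per class. In a benchmark of this kind, resampling would normally be applied to the training split only, so that the evaluation data keeps its original class distribution.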

Place, publisher, year, edition, pages
2020. p. 28
National Category
Information Systems, Social aspects
Identifiers
URN: urn:nbn:se:his:diva-18660
OAI: oai:DiVA.org:his-18660
DiVA, id: diva2:1448074
Subject / course
Information Technology
Educational program
Information Systems
Supervisors
Examiners
Available from: 2020-06-26
Created: 2020-06-26
Last updated: 2020-06-26
Bibliographically approved

Open Access in DiVA

Full text (880 kB), 1346 downloads
File information
File name: FULLTEXT01.pdf
File size: 880 kB
Checksum (SHA-512): ab800f4d6d81616bc0f96781943530347f9f8f73339f913be46fac6bd6dac760584e28846eecb6c25ec62d29276280a8560eff4b5d6b06ba2d9e9b7df9e95279
Type: fulltext
Mimetype: application/pdf

Total: 1346 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 409 hits