Högskolan i Skövde

A comparative study for classification algorithms on imblanced datasets: An investigation into the performance of RF, GBDT and MLP
Högskolan i Skövde, Institutionen för informationsteknologi.
2020 (English). Independent thesis, Basic level (degree of Bachelor), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Classification is one of the most widely deployed types of machine learning, with a wide range of possible applications. However, a well-known problem in classification is that of imbalanced datasets, where many algorithms tend to favor the majority class and in some cases completely ignore the minority class. In many cases the minority class is the most valuable one, which leads to underperforming and undeployable implementations.

There are many proposed solutions to this problem, ranging from different algorithms and modifications of existing algorithms to data-manipulation methods. This study tries to contribute to the field by benchmarking three commonly applied algorithms (random forest, gradient boosted decision trees, and multi-layer perceptron) in combination with three data-manipulation approaches (oversampling, undersampling, and no data manipulation). This was done through experiments on three differently shaped datasets.

The results point towards random forest being the best-performing algorithm overall. For data with many categorical dimensions, however, the multi-layer perceptron was the top performer. As for data manipulation, undersampling was the best approach for all datasets and algorithms.
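The abstract names oversampling and undersampling but does not say which variants were used. As a rough illustration only, the sketch below assumes the simplest forms, random undersampling (dropping majority-class rows) and random oversampling (duplicating minority-class rows), written with the Python standard library. The function names, the `seed` parameter, and the toy data are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(rows, labels):
    """Collect the rows belonging to each class label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Sample without replacement: no row is used twice.
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows so every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Keep every original row, then pad with random duplicates.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy imbalanced data (assumed for illustration): 90 majority, 10 minority.
toy_rows = [[i] for i in range(100)]
toy_labels = [0] * 90 + [1] * 10

under_rows, under_labels = random_undersample(toy_rows, toy_labels)
over_rows, over_labels = random_oversample(toy_rows, toy_labels)

print(Counter(under_labels))  # 10 rows of each class
print(Counter(over_labels))   # 90 rows of each class
```

In a benchmark like the one described, resampling of this kind would normally be applied to the training split only, so that the evaluation data keeps its original class distribution.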

Place, publisher, year, edition, pages
2020. , p. 28
National Category
Identifiers
URN: urn:nbn:se:his:diva-18660
OAI: oai:DiVA.org:his-18660
DiVA, id: diva2:1448074
Subject / course
Information technology
Educational program
Information Systems
Supervisors
Examiner
Available from: 2020-06-26 Created: 2020-06-26 Last updated: 2020-06-26 Bibliographically approved

Open Access in DiVA

fulltext (880 kB), 1398 downloads
File information
File name: FULLTEXT01.pdf, File size: 880 kB, Checksum SHA-512:
ab800f4d6d81616bc0f96781943530347f9f8f73339f913be46fac6bd6dac760584e28846eecb6c25ec62d29276280a8560eff4b5d6b06ba2d9e9b7df9e95279
Type: fulltext, Mimetype: application/pdf

Total: 1398 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.

urn-nbn

Altmetric

urn-nbn
Total: 419 hits