Högskolan i Skövde

Assessing the Impact of Feature Quantity in Tree Based Machine Learning Models for the Detection of Malicious Software Packages in PyPI
University of Skövde, School of Informatics.
2025 (English). Independent thesis, Basic level (degree of Bachelor), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Software supply chain attacks, where malicious code is inserted into software components or repositories, present a growing threat to the open-source ecosystem. Traditional scanning methods often fail to detect hidden threats. Machine Learning (ML) offers a potential solution by automating the detection of malicious packages. However, previous studies report false positive rates too high for practical deployment. This study investigates whether increasing feature quantity improves ML model performance for detecting malicious software packages on PyPI. Using a quasi-experiment with Decision Tree, Random Forest, and XGBoost classifiers, the feature set is incrementally expanded, and performance changes are measured. Results show that while adding features improves performance initially, gains diminish beyond a certain point. Future work includes expanding feature sets further and adding more randomized datasets to reduce feature bias.
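
The incremental procedure described in the abstract can be illustrated with a small sketch. This is not the thesis's code or data: the synthetic dataset, the feature order, and all function names (`make_data`, `fit_stump`, `sweep`) are hypothetical, and a depth-1 decision tree (a stump) written in plain Python stands in for the Decision Tree, Random Forest, and XGBoost classifiers. It only shows the shape of the experiment: train on the first k features, record accuracy and false positive rate, then increase k.

```python
import random


def false_positive_rate(y_true, y_pred):
    """FP / (FP + TN): share of benign samples (label 0) flagged as malicious (1)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_stump(X, y):
    """Depth-1 decision tree: choose the feature, threshold, and polarity
    with the fewest training errors. Stand-in for a real tree-based model."""
    best = None  # (errors, feature index, threshold, label predicted above threshold)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for above in (0, 1):
                errors = sum(
                    (above if row[j] > thr else 1 - above) != t
                    for row, t in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, thr, above)
    _, j, thr, above = best
    return lambda row: above if row[j] > thr else 1 - above


def sweep(X_train, y_train, X_test, y_test, fit=fit_stump):
    """Train on the first k features for k = 1..n and record test metrics."""
    results = []
    for k in range(1, len(X_train[0]) + 1):
        model = fit([row[:k] for row in X_train], y_train)
        preds = [model(row[:k]) for row in X_test]
        results.append(
            (k, accuracy(y_test, preds), false_positive_rate(y_test, preds))
        )
    return results


def make_data(n, seed):
    """Synthetic data (assumption): feature 1 is informative, 0 and 2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(0, 1), rng.gauss(3.0 * label, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_data(200, seed=1)
    X_test, y_test = make_data(200, seed=2)
    for k, acc, fpr in sweep(X_train, y_train, X_test, y_test):
        print(f"features={k}  accuracy={acc:.3f}  false_positive_rate={fpr:.3f}")
```

Passing a different `fit` callable to `sweep` would let the same loop drive another classifier, provided it returns a function that maps one feature row to a 0/1 label. On this toy data the third feature is pure noise, so any plateau it shows is an artefact of the synthetic setup and says nothing about the thesis's results.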

Place, publisher, year, edition, pages
2025, p. 59
Keywords [en]
Classification, Feature Quantity, Artificial Intelligence, Malicious, cyber security, Software Supply-Chain
National Category
Computer Sciences; Software Engineering
Identifiers
URN: urn:nbn:se:his:diva-25575
OAI: oai:DiVA.org:his-25575
DiVA, id: diva2:1985408
Subject / course
Information Technology
Educational program
Computer Science - Specialization in Systems Development
Supervisors
Examiners
Available from: 2025-07-24. Created: 2025-07-24. Last updated: 2025-09-29. Bibliographically approved.

Open Access in DiVA

Full text (1572 kB), 197 downloads
File information
File name: FULLTEXT01.pdf
File size: 1572 kB
Checksum (SHA-512):
89d27983cf3c503c636952171526bfb533fe6ccd2dbe3c083b2a2733c3b705e98914cdbc975e43ddb3ecb54331a436f27c4b23b434db39d56279bae538d6e1af
Type: fulltext
Mimetype: application/pdf
