his.se Publications
A comperative study of text classification models on invoices: The feasibility of different machine learning algorithms and their accuracy
University of Skövde, School of Informatics.
University of Skövde, School of Informatics.
2018 (English). Independent thesis, Basic level (degree of Bachelor), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Text classification is becoming more important for companies as an increasing amount of digital data is made available. The aim is to investigate whether five different machine learning algorithms can be used to automate the classification of invoice data, and to determine which one achieves the highest accuracy. At a later stage, the algorithms are combined in an attempt to achieve better results.

N-grams are used, and the results are compared in terms of the total classification accuracy of each algorithm. The chosen algorithms are implemented with scikit-learn, a Python library. The data is collected and generated to represent the data found on a real invoice from which data has been extracted.

Results from this thesis show that it is possible to use machine learning for this type of problem. The highest-scoring algorithm (LinearSVC from scikit-learn) classifies 86% of all samples correctly, which is 16 percentage points above the acceptable level of 70%.
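
The n-gram representation and the total-accuracy measure mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation: the abstract does not say whether character or word n-grams were used, so character n-grams are an assumption here, and the function names and the example labels (`date`, `total`, `iban`) are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count every character n-gram of length n_min..n_max in text.

    Assumption: character-level n-grams on lowercased text; the
    abstract does not specify the n-gram type or range.
    """
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the string.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def total_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical invoice field labels, for illustration only.
true_labels = ["date", "total", "date", "iban"]
predicted = ["date", "total", "iban", "iban"]

features = char_ngrams("ab", 1, 2)
accuracy = total_accuracy(true_labels, predicted)
```

In practice, a count vector like the one returned by `char_ngrams` would be built for every sample and passed to a classifier; scikit-learn's text vectorizers expose an `ngram_range` parameter for the same purpose.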

Place, publisher, year, edition, pages
2018. p. 42
Keywords [en]
Machine learning, text classification, invoices, supervised learning, information retrieval, ensemble learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:his:diva-15647
OAI: oai:DiVA.org:his-15647
DiVA, id: diva2:1219825
External cooperation
Asitis AB
Subject / course
Informationsteknologi
Educational program
Computer Science - Specialization in Systems Development
Supervisors
Examiners
Available from: 2018-06-26. Created: 2018-06-18. Last updated: 2018-06-26. Bibliographically approved

Open Access in DiVA

fulltext (2098 kB)
File information
File name: FULLTEXT01.pdf
File size: 2098 kB
Checksum SHA-512: b57da41f9b04c4bdf7fb9181d190a3769b53c9427e29774bd8dcc83c60742b01e77666d466b1ac232d5a0cebb7c38c31b6555501bd7e1a96071b9ab2bad153c9
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Ekström, Linus; Augustsson, Andreas
By organisation
School of Informatics
Computer Sciences