Högskolan i Skövde

Multimodal fine-grained grocery product recognition using image and OCR text
University of Skövde, School of Engineering Science. University of Skövde, Virtual Engineering Research Environment. ITAB Shop Products AB, Jönköping, Sweden; Jönköping University, Sweden. (Virtual Production Development (VPD)) ORCID iD: 0000-0001-8880-7965
Department of Computer Science and Informatics, Jönköping University, Sweden. ORCID iD: 0000-0003-2900-9335
Department of Computing, Jönköping University, Sweden.
2024 (English). In: Machine Vision and Applications, ISSN 0932-8092, E-ISSN 1432-1769, Vol. 35, no 4, article id 79. Article in journal (Refereed), Published
Abstract [en]

Automatic recognition of grocery products can be used to improve customer flow at checkouts and reduce labor costs and store losses. Product recognition is, however, a challenging task for machine learning-based solutions due to the large number of products and their variations in appearance. In this work, we tackle the challenge of fine-grained product recognition by first extracting a large dataset from a grocery store containing products that are only differentiable by subtle details. Then, we propose a multimodal product recognition approach that uses product images with extracted OCR text from packages to improve fine-grained recognition of grocery products. We evaluate several image and text models separately and then combine them using different multimodal models of varying complexities. The results show that image and textual information complement each other in multimodal models and enable a classifier with greater recognition performance than unimodal models, especially when the number of training samples is limited. Therefore, this approach is suitable for many different scenarios in which product recognition is used to further improve recognition performance. The dataset can be found at https://github.com/Tubbias/finegrainocr.

Place, publisher, year, edition, pages
Springer Nature, 2024. Vol. 35, no 4, article id 79
Keywords [en]
Grocery product recognition, Multimodal classification, Fine-grained recognition, Optical character recognition
National Category
Production Engineering, Human Work Science and Ergonomics; Computer graphics and computer vision; Natural Language Processing
Research subject
Virtual Production Development (VPD)
Identifiers
URN: urn:nbn:se:his:diva-23933
DOI: 10.1007/s00138-024-01549-9
ISI: 001243616100001
Scopus ID: 2-s2.0-85195555790
OAI: oai:DiVA.org:his-23933
DiVA, id: diva2:1867571
Funder
Knowledge Foundation, 2020-0044
Swedish National Infrastructure for Computing (SNIC), 2018-05973
Swedish Research Council
University of Skövde
Note

CC BY 4.0

Tobias Pettersson tobias.pettersson@itab.com

The authors would like to thank ITAB Shop Products AB and Smart Industry Sweden (KKS-2020-0044) for their support. The machine learning training was enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at C3SE, partially funded by the Swedish Research Council through grant agreement no. 2018-05973.

Open access funding provided by University of Skövde

Available from: 2024-06-10. Created: 2024-06-10. Last updated: 2025-02-01. Bibliographically approved

Open Access in DiVA

fulltext (4918 kB), 159 downloads
File information
File name: FULLTEXT01.pdf
File size: 4918 kB
Checksum SHA-512:
8d692264f4ee86e93acaeeb24f2c88c7e4e948a952abe88cf80062b480e0d0e0b7226f8a5728c93eef94bf6544583d667a8ceceb8cfb794177b52d795ae4bda8
Type: fulltext
Mimetype: application/pdf

Authority records

Pettersson, Tobias; Riveiro, Maria
