Högskolan i Skövde

Product Recognition with OCR Text: Advancing Grocery Product Recognition through Robust Approaches, Fine-Grained Recognition, and Domain Adaptation for Real-Time Performance
Pettersson, Tobias
University of Skövde, School of Engineering Science. University of Skövde, Virtual Engineering Research Environment. ITAB Shop Products AB, Jönköping, Sweden; Jönköping University, Sweden. (Virtual Production Development (VPD))
ORCID iD: 0000-0001-8880-7965
2025 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The physical retail sector faces challenges in improving operational efficiency, reducing costs, and enhancing customer experience. Over the past decade, companies have introduced product recognition technology solutions to improve checkout efficiency, inventory management, and fraud detection. However, most initiatives have struggled to scale or achieve sufficient accuracy due to the complex nature of physical retail, which includes a large number of products that continuously change, as well as varied environmental conditions. In parallel, academic research has tackled many of these challenges by providing datasets and new methods to improve recognition performance, but considerable challenges persist.

This thesis addresses three main challenges in grocery product recognition: robust recognition, recognition of visually similar products, and domain adaptation between different retail systems. To address these challenges, the work centers on the use of Optical Character Recognition (OCR) to extract textual information found on product packaging for product recognition. With extensive experiments and the creation of a dataset, the results show that OCR-based methods for product recognition can improve recognition robustness, enable more accurate differentiation between similar products, and also work across different retail systems.

Therefore, the main contribution of this thesis is the development and validation of these OCR text-based methods and approaches, specifically designed to address the requirements in physical retail.

Abstract [sv] (English translation)

Physical retail faces challenges in improving operational efficiency, lowering costs, and strengthening the customer experience. Over the past decade, companies have introduced solutions based on product recognition technology to streamline checkout processes, improve inventory management, and detect fraud. However, most initiatives have struggled to scale up or achieve sufficient accuracy because of the dynamic nature of retail, which is characterized by a large and constantly changing product assortment and by varying store environments. In parallel, academic research has addressed many of these challenges by providing datasets and new methods to improve recognition performance, but significant challenges remain.

This thesis addresses three main challenges in grocery product recognition: robust recognition, fine-grained recognition of visually similar products, and domain adaptation between different retail systems. To address these challenges, the work focuses on using optical character recognition (OCR) to extract textual information from product packaging for product recognition. Through extensive experiments and the creation of a dataset, the results show that OCR-based methods can improve recognition robustness, enable more accurate differentiation between products, and work across different retail environments.

The main contribution of the thesis is the development and validation of methods and approaches for product recognition with OCR text that meet the unique requirements of physical retail.

Place, publisher, year, edition, pages
Skövde: University of Skövde, 2025, p. xv, 140
Series
Dissertation Series ; 67
National Category
Computer graphics and computer vision; Natural Language Processing; Artificial Intelligence
Research subject
Virtual Production Development (VPD)
Identifiers
URN: urn:nbn:se:his:diva-26062
ISBN: 978-91-989080-7-7 (print)
ISBN: 978-91-989080-8-4 (electronic)
OAI: oai:DiVA.org:his-26062
DiVA, id: diva2:2021302
Public defence
2026-01-23, ASSAR Industrial Innovation Arena, Kavelbrovägen 2B, 541 36, Skövde, 09:15 (English)
Note

One of six constituent papers (for the others, see the heading Delarbeten/List of papers):

6. Tobias Pettersson, Maria Riveiro, and Tuwe Löfström. “Real-Time OCR-Based Grocery Product Recognition with Orientation Alignment and Embedding-Driven Classification”. Accepted and presented at the International Conference on Machine Vision (ICMV 2025). 2025.

PUBLICATIONS WITH LOW RELEVANCE

7. Puneet Mishra, Aneesh Chauhan, and Tobias Pettersson. “Seeing through plastics: A novel combination of NIR hyperspectral imaging and spectral orthogonalization for detecting fresh fruit inside plastic packaging to support automated barcode less checkouts in supermarkets”. In: Food Control 150 (2023), p. 109762.

8. Faeze Zakaryapour Sayyad, Tobias Pettersson, Seyed Jalaleddin Mousavirad, Irida Shallari, and Mattias O’Nils. “AdAPT: Advertisement detector adaptation under newspaper domain shift with null-based pseudo-labeling”. In: Machine Learning with Applications (2025), p. 100806. DOI: https://doi.org/10.1016/j.mlwa.2025.100806.

Available from: 2025-12-15 Created: 2025-12-12 Last updated: 2025-12-15 Bibliographically approved
List of papers
1. Product verification using OCR classification and Mondrian conformal prediction
2022 (English) In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 188, article id 115942. Article in journal (Refereed) Published
Abstract [en]

The retail sector is undergoing an apparent digital transformation that completely revolutionises shopping operations. To stay competitive, retailer stakeholders are forced to rethink and improve their business models to provide an attractive personalised experience to consumers. The self-service checkout process is at the heart of this transformation and should be designed to identify the products accurately and detect any possible anomalous behaviour. In this paper, we introduce a product verification system based on OCR classification and Mondrian conformal prediction. The proposed system includes three components: OCR reading, text classification and product verification. By using image data from existing grocery stores, the system can detect anomalies with high performance, even when there is partial text information on the products. This makes the system applicable for reducing shrinkage loss (caused, for example, by employee theft or shoplifting) in grocery stores by identifying fraudulent behaviours such as barcode switching and miss-scan. Additionally, OCR reading with NLP classification shows that it is in itself a powerful classifier of products. 
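The verification step described in this abstract can be illustrated with a minimal sketch of class-conditional (Mondrian) conformal prediction. This is not the paper's implementation: the function names, the dictionary format for class probabilities, and the nonconformity score (one minus the text classifier's probability for the claimed class) are assumptions chosen for illustration.

```python
from collections import defaultdict


def calibrate(cal_probs, cal_labels):
    """Collect nonconformity scores per true class (the Mondrian categories)."""
    scores = defaultdict(list)
    for probs, label in zip(cal_probs, cal_labels):
        # Assumed nonconformity score: 1 - probability of the true class.
        scores[label].append(1.0 - probs[label])
    return scores


def p_value(scores, probs, claimed_label):
    """Class-conditional p-value for the hypothesis 'the item is claimed_label'."""
    alpha = 1.0 - probs[claimed_label]
    cal = scores.get(claimed_label, [])
    # Count calibration examples of the same class that conform no better.
    at_least_as_strange = sum(1 for s in cal if s >= alpha)
    return (at_least_as_strange + 1) / (len(cal) + 1)


def verify(scores, probs, claimed_label, significance=0.1):
    """Accept the scanned product unless its p-value is at or below the significance level."""
    return p_value(scores, probs, claimed_label) > significance
```

In this sketch, `claimed_label` would be the product identified by the scanned barcode and `probs` the class probabilities predicted from the OCR text. Because scores are compared only within the claimed class, the error rate is controlled per product class rather than only on average. A class needs enough calibration examples for rejection to be possible, since the smallest attainable p-value is 1 / (n + 1).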

Place, publisher, year, edition, pages
Elsevier, 2022
Keywords
Mondrian conformal prediction, OCR classification, Retail product verification, Smart self-checkout system, Classification (of information), Forecasting, Text processing, Business models, Conformal predictions, Digital transformation, Grocery stores, Mondrian, Product verification, Sales
National Category
Computer Sciences
Research subject
Production and Automation Engineering
Identifiers
urn:nbn:se:his:diva-20675 (URN); 10.1016/j.eswa.2021.115942 (DOI); 000768193500002 (); 2-s2.0-85117127725 (Scopus ID)
Note

CC BY-NC-ND 4.0

© 2021 The Authors

Corresponding author: rachid.oucheikh@ju.se (R. Oucheikh)

Corresponding author at ITAB Shop Products AB, Sweden: tobias.pettersson@itab.com (T. Pettersson)

tuwe.lofstrom@ju.se (T. Löfström)

URL: https://ju.se/jail/datakind (R. Oucheikh)

Available from: 2021-10-29 Created: 2021-10-29 Last updated: 2025-12-15 Bibliographically approved
2. NLP Cross-Domain Recognition of Retail Products
2022 (English) In: ICMLT '22: Proceedings of the 2022 7th International Conference on Machine Learning Technologies, March 2022, Association for Computing Machinery (ACM), 2022, p. 237-243. Conference paper, Published paper (Refereed)
Abstract [en]

Self‐checkout systems aim to provide a seamless and high-quality shopping experience and increase the profitability of stores. These advantages come with some challenges such as shrinkage loss. To overcome these challenges, automatic recognition of the purchased products is a potential solution. In this context, one of the major issues that emerges is data shift, which is caused by the difference between the environment in which the recognition model is trained and the environment in which the model is deployed. In this paper, we use transfer learning to handle the shift caused by a change of camera and lens or their position, as well as by critical factors, mainly lighting, reflection, and occlusion. We motivate the use of Natural Language Processing (NLP) techniques on textual data extracted from images, instead of image recognition, to study the efficiency of transfer learning techniques. The results show that cross-domain NLP retail recognition using the BERT language model results in only a small reduction in performance between the source and target domain. Furthermore, a small number of additional training samples from the target domain allows the model to perform comparably to a model trained on the source domain.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
Keywords
Image recognition, Learning systems, Natural language processing systems, Sales, Text processing, Transfer learning, BERT, Cross-domain, Domain adaptation, Language processing, Natural language processing, Natural languages, Product recognition, Retail, Text classification, Classification (of information), NLP
National Category
Computer Sciences
Research subject
Virtual Production Development (VPD)
Identifiers
urn:nbn:se:his:diva-22974 (URN); 10.1145/3529399.3529436 (DOI); 001053939400037 (); 2-s2.0-85132414844 (Scopus ID); 978-1-4503-9574-8 (ISBN)
Conference
ICMLT 2022: 7th International Conference on Machine Learning Technologies (ICMLT), Virtual Conference, 11-13 March 2022
Funder
Knowledge Foundation, DATAKIND 20190194
Note

This work was supported by the Swedish Knowledge Foundation (DATAKIND 20190194), the company ITAB, and Smart Industry Sweden (KKS-2020-0044).

© 2022 Association for Computing Machinery.

Available from: 2023-07-05 Created: 2023-07-05 Last updated: 2025-12-15 Bibliographically approved
3. Explainable Local and Global Models for Fine-Grained Multimodal Product Recognition
2023 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Grocery product recognition techniques are emerging in the retail sector and are used to provide automatic checkout counters, reduce self-checkout fraud, and support inventory management. However, recognizing grocery products using machine learning models is challenging due to the vast number of products, their similarities, and changes in appearance. To address these challenges, more complex models are created by adding additional modalities, such as text from product packages. However, these complex models pose additional challenges in terms of model interpretability. Machine learning experts and system developers need tools and techniques conveying interpretations to enable the evaluation and improvement of multimodal product recognition models. In this work, we thus propose an approach to provide local and global explanations that allow us to assess multimodal models for product recognition. We evaluate this approach on a large fine-grained grocery product dataset captured from a real-world environment. To assess the utility of our approach, experiments are conducted for three types of multimodal models. The results show that our approach provides fine-grained local explanations while being able to aggregate those into global explanations for each type of product. In addition, we observe a disparity between different multimodal models in what type of features they learn and what modality each model focuses on. This provides valuable insight to further improve the accuracy and robustness of multimodal product recognition models for grocery product recognition.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
Multimodal classification, Explainable AI, Grocery product recognition, LIME, Fine-grained recognition, Optical character recognition
National Category
Computer graphics and computer vision; Computer Sciences
Research subject
Virtual Production Development (VPD)
Identifiers
urn:nbn:se:his:diva-25773 (URN)
Conference
Multimodal KDD 2023: International Workshop on Multimodal Learning, held in conjunction with KDD'23, 29TH ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, August 6-10, 2023
Available from: 2025-08-29 Created: 2025-08-29 Last updated: 2026-01-07
4. Multimodal fine-grained grocery product recognition using image and OCR text
2024 (English) In: Machine Vision and Applications, ISSN 0932-8092, E-ISSN 1432-1769, Vol. 35, no 4, article id 79. Article in journal (Refereed) Published
Abstract [en]

Automatic recognition of grocery products can be used to improve customer flow at checkouts and reduce labor costs and store losses. Product recognition is, however, a challenging task for machine learning-based solutions due to the large number of products and their variations in appearance. In this work, we tackle the challenge of fine-grained product recognition by first extracting a large dataset from a grocery store containing products that are only differentiable by subtle details. Then, we propose a multimodal product recognition approach that uses product images with extracted OCR text from packages to improve fine-grained recognition of grocery products. We evaluate several image and text models separately and then combine them using different multimodal models of varying complexities. The results show that image and textual information complement each other in multimodal models and enable a classifier with greater recognition performance than unimodal models, especially when the number of training samples is limited. Therefore, this approach is suitable for many different scenarios in which product recognition is used to further improve recognition performance. The dataset can be found at https://github.com/Tubbias/finegrainocr.
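As an illustration of how image and OCR text predictions can complement each other, the following is a minimal sketch of weighted late fusion of per-class probabilities. It is an assumption made for illustration, not one of the multimodal models evaluated in the paper; the function names, the `text_weight` parameter, and the example class names are hypothetical.

```python
def late_fusion(image_probs, text_probs, text_weight=0.5):
    """Blend two class-probability dictionaries with a weighted average."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    classes = set(image_probs) | set(text_probs)
    fused = {
        c: (1.0 - text_weight) * image_probs.get(c, 0.0)
        + text_weight * text_probs.get(c, 0.0)
        for c in classes
    }
    total = sum(fused.values())
    if total == 0.0:
        raise ValueError("no probability mass to fuse")
    # Renormalize so the fused scores sum to one.
    return {c: v / total for c, v in fused.items()}


def predict(image_probs, text_probs, text_weight=0.5):
    """Return the class with the highest fused probability."""
    fused = late_fusion(image_probs, text_probs, text_weight)
    return max(fused, key=fused.get)
```

For two products that look nearly identical, an image model may be close to undecided (for example 0.52 versus 0.48), while text read from the package can separate them clearly. With equal weighting, the fused prediction then follows the more confident text modality.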

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Grocery product recognition, Multimodal classification, Fine-grained recognition, Optical character recognition
National Category
Production Engineering, Human Work Science and Ergonomics; Computer graphics and computer vision; Natural Language Processing
Research subject
Virtual Production Development (VPD)
Identifiers
urn:nbn:se:his:diva-23933 (URN); 10.1007/s00138-024-01549-9 (DOI); 001243616100001 (); 2-s2.0-85195555790 (Scopus ID)
Funder
Knowledge Foundation, 2020-0044; Swedish National Infrastructure for Computing (SNIC), 2018-05973; Swedish Research Council; University of Skövde
Note

CC BY 4.0

Tobias Pettersson tobias.pettersson@itab.com

The authors would like to thank ITAB Shop Products AB and Smart Industry Sweden (KKS-2020-0044) for their support. The machine learning training was enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at C3SE, partially funded by the Swedish Research Council through grant agreement no. 2018-05973.

Open access funding provided by University of Skövde

Available from: 2024-06-10 Created: 2024-06-10 Last updated: 2025-12-15 Bibliographically approved
5. Real-Time Automatic Checkout via Prompt-Based Product Extraction and Cross-Domain Learning
2024 (English) In: Proceedings 2024 International Conference on Machine Learning and Applications ICMLA 2024: Miami, Florida 18-20 December 2024 / [ed] M. Arif Wani; Plamen Angelov; Feng Luo; Mitsunori Ogihara; Xintao Wu; Radu-Emil Precup; Ramin Ramezani; Xiaowei Gu, IEEE, 2024, p. 1396-1403. Conference paper, Published paper (Refereed)
Abstract [en]

Automatic checkout systems are designed to predict a complete shopping receipt using an image from the checkout area. These systems require high classification accuracy across numerous classes and must operate in real-time, despite domain differences between training data and real-world conditions. Building on recent advancements, we propose a method that outperforms current solutions and can be applied in real-time in automatic checkout systems. Our method leverages the Segment Anything Model to extract high-quality masks from lab product images, which are then transformed into synthetic checkout images and adapted to the real domain using contrastive unpaired translation. We train a product recognition model with data augmentation, named SCA+Y8, and further improve it through fine-tuning with pseudo-labels from unlabeled checkout images, resulting in an improved model called SCAFT+Y8. SCAFT+Y8 achieves a great increase in state-of-the-art performance, with an average receipt classification accuracy of 97.58%, and shows strong performance in smaller models, indicating the potential for deployment on low-cost edge devices. 
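The abstract reports a receipt-level accuracy. One plausible reading, sketched below as an assumption rather than the paper's exact metric definition, is that a receipt counts as correct only when the predicted products and their quantities all match the ground truth. The function name and the list-of-product-names input format are hypothetical.

```python
from collections import Counter


def receipt_accuracy(predicted_receipts, true_receipts):
    """Fraction of receipts whose predicted items and counts match exactly."""
    if len(predicted_receipts) != len(true_receipts):
        raise ValueError("need one prediction per ground-truth receipt")
    if not true_receipts:
        raise ValueError("at least one receipt is required")
    # Counter comparison ignores item order but respects quantities.
    correct = sum(
        Counter(pred) == Counter(true)
        for pred, true in zip(predicted_receipts, true_receipts)
    )
    return correct / len(true_receipts)
```

Under this reading, a single missed or extra item makes the whole receipt wrong, which is why receipt-level accuracy is a stricter measure than per-item classification accuracy.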

Place, publisher, year, edition, pages
IEEE, 2024
Series
International Conference on Machine Learning and Applications (ICMLA), ISSN 1946-0740, E-ISSN 1946-0759
Keywords
Automatic Checkout, Domain Adaptation, Object Detection, YOLOv8, Contrastive Learning, Image enhancement, Image segmentation, Object recognition, Classification accuracy, Cross-domain learning, Domain differences, Objects detection, Real-time, Real-world, Training data
National Category
Computer Sciences; Computer graphics and computer vision
Research subject
Virtual Production Development (VPD)
Identifiers
urn:nbn:se:his:diva-24982 (URN); 10.1109/ICMLA61862.2024.00217 (DOI); 001468515500208 (); 2-s2.0-105000879245 (Scopus ID); 979-8-3503-7489-6 (ISBN); 979-8-3503-7488-9 (ISBN)
Conference
2024 International Conference on Machine Learning and Applications ICMLA 2024, Miami, Florida, 18-20 December 2024
Funder
Knowledge Foundation, 2020-0044; Swedish Research Council, 2022-06725
Note

© 2024 IEEE

The authors would like to thank ITAB Shop Products AB and Smart Industry Sweden (KKS-2020-0044) for their support. The machine learning training was enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725.

Available from: 2025-04-03 Created: 2025-04-03 Last updated: 2025-12-15 Bibliographically approved

Open Access in DiVA

fulltext (38307 kB), 46 downloads
File information
File name: FULLTEXT01.pdf
File size: 38307 kB
Checksum SHA-512: 32f0e831c113a2ce77e15be227917ca5031348da49f08baa44ed8b95aa1e5097a48429276f7ae726fb0fdf724b9d6460ffccd6927d9b7a844bc4377806c7efb6
Type: fulltext
Mimetype: application/pdf
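
The SHA-512 checksum listed above can be used to confirm that a downloaded copy of the full text is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha512_hex` and `matches_published` are hypothetical, and the local file path in the commented usage is an assumption based on the file name in this record.

```python
import hashlib
import io


def sha512_hex(stream, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a binary stream, reading in chunks."""
    digest = hashlib.sha512()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published(stream, published_hex):
    """Compare a stream's digest with a published checksum, ignoring case and whitespace."""
    return sha512_hex(stream) == published_hex.strip().lower()


# In-memory demonstration; a real check would open the downloaded PDF instead:
#     with open("FULLTEXT01.pdf", "rb") as f:
#         ok = matches_published(f, "<checksum copied from the record>")
demo_digest = sha512_hex(io.BytesIO(b"example bytes"))
```

Reading in chunks keeps memory use constant, which matters for a file of roughly 38 MB or larger.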

Authority records

Pettersson, Tobias
