Högskolan i Skövde

Explainable Local and Global Models for Fine-Grained Multimodal Product Recognition
University of Skövde, School of Engineering Science. University of Skövde, Virtual Engineering Research Environment. Jönköping University, Sweden; ITAB Shop Products AB, Sweden. (Virtual Production Development (VPD)) ORCID iD: 0000-0001-8880-7965
Dept Computer Science and Informatics, Jönköping University, Sweden. ORCID iD: 0000-0003-2900-9335
Jönköping University, Sweden.
2023 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Grocery product recognition techniques are emerging in the retail sector and are used to provide automatic checkout counters, reduce self-checkout fraud, and support inventory management. However, recognizing grocery products using machine learning models is challenging due to the vast number of products, their similarities, and changes in appearance. To address these challenges, more complex models are created by adding modalities, such as text from product packages. However, these complex models pose additional challenges in terms of model interpretability. Machine learning experts and system developers need tools and techniques that convey interpretations to enable the evaluation and improvement of multimodal product recognition models. In this work, we therefore propose an approach to provide local and global explanations that allow us to assess multimodal models for product recognition. We evaluate this approach on a large fine-grained grocery product dataset captured from a real-world environment. To assess the utility of our approach, experiments are conducted for three types of multimodal models. The results show that our approach provides fine-grained local explanations while being able to aggregate those into global explanations for each type of product. In addition, we observe a disparity between different multimodal models in the type of features they learn and the modality each model focuses on. This provides valuable insight for further improving the accuracy and robustness of multimodal models for grocery product recognition.
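
The abstract does not say how local explanations are aggregated into global ones, so the following is only a minimal sketch of one plausible reading: average per-feature attribution weights (such as those a LIME-style explainer might return) over all samples of a product class, then measure how much of the total weight falls on each modality. The function names, the `img:`/`txt:` feature-name prefixes, and the sample values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def aggregate_local_explanations(local_explanations):
    """Average feature weights per product class.

    local_explanations: iterable of (product_class, {feature: weight}) pairs,
    one pair per explained sample. This input layout is an assumption.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for product_class, weights in local_explanations:
        counts[product_class] += 1
        for feature, weight in weights.items():
            sums[product_class][feature] += weight
    # Divide by the number of samples in the class, so a feature that is
    # absent from some samples contributes zero for those samples.
    return {
        cls: {feat: total / counts[cls] for feat, total in feats.items()}
        for cls, feats in sums.items()
    }


def modality_share(class_weights):
    """Fraction of absolute weight per modality for one class.

    Assumes feature names carry a modality prefix such as "img:" or "txt:".
    """
    totals = defaultdict(float)
    for feature, weight in class_weights.items():
        modality = feature.split(":", 1)[0]
        totals[modality] += abs(weight)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {modality: 0.0 for modality in totals}
    return {modality: value / grand_total for modality, value in totals.items()}


# Hypothetical local explanations for two samples of one product class.
local = [
    ("milk", {"img:region_3": 0.5, "txt:milk": 0.25}),
    ("milk", {"img:region_3": 0.25, "txt:milk": 0.75}),
]
global_weights = aggregate_local_explanations(local)
shares = modality_share(global_weights["milk"])
```

Under these assumptions, comparing `shares` across models would give a rough indication of which modality each model relies on for a given product class.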

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023.
Keywords [en]
Multimodal classification, Explainable AI, Grocery product recognition, LIME, Fine-grained recognition, Optical character recognition
National Category
Computer graphics and computer vision; Computer Sciences
Research subject
Virtual Production Development (VPD)
Identifiers
URN: urn:nbn:se:his:diva-25773
OAI: oai:DiVA.org:his-25773
DiVA, id: diva2:1993217
Conference
Multimodal KDD 2023: International Workshop on Multimodal Learning, held in conjunction with KDD'23, 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, August 6-10, 2023
Available from: 2025-08-29 Created: 2025-08-29 Last updated: 2025-12-15
Open Access in DiVA

No full text in DiVA

Other links

Fulltext (PDF): https://multimodal-kdd-2023.github.io/

Authority records

Pettersson, Tobias; Riveiro, Maria
