Högskolan i Skövde

Evaluation of Video Masked Autoencoders' Performance and Uncertainty Estimations for Driver Action and Intention Recognition
University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment. Department of Data Analytics and Engineering, R&D, Volvo Car Corporation, Sweden. (Skövde Artificial Intelligence Lab (SAIL)) ORCID iD: 0000-0003-2135-6615
University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment. (Skövde Artificial Intelligence Lab (SAIL)) ORCID iD: 0000-0003-2949-4123
University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment. (Skövde Artificial Intelligence Lab (SAIL)) ORCID iD: 0000-0001-8884-2154
Department of Data Analytics and Engineering, R&D, Volvo Car Corporation, Sweden.
2024 (English). In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), IEEE, 2024, p. 7429-7437. Conference paper, Published paper (Refereed)
Abstract [en]

Traffic fatalities remain among the leading causes of death worldwide. Car safety is listed as one of the most important factors in reducing this figure. To actively support human drivers, advanced driving assistance systems must be able to recognize the driver's actions and intentions. Prior studies have demonstrated various approaches to recognizing driving actions and intentions based on in-cabin and external video footage. Given the performance of self-supervised video pre-trained (SSVP) Video Masked Autoencoders (VMAEs) on multiple action recognition datasets, we evaluate the performance of SSVP VMAEs on the Honda Research Institute Driving Dataset for driver action recognition (DAR) and on the Brain4Cars dataset for driver intention recognition (DIR). Besides performing well, an artificial intelligence system applied in a safety-critical environment must be capable of expressing when it is uncertain about the results it produces. Therefore, we also analyze the uncertainty estimates produced by Bayes-by-Backprop last-layer (BBB-LL) and Monte-Carlo (MC) dropout variants of a VMAE. Our experiments show that a VMAE achieves higher overall performance than the state of the art for both offline DAR and end-to-end DIR. The analysis of the BBB-LL and MC dropout models shows higher uncertainty estimates for incorrectly classified test instances than for correctly predicted ones.
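
As an illustration of the kind of uncertainty estimate the abstract refers to, the following is a minimal sketch of MC dropout with predictive entropy. It is not the authors' implementation: the toy linear layer, the weights, and the function names (`mc_dropout_passes`, `predictive_entropy`) are hypothetical stand-ins for a VMAE, and predictive entropy is only one of several possible uncertainty measures.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_dropout_passes(features, weights, drop_rate, n_passes, rng):
    """Run n_passes stochastic forward passes with dropout kept active.

    Assumption: a single linear layer with dropout on its inputs stands in
    for the real network. Requires drop_rate < 1.
    """
    keep = 1.0 - drop_rate
    passes = []
    for _ in range(n_passes):
        # Inverted dropout: kept inputs are rescaled by 1 / keep.
        mask = [1.0 / keep if rng.random() < keep else 0.0 for _ in features]
        dropped = [f * m for f, m in zip(features, mask)]
        logits = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
        passes.append(softmax(logits))
    return passes


def predictive_entropy(sampled_probs):
    """Entropy (in nats) of the class distribution averaged over all passes."""
    n_classes = len(sampled_probs[0])
    n_passes = len(sampled_probs)
    mean = [sum(p[c] for p in sampled_probs) / n_passes for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


# Hypothetical two-class example with made-up features and weights.
rng = random.Random(0)
toy_weights = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
samples = mc_dropout_passes([1.0, 2.0, 0.5], toy_weights, 0.5, 20, rng)
print(f"predictive entropy: {predictive_entropy(samples):.3f} nats")
```

Under this sketch, a comparison like the one reported in the abstract would amount to computing such a score per test instance and contrasting its distribution for correctly and incorrectly classified instances.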

Place, publisher, year, edition, pages
IEEE, 2024. p. 7429-7437
Series
Proceedings IEEE Workshop on Applications of Computer Vision, ISSN 2472-6737, E-ISSN 2642-9381
National Category
Computer graphics and computer vision
Research subject
Skövde Artificial Intelligence Lab (SAIL)
Identifiers
URN: urn:nbn:se:his:diva-23540; DOI: 10.1109/WACV57701.2024.00726; Scopus ID: 2-s2.0-85191986920; ISBN: 979-8-3503-1893-7 (print); ISBN: 979-8-3503-1892-0 (electronic); OAI: oai:DiVA.org:his-23540; DiVA, id: diva2:1828101
Conference
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), January 4-8, 2024, Waikoloa, Hawaii, USA
Funder
Vinnova, 2018-05012
Available from: 2024-01-16 Created: 2024-01-16 Last updated: 2025-09-29 Bibliographically approved
In thesis
1. Deep Learning-Based Driver Intention Recognition: Evaluating performance, complexity and uncertainty estimations
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Deep learning (DL) methods have advanced rapidly and are commonly applied in high-risk, resource-constrained environments such as advanced driver assistance systems (ADAS), where misclassifications can have serious consequences. With upcoming artificial intelligence (AI) legislation, it is essential to extensively evaluate and minimize the undesirable behavior of DL-based systems in such settings. An example is an ADAS that continuously evaluates whether a driver’s intended maneuvers are safe to execute given the current traffic context. Driver intention recognition (DIR), which predicts the maneuver a driver intends to perform in the near future, is a central DL-based component of such systems. Since deep neural networks (DNNs) do not inherently provide uncertainty estimates for their predictions, probabilistic deep learning (PDL) methods can be applied to improve the identification of scenarios where model outputs may be unreliable. In this thesis, we first review the current state of DIR research, focusing on the recent shift toward DL methods. We then examine how both established and novel PDL methods influence DIR performance. We evaluate the uncertainty estimations by analyzing their ability to distinguish between correct and incorrect predictions and by measuring their effectiveness in out-of-distribution (OOD) detection. Furthermore, we employ neural architecture search with multiple objectives and search strategies to explore how architectural complexity impacts DIR and OOD detection performance. Finally, we conduct a comparative experiment to evaluate human performance against that of DL-based models in video-based recognition of road user intentions.

Place, publisher, year, edition, pages
Skövde: University of Skövde, 2025. p. xiii, 294
Series
Dissertation Series ; 66
National Category
Computer Sciences Computer graphics and computer vision Artificial Intelligence
Research subject
Skövde Artificial Intelligence Lab (SAIL)
Identifiers
URN: urn:nbn:se:his:diva-25867; ISBN: 978-91-989080-5-3; ISBN: 978-91-989080-6-0
Public defence
2025-11-12, University of Skövde, Building D, Room D107, Skövde, 13:15 (English)
Note

Three of the nine constituent papers ("under submission"; for the others, see the heading Delarbeten/List of papers):

6. Koen Vellenga, H. Joe Steinhauer, Göran Falkman, Jonas Andersson, and Anders Sjögren (2025b). “Last Layer Hamiltonian Monte Carlo”

7. Koen Vellenga, H. Joe Steinhauer, Jonas Andersson, and Anders Sjögren (2025). “Latent Uncertainty Representations for Video-based Driver Action and Intention Recognition”

8. Koen Vellenga (2025). “Multi-Objective Architecture Search for Driver Action and Intention Recognition using Probabilistic Deep Neural Networks”

Available from: 2025-09-30 Created: 2025-09-29 Last updated: 2026-01-08 Bibliographically approved

Open Access in DiVA

fulltext (434 kB), 237 downloads
File information
File name: FULLTEXT01.pdf; File size: 434 kB; Checksum (SHA-512):
8244ffec326d4a535a742e52251d2aa6a57b3ac55befeb58d853c64f1ea1f0e2246ab95ed5d84edb6bf5c49c1bbc0ab57cb572a704fdb567318d36dffbeef72a
Type: fulltext; Mimetype: application/pdf


Authority records

Vellenga, Koen; Steinhauer, H. Joe; Falkman, Göran
