Högskolan i Skövde

his.se Publications
The Observer Lens: Characterizing Visuospatial Features in Multimodal Interactions
University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment. (Interaction Lab (iLab))
ORCID iD: 0000-0003-0517-8468
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Understanding the intricate nature of human interactions relies heavily on the ability to discern and interpret inherent information. Central to this interpretation are the sensory features—visual, spatial, and auditory—collectively referred to as visuospatial in this thesis. The low-level (e.g., motion kinematics) and high-level (e.g., gestures and speech) visuospatial features significantly influence human perception, aiding in the deduction of intent, goals, emotions, and more. From a computational viewpoint, these features are crucial for interpreting events, from discerning body poses to evaluating action similarity, particularly for computational systems designed to interact closely with humans. This thesis examines the impact of visuospatial features on human event observation within an informatics context, concentrating on (1) investigating the effect of visuospatial features on observers' perception; and (2) aligning this investigation towards outcomes applicable to informatics.

Taking a human-centric perspective, the thesis methodically probes the role of visuospatial features, drawing on prior cognitive research that underscores the significance of features such as action kinematics, gaze, turn-taking, and gestures in event comprehension. Balancing reductionist and naturalistic perspectives, the research examines specific visuospatial features and their impact on human visual processing and attention mechanisms:

- Visual Processing: The thesis highlights the visual-processing effects of action features, including kinematics, local motion, and global form, as well as the role of factors such as semantics and familiarity. These effects are demonstrated using human performance metrics in perceptual tasks and comparative analyses with selected computational models employing basic kinematic representations, revealing the adaptive nature of the visual system and enhancing Human Action Recognition models.

- Visual Attention: The thesis also highlights the attentional effects of interaction cues, such as speech, hand action, body pose, motion, and gaze, using the developed 'Visuospatial Model'. This model presents a systematic approach for characterizing visuospatial features in everyday events, exemplified using a curated movie dataset and a newly developed comprehensive dataset of naturalistic multimodal events.
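The characterization of everyday events in terms of co-occurring cues can be roughly illustrated in code. The following is a minimal sketch invented for this record (the data structure and cue names are illustrative assumptions, not taken from the thesis): event segments are annotated with cue sets and can be queried for which cues are active at a given moment.

```python
from dataclasses import dataclass, field

# Hypothetical annotation scheme: each event segment carries a set of
# high-level visuospatial cues (speech, gaze shift, hand action, ...).
@dataclass
class Segment:
    start: float                            # segment onset, seconds
    end: float                              # segment offset, seconds
    cues: set = field(default_factory=set)  # e.g. {"speech", "gaze-shift"}

def cues_active_at(segments, t):
    """Union of all cues annotated on segments spanning time t."""
    active = set()
    for seg in segments:
        if seg.start <= t < seg.end:
            active |= seg.cues
    return active

segments = [
    Segment(0.0, 2.5, {"speech", "gaze-shift"}),
    Segment(1.0, 3.0, {"hand-action"}),
    Segment(3.0, 4.0, {"body-pose-change"}),
]

# Cues co-occurring at t = 1.5 s across overlapping segments.
print(sorted(cues_active_at(segments, 1.5)))
# → ['gaze-shift', 'hand-action', 'speech']
```

Overlapping segments let a single instant carry several cues at once, which is the kind of co-occurrence an attention analysis would correlate with observer behavior.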

The findings emphasize the integration of behavioral and perceptual parameters with computationally aligned strategies, such as benchmarking perceptual tasks against human behavioral and psychophysical metrics, thereby providing a richer context for developing systematic tools and methodologies for multimodal event characterization. At its core, the thesis characterizes the role of visuospatial features in shaping human perception and its implications for the development of cognitive technologies equipped with autonomous perception and interaction capabilities, essential for domains like social robotics, autonomous driving, media studies, traffic safety, and virtual characters.


Place, publisher, year, edition, pages
Skövde: University of Skövde, 2024, p. xvi, 221
Series
Dissertation Series ; 59
National Category
Computer and Information Sciences; Human Aspects of ICT; Interaction Technologies
Research subject
Interaction Lab (ILAB)
Identifiers
URN: urn:nbn:se:his:diva-23802
ISBN: 978-91-987907-3-3 (print)
OAI: oai:DiVA.org:his-23802
DiVA, id: diva2:1855964
Public defence
2024-05-24, G110, Högskolevägen 3, Skövde, 13:15
Note

Three of the six constituent papers (for the remaining papers, see the heading Delarbeten/List of papers):

In addition to the papers listed under 'List of Papers,' the following journal manuscripts are included in the thesis. These manuscripts are not present in the online copy of the thesis. Upon acceptance of the papers, the online document, along with this entry, will be updated.

Paper-I: Hemeren, P., Nair, V., & Drejing, K. (Submitted Manuscript). "Biological motion and attention: Interactions influenced by walking orientations, styles, and motion features". Submitted Manuscript for Scientific Journal, pp. 1-18. 

Paper-V: Nair, V., Bhatt, M., Suchan, J., Billing, E., & Hemeren, P. (Submitted Manuscript). "How do naturalistic visuo-auditory cues guide human attention?" Submitted Manuscript for Scientific Journal, pp. 1-48. 

Paper-VI: Nair, V., Bhatt, M., Suchan, J., Billing, E., & Hemeren, P. (Manuscript). "Human interaction scenarios: A naturalistic multimodal event dataset for behavioral research". Manuscript for Scientific Journal, pp. 1-28.

In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of The University of Skövde's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.

Available from: 2024-05-03. Created: 2024-05-03. Last updated: 2024-05-07. Bibliographically approved.
List of papers
1. Action similarity judgment based on kinematic primitives
2020 (English). In: 2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), IEEE, 2020. Conference paper, Published paper (Refereed).
Abstract [en]

Understanding which features humans rely on in visually recognizing action similarity is a crucial step towards a clearer picture of human action perception from a learning and developmental perspective. In the present work, we investigate to what extent a computational model based on kinematics can determine action similarity and how its performance relates to human similarity judgments of the same actions. To this aim, twelve participants performed an action similarity task, and their performance was compared to that of a computational model solving the same task. The chosen model has its roots in developmental robotics and performs action classification based on learned kinematic primitives. The comparative experiment results show that both the model and human participants can reliably identify whether two actions are the same or not. However, the model produces more false hits and has a greater selection bias than human participants, possibly because of its particular sensitivity to the kinematic primitives of the presented actions. In a second experiment, human participants' performance on an action identification task indicated that they relied solely on kinematic information rather than on action semantics. The results show that both the model and humans are highly accurate in an action similarity task based on kinematic-level features, which can provide an essential basis for classifying human actions.
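The idea of judging action similarity from kinematic information alone can be loosely sketched in code. The toy example below is a stand-in for illustration, not the paper's actual model: it declares two point trajectories "the same action" when their frame-to-frame speed profiles correlate strongly, and the 0.9 correlation threshold is an arbitrary assumption.

```python
import math

def speed_profile(trajectory):
    """Frame-to-frame speeds of a 2D point trajectory [(x, y), ...]."""
    return [math.dist(a, b) for a, b in zip(trajectory, trajectory[1:])]

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def same_action(traj_a, traj_b, threshold=0.9):
    """Kinematics-only similarity judgment (threshold is an assumption)."""
    return pearson(speed_profile(traj_a), speed_profile(traj_b)) > threshold

# Two spatially offset versions of the same motion pattern: identical
# speed profiles, so a kinematics-only judge calls them the same action.
wave_a = [(0, 0), (1, 2), (2, 0), (4, 0)]
wave_b = [(0, 1), (1, 3), (2, 1), (4, 1)]
print(same_action(wave_a, wave_b))  # → True
```

Note how spatial position drops out entirely: only the speed profile matters, mirroring the point that similarity can often be judged from a single feature domain.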

Place, publisher, year, edition, pages
IEEE, 2020
Series
IEEE International Conference on Development and Learning, ISSN 2161-9484, E-ISSN 2161-9484
Keywords
Kinematics, Computational modeling, Task analysis, Biological system modeling, Dictionaries, Visualization, Semantics
National Category
Interaction Technologies
Research subject
Interaction Lab (ILAB)
Identifiers
urn:nbn:se:his:diva-19425 (URN)
10.1109/ICDL-EpiRob48136.2020.9278047 (DOI)
000692524300007 ()
2-s2.0-85097550238 (Scopus ID)
978-1-7281-7306-1 (ISBN)
978-1-7281-7320-7 (ISBN)
Conference
2020 Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 26 October to 27 November 2020, Online
Funder
Knowledge Foundation
EU, European Research Council, 20140220
Swedish Research Council, 804388
Note

Funding Agency: 10.13039/100003077 Knowledge Foundation; 10.13039/100010663 European Research Council

Available from: 2021-01-23. Created: 2021-01-23. Last updated: 2024-05-03. Bibliographically approved.
2. Kinematic primitives in action similarity judgments: A human-centered computational model
2023 (English). In: IEEE Transactions on Cognitive and Developmental Systems, ISSN 2379-8920, E-ISSN 2379-8939, Vol. 15, no 4, p. 1981-1992. Article in journal (Refereed), Published.
Abstract [en]

This paper investigates the role that kinematic features play in human action similarity judgments. The results of three experiments with human participants are compared with a computational model that solves the same task. The chosen model has its roots in developmental robotics and performs action classification based on learned kinematic primitives. The comparative experimental results show that both the model and human participants can reliably identify whether two actions are the same or not. Specifically, the similarity of most of the given actions could be judged based on very limited information from a single feature domain (velocity or spatial). Both velocity and spatial features were, however, necessary to reach human-level performance on the evaluated actions. Results from an action identification task also indicated that human participants clearly relied on kinematic information rather than on action semantics. Overall, both the model and human performance are highly accurate in an action similarity task based on kinematic-level features, which can provide an essential basis for classifying human actions.

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
Biological systems, Computation theory, Computational methods, Job analysis, Kinematics, Semantics, Action matching, Action similarity, Biological motion, Biological system modeling, Comparative studies, Computational modeling, Kinematic primitives, Point light display, Task analysis, Optical flow, Biology, Data models, Dictionaries
National Category
Computer Sciences Human Computer Interaction
Research subject
Interaction Lab (ILAB)
Identifiers
urn:nbn:se:his:diva-22308 (URN)
10.1109/TCDS.2023.3240302 (DOI)
001126639000035 ()
2-s2.0-85148457281 (Scopus ID)
Note

CC BY 4.0

Corresponding author: Vipul Nair.

This work has been partially carried out at the Machine Learning Genoa (MaLGa) center, Università di Genova (IT). It has been partially supported by AFOSR, grant n. FA8655-20-1-7035, and by a research collaboration between the University of Skövde and the Istituto Italiano di Tecnologia, Genoa.

Available from: 2023-03-02. Created: 2023-03-02. Last updated: 2024-06-24. Bibliographically approved.
3. Attentional synchrony in films: A window to visuospatial characterization of events
2022 (English). In: Proceedings SAP 2022: ACM Symposium on Applied Perception, September 22-23, 2022 / [ed] Stephen N. Spencer, Association for Computing Machinery (ACM), 2022, article id 8. Conference paper, Published paper (Refereed).
Abstract [en]

The study of event perception emphasizes the importance of visuospatial attributes in everyday human activities and how they influence event segmentation, prediction, and retrieval. Attending to these visuospatial attributes is the first step toward event understanding, and correlating attentional measures with such attributes would therefore further our understanding of event comprehension. In this study, we focus on attentional synchrony among other attentional measures and analyze selected film scenes through the lens of a visuospatial event model. We present the first results of an in-depth multimodal (e.g., head-turn, hand-action) visuospatial analysis of 10 movie scenes correlated with visual attention (eye-tracking, 32 participants per scene). With the results, we tease apart event segments of high and low attentional synchrony and describe the distribution of attention in relation to the visuospatial features. This analysis gives us an indirect measure of attentional saliency for a scene with a particular visuospatial complexity, ultimately directing the attentional selection of the observers in a given context.
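Attentional synchrony can be intuited as how tightly different observers' gaze points cluster on each frame. The sketch below is a minimal illustrative measure, not the paper's method: the RMS distance of gaze points from their centroid, where a low value means high synchrony (everyone looking at the same spot) and a high value means dispersed attention. The coordinates are hypothetical.

```python
import math

def gaze_dispersion(gaze_points):
    """RMS distance of gaze points [(x, y), ...] from their centroid.
    Low dispersion across observers = high attentional synchrony."""
    n = len(gaze_points)
    cx = sum(x for x, _ in gaze_points) / n
    cy = sum(y for _, y in gaze_points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2
                         for x, y in gaze_points) / n)

# Three observers' gaze in two frames (hypothetical screen coordinates).
frame_synchronous = [(100, 100), (102, 98), (101, 101)]  # all on one spot
frame_dispersed = [(100, 100), (400, 300), (250, 50)]    # attention split

print(gaze_dispersion(frame_synchronous) < gaze_dispersion(frame_dispersed))
# → True
```

Computing such a value per frame and thresholding it is one simple way to tease apart segments of high and low synchrony, in the spirit of the analysis the abstract describes.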

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
Keywords
Visuoauditory cues, Human-interaction, Eye-tracking, Attention
National Category
Human Computer Interaction Media and Communication Studies
Research subject
Interaction Lab (ILAB)
Identifiers
urn:nbn:se:his:diva-22205 (URN)
10.1145/3548814.3551466 (DOI)
001147642300008 ()
2-s2.0-85139425610 (Scopus ID)
978-1-4503-9455-0 (ISBN)
Conference
SAP 2022, ACM Symposium on Applied Perception, September 22-23, 2022, TBC, USA
Note

CC BY 4.0

Available from: 2023-01-25. Created: 2023-01-25. Last updated: 2025-02-11. Bibliographically approved.

Open Access in DiVA

fulltext (32456 kB), 652 downloads
File name: FULLTEXT01.pdf
Type: fulltext
Mimetype: application/pdf

Authority records

Nair, Vipul

