A growing body of evidence in cognitive science and neuroscience points to a deep interconnection between cognition, perception and action. According to this embodied perspective, language is grounded in the sensorimotor system and language understanding is based on a mental simulation process (Jeannerod, 2007; Gallese, 2008; Barsalou, 2009). This means that the comprehension of action words and sentences recruits the same perception, action, and emotion mechanisms that are involved in interacting with objects. Among the neural underpinnings of this simulation process, an important role is played by a sensorimotor matching system known as the mirror neuron system (Rizzolatti and Craighero, 2004). Despite a growing number of studies, the precise dynamics underlying the relation between language and action are not yet well understood. Indeed, experimental findings are not always consistent: some studies report that language processing interferes with action execution, while others find facilitation. In this work we present a detailed neural network model capable of reproducing experimentally observed influences of the processing of action-related sentences on the execution of motor sequences. The proposed model is based on three main points. The first is that the processing of action-related sentences causes the resonance of motor and mirror neurons encoding the corresponding actions. The second is that there exists a varying degree of crosstalk between neuronal populations depending on whether they encode the same motor act, the same effector or the same action-goal. The third is that the internal dynamics of neuronal populations, which result from the combination of multiple processes taking place at different time scales, can facilitate or interfere with successive activations of the same or of partially overlapping pools.
The identification of learning mechanisms for locomotion has long been a subject of research, but many challenges remain. Dynamic systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) offers a promising method for adaptively linking the dynamic system to the environment it interacts with via a reward-based value system. In this paper, we propose a model that integrates these perspectives and apply it to the case of a humanoid (NAO) robot learning to walk, an ability that emerges from its value-based interaction with the environment. In the model, a simplified central pattern generator (CPG) architecture inspired by neuroscientific research and DST is integrated with an actor-critic approach to RL (cpg-actor-critic). In the cpg-actor-critic architecture, learning based on least-squares temporal difference converges quickly to the optimal solution by using natural gradient learning and balancing exploration and exploitation. Furthermore, rather than using a traditional (designer-specified) reward, it uses a dynamic value function as a stability indicator that adapts to the environment. The results obtained are analyzed using a novel DST-based embodied cognition approach. Learning to walk, from this perspective, is a process of integrating levels of sensorimotor activity and value.
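As an illustration of the CPG idea only, the following minimal sketch couples two phase oscillators so that they settle into antiphase, as alternating left and right leg drives would. It is not the architecture used in the paper: the Kuramoto-style coupling, the function name `simulate_cpg`, and all parameter values are assumptions chosen for illustration, and the actor-critic learning component is not shown.

```python
import math


def simulate_cpg(steps=2000, dt=0.01, freq_hz=1.0, coupling=2.0, amplitude=0.3):
    """Integrate two coupled phase oscillators with forward Euler.

    All names and values are hypothetical; they are not taken from the paper.
    Returns the list of (left, right) joint commands and the final phase
    difference in radians.
    """
    omega = 2.0 * math.pi * freq_hz
    theta_left, theta_right = 0.0, 0.5  # arbitrary start, not in antiphase
    trajectory = []
    for _ in range(steps):
        # The coupling terms pull the phase difference towards pi (antiphase).
        d_left = omega + coupling * math.sin(theta_right - theta_left - math.pi)
        d_right = omega + coupling * math.sin(theta_left - theta_right - math.pi)
        theta_left += dt * d_left
        theta_right += dt * d_right
        # Map each phase to a sinusoidal joint command.
        trajectory.append(
            (amplitude * math.sin(theta_left), amplitude * math.sin(theta_right))
        )
    phase_diff = (theta_right - theta_left) % (2.0 * math.pi)
    return trajectory, phase_diff


if __name__ == "__main__":
    traj, diff = simulate_cpg()
    print(f"final phase difference: {diff:.4f} rad")
    print(f"last joint commands: {traj[-1]}")
```

In a learning setting, parameters such as `freq_hz`, `amplitude`, or `coupling` would be the natural quantities for an RL policy to adjust; which parameters the paper's model actually adapts is not stated in the abstract.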
Both nociception and punishment signals have been used in robotics. However, the potential of these negatively valenced reinforcement learning signals for robot learning has not yet been explored in detail. Nociceptive signals are primarily used as triggers of preprogrammed action sequences. Punishment signals are typically disembodied, i.e., they bear little or no relation to the agent-intrinsic limitations, and they are often used to impose behavioral constraints. Here, we provide an alternative approach in which nociceptive signals act as drivers of learning rather than as simple triggers of preprogrammed behavior. Specifically, we use nociception to expand the state space, while we use punishment as a negative reinforcement learning signal. We compare the performance (in terms of task error, amount of perceived nociception, and length of learned action sequences) of different neural networks imbued with punishment-based reinforcement signals for inverse kinematic learning. We contrast the performance of a version of the neural network that receives nociceptive inputs with that of a version without such inputs. Furthermore, we provide evidence that nociception can improve both learning, by making the algorithm more robust to network initialization, and behavioral performance, by reducing the task error, perceived nociception, and length of learned action sequences. Moreover, we provide evidence that punishment, at least as typically used within reinforcement learning applications, may be detrimental on all relevant metrics.
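The distinction between the two roles can be sketched as follows: nociception enters as additional state dimensions, whereas punishment enters only through the scalar reward. This is a minimal sketch under stated assumptions, not the networks evaluated in the paper; the graded joint-limit pain signal, the function names, and the thresholds `onset` and `limit` are all hypothetical.

```python
def nociception(joint_angle, limit=1.5, onset=1.2):
    """Hypothetical graded nociceptive signal in [0, 1] near a joint limit.

    Zero while |angle| <= onset, rising linearly to 1 at |angle| >= limit.
    """
    excess = abs(joint_angle) - onset
    return min(1.0, max(0.0, excess / (limit - onset)))


def augmented_state(joint_angles):
    """Expand the state: joint angles plus one nociceptive channel per joint."""
    return list(joint_angles) + [nociception(a) for a in joint_angles]


def reward(task_error, punishment=0.0):
    """Scalar learning signal: negative task error minus a punishment term.

    How punishment is computed is left open here (assumption); it is passed
    in as a non-negative number.
    """
    return -task_error - punishment


if __name__ == "__main__":
    angles = [0.0, 2.0]
    print(augmented_state(angles))         # state with nociceptive channels
    print(reward(0.5, punishment=0.25))    # punished learning signal
```

With this separation, a learner can be run with or without the nociceptive channels in its input, and with or without the punishment term in its reward, which mirrors the kind of comparison described above.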