Humanoids learning to walk: a natural CPG-actor-critic architecture
University of Skövde, School of Humanities and Informatics. University of Skövde, The Informatics Research Centre.
2013 (English). In: Frontiers in Neurorobotics, ISSN 1662-5218, Vol. 7, no. 5. Article in journal (Refereed), Published.
Abstract [en]

The identification of learning mechanisms for locomotion has long been the subject of much research, but many challenges remain. Dynamical systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) has offered a promising method for adaptively linking the dynamical system to the environment it interacts with via a reward-based value system. In this paper, we propose a model that integrates these perspectives and apply it to a humanoid (NAO) robot learning to walk, an ability that emerges from the robot's value-based interaction with its environment. In the model, a simplified central pattern generator (CPG) architecture inspired by neuroscientific research and DST is integrated with an actor-critic approach to RL (cpg-actor-critic). In the cpg-actor-critic architecture, learning based on least-squares temporal differences (LSTD) converges quickly to the optimal solution by using natural gradient learning and by balancing exploration and exploitation. Furthermore, rather than using a traditional (designer-specified) reward, it uses a dynamic value function that adapts to the environment as a stability indicator. The results obtained are analyzed using a novel DST-based embodied cognition approach. From this perspective, learning to walk is a process of integrating levels of sensorimotor activity and value.
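The abstract does not give the paper's update equations. As a rough illustration of what a least-squares temporal-difference critic computes, the sketch below implements generic batch LSTD(0) policy evaluation with linear features: it accumulates A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s)·r over observed transitions and solves A·w = b for value weights w. This is the textbook estimator, not the authors' cpg-actor-critic. It omits the CPG, the natural-gradient actor and the dynamic value-based reward. The function names `lstd0`, `solve_linear`, `one_hot` and the three-state chain are hypothetical, chosen only for illustration.

```python
# Minimal batch LSTD(0) sketch (illustrative only; not the paper's cpg-actor-critic).
# Estimates weights w so that V(s) ~= w . phi(s) for a fixed policy by solving
#   A w = b,  A = sum phi(s) (phi(s) - gamma * phi(s'))^T,  b = sum phi(s) * r.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; try reg > 0")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def lstd0(transitions, features, gamma, reg=0.0):
    """transitions: list of (s, r, s_next); s_next is None at episode end."""
    k = len(features(transitions[0][0]))
    # reg * I is an optional ridge term that keeps A invertible with few samples.
    a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for s, r, s_next in transitions:
        phi = features(s)
        phi_next = [0.0] * k if s_next is None else features(s_next)
        for i in range(k):
            b[i] += phi[i] * r
            for j in range(k):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(a, b)

# Hypothetical toy problem: a 3-state chain 0 -> 1 -> 2 -> end, reward 1 on the last step.
def one_hot(s, n=3):
    return [1.0 if i == s else 0.0 for i in range(n)]

chain = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
values = lstd0(chain, one_hot, gamma=0.9)
print([round(v, 3) for v in values])  # [0.81, 0.9, 1.0]
```

With one-hot features the estimate recovers the exact discounted values of the chain. Because LSTD solves for the weights in one linear-algebra step instead of taking many small stochastic TD updates, it tends to be sample-efficient, which is consistent with the fast convergence the abstract attributes to the LSTD-based learning. In an actor-critic setting, such critic estimates would then inform the actor's policy update.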

Place, publisher, year, edition, pages
Frontiers Media S.A., 2013. Vol. 7, no 5
Keyword [en]
reinforcement learning, humanoid walking, central pattern generators, actor-critic, dynamical systems theory, embodied cognition, value system
National Category
Computer and Information Science
Research subject
Technology
Identifiers
URN: urn:nbn:se:his:diva-8368
DOI: 10.3389/fnbot.2013.00005
ISI: 000209437600005
PubMedID: 23675345
Scopus ID: 2-s2.0-84902356043
OAI: oai:DiVA.org:his-8368
DiVA: diva2:639509
Available from: 2013-08-08. Created: 2013-08-08. Last updated: 2017-05-12. Bibliographically approved.


Authors: Li, Cai; Lowe, Robert; Ziemke, Tom