Humanoids learning to walk: a natural CPG-actor-critic architecture
Högskolan i Skövde, Institutionen för kommunikation och information. Högskolan i Skövde, Forskningscentrum för Informationsteknologi. ORCID iD: 0000-0002-7236-997X
Högskolan i Skövde, Institutionen för kommunikation och information. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
Högskolan i Skövde, Institutionen för kommunikation och information. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
2013 (English). In: Frontiers in Neurorobotics, ISSN 1662-5218, Vol. 7, no. 5. Journal article (peer-reviewed). Published
Abstract [en]

The identification of learning mechanisms for locomotion has been the subject of much research for some time, but many challenges remain. Dynamic systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) has offered a promising method to adaptively link the dynamic system to the environment it interacts with via a reward-based value system. In this paper, we propose a model that integrates the above perspectives and apply it to the case of a humanoid (NAO) robot learning to walk, an ability that emerges from its value-based interaction with the environment. In the model, a simplified central pattern generator (CPG) architecture inspired by neuroscientific research and DST is integrated with an actor-critic approach to RL (cpg-actor-critic). In the cpg-actor-critic architecture, least-squares temporal-difference-based learning converges to the optimal solution quickly by using natural gradient learning and balancing exploration and exploitation. Furthermore, rather than using a traditional (designer-specified) reward, it uses a dynamic value function as a stability indicator that adapts to the environment. The results obtained are analyzed using a novel DST-based embodied cognition approach. Learning to walk, from this perspective, is a process of integrating levels of sensorimotor activity and value.

Place, publisher, year, edition, pages
Frontiers Media S.A., 2013. Vol. 7, no. 5
Keywords [en]
reinforcement learning, humanoid walking, central pattern generators, actor-critic, dynamical systems theory, embodied cognition, value system
National subject category (HSV)
Research programme
Technology
Identifiers
URN: urn:nbn:se:his:diva-8368
DOI: 10.3389/fnbot.2013.00005
ISI: 000209437600005
PubMedID: 23675345
Scopus ID: 2-s2.0-84902356043
OAI: oai:DiVA.org:his-8368
DiVA, id: diva2:639509
Available from: 2013-08-08. Created: 2013-08-08. Last updated: 2018-01-11. Bibliographically approved.

Authors

Li, Cai; Lowe, Robert; Ziemke, Tom
