Language bootstrapping: Learning Word Meanings From Perception-Action Association
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0002-3323-5311
2012 (English). In: IEEE transactions on systems, man and cybernetics. Part B. Cybernetics, ISSN 1083-4419, E-ISSN 1941-0492, Vol. 42, no 3, p. 660-671. Article in journal (Refereed). Published.
Abstract [en]

We address the problem of bootstrapping language acquisition for an artificial system, in a way similar to what is observed in experiments with human infants. Our method works by associating meanings with words in manipulation tasks, as a robot interacts with objects and listens to verbal descriptions of the interactions. The model is based on an affordance network, i.e., a mapping between robot actions, robot perceptions, and the perceived effects of these actions upon objects. We extend the affordance model to incorporate spoken words, which allows us to ground the verbal symbols in the execution of actions and the perception of the environment. The model takes verbal descriptions of a task as input and uses temporal co-occurrence to create links between speech utterances and the involved objects, actions, and effects. We show that the robot is able to form useful word-to-meaning associations, even without considering grammatical structure in the learning process and in the presence of recognition errors. These word-to-meaning associations are embedded in the robot’s own understanding of its actions. Thus, they can be used directly to instruct the robot to perform tasks, and they also allow context to be incorporated in the speech recognition task. We believe that the encouraging results obtained with our approach may give robots the capacity to acquire language descriptors in their operating environment, and may shed some light on how this challenging process develops in human infants.
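
The co-occurrence idea in the abstract can be illustrated with a minimal sketch. This is not the paper's affordance network (which the abstract describes as a richer model over actions, perceptions, and effects); it swaps in plain pointwise mutual information over co-occurrence counts. The function names (`learn_associations`, `best_meaning`), the variable names (`action`, `shape`, `effect`), and the toy episodes are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def learn_associations(episodes):
    """Score word/meaning pairs by pointwise mutual information (PMI).

    episodes: list of (words, meaning) pairs, where `words` is the list of
    recognized words and `meaning` maps a variable name to its observed value.
    This is a simplified stand-in for the affordance model, not the
    authors' method.
    """
    word_count = Counter()
    meaning_count = Counter()
    joint_count = defaultdict(Counter)
    for words, meaning in episodes:
        items = set(meaning.items())
        meaning_count.update(items)
        for word in set(words):  # count each word once per episode
            word_count[word] += 1
            joint_count[word].update(items)
    n = len(episodes)
    scores = defaultdict(dict)
    for word, joint in joint_count.items():
        for item, c in joint.items():
            # PMI = log( P(word, item) / (P(word) * P(item)) )
            scores[word][item] = math.log(
                c * n / (word_count[word] * meaning_count[item])
            )
    return scores


def best_meaning(scores, word):
    """Return the (variable, value) pair most strongly tied to `word`."""
    return max(scores[word].items(), key=lambda kv: kv[1])[0]


# Hypothetical toy data: verbal descriptions paired with what the robot
# did and perceived. No grammar is used, only co-occurrence.
episodes = [
    (["the", "robot", "grasps", "the", "ball"],
     {"action": "grasp", "shape": "sphere", "effect": "moved"}),
    (["the", "robot", "taps", "the", "ball"],
     {"action": "tap", "shape": "sphere", "effect": "moved"}),
    (["the", "robot", "grasps", "the", "box"],
     {"action": "grasp", "shape": "cube", "effect": "moved"}),
    (["the", "robot", "taps", "the", "box"],
     {"action": "tap", "shape": "cube", "effect": "still"}),
]

scores = learn_associations(episodes)
print(best_meaning(scores, "grasps"))
print(best_meaning(scores, "ball"))
```

In this toy setting, words that occur in every episode (such as "the") score zero against every meaning, so they carry no information, while content words score highest against the variable they co-occur with most selectively. With so few episodes, ties between meanings are possible for some words; a real system would need far more data and a model that tolerates recognition errors.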

Keyword [en]
Affordances, automatic speech recognition, Bayesian networks, cognitive robotics, grasping, humanoid robots, language, unsupervised learning
National Category
Computer Science; Language Technology (Computational Linguistics)
URN: urn:nbn:se:kth:diva-52231
DOI: 10.1109/TSMCB.2011.2172420
ISI: 000304163200007
ScopusID: 2-s2.0-84861192014
OAI: diva2:465527
QC 20120614
Available from: 2011-12-14. Created: 2011-12-14. Last updated: 2012-06-14. Bibliographically approved.

By author/editor
Salvi, Giampiero