Audio-Visual Classification and Detection of Human Manipulation Actions
Pieropan, Alessandro. KTH, School of Computer Science and Communication (CSC), Computer Vision and Robotics, CVAP; Centre for Autonomous Systems, CAS. ORCID iD: 0000-0003-2314-2880
Salvi, Giampiero. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-3323-5311
Pauwels, Karl. Universidad de Granada, Spain. ORCID iD: 0000-0003-3731-0582
Kjellström, Hedvig. KTH, School of Computer Science and Communication (CSC), Computer Vision and Robotics, CVAP; Centre for Autonomous Systems, CAS. ORCID iD: 0000-0002-5750-9655
2014 (English). In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), IEEE conference proceedings, 2014, pp. 3045-3052. Conference paper, published paper (refereed)
Abstract [en]

Humans are able to merge information from multiple perceptual modalities and formulate a coherent representation of the world. Our thesis is that robots need to do the same in order to operate robustly and autonomously in an unstructured environment. It has also been shown in several fields that multiple sources of information can complement each other, overcoming the limitations of a single perceptual modality. Hence, in this paper we introduce a data set of actions that includes both visual data (RGB-D video and 6DOF object pose estimation) and acoustic data. We also propose a method for recognizing and segmenting actions from continuous audio-visual data. The proposed method is employed for extensive evaluation of the descriptive power of the two modalities, and we discuss how they can be used jointly to infer a coherent interpretation of the recorded action.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014. pp. 3045-3052
Series
IEEE International Conference on Intelligent Robots and Systems, ISSN 2153-0858
Keywords [en]
Acoustic data, Audio-visual, Audio-visual data, Coherent representations, Human manipulation, Multiple source, Unstructured environments, Visual data
National subject category
Computer Graphics and Computer Vision
Identifiers
URN: urn:nbn:se:kth:diva-158004
DOI: 10.1109/IROS.2014.6942983
ISI: 000349834603023
Scopus ID: 2-s2.0-84911478073
ISBN: 978-1-4799-6934-0 (print)
OAI: oai:DiVA.org:kth-158004
DiVA id: diva2:773353
Conference
2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2014, Palmer House Hilton Hotel, Chicago, United States, 14-18 September 2014
Note

QC 20150122

Available from: 2014-12-18. Created: 2014-12-18. Last updated: 2025-02-07. Bibliographically approved.
Part of thesis
1. Action Recognition for Robot Learning
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis builds on the observation that robots cannot be programmed to handle every possible situation in the world. Like humans, they need mechanisms to deal with previously unseen situations and unknown objects. One of the skills humans rely on to deal with the unknown is the ability to learn by observing others. This thesis addresses the challenge of enabling a robot to learn from a human instructor. In particular, it is focused on objects. How can a robot find previously unseen objects? How can it track the object with its gaze? How can the object be employed in activities? Throughout this thesis, these questions are addressed with the end goal of allowing a robot to observe a human instructor and learn how to perform an activity. The robot is assumed to know very little about the world and must discover objects autonomously. Given a visual input, object hypotheses are formulated by leveraging common contextual knowledge often used by humans (e.g. gravity, compactness, convexity). Moreover, unknown objects are tracked and their appearance is updated over time, since only a small fraction of the object is visible to the robot initially. Finally, object functionality is inferred by observing how the human instructor manipulates objects and how objects are used in relation to others. All the methods included in this thesis have been evaluated on datasets that are publicly available or that we collected, showing the importance of these learning abilities.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. pp. v, 38
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2015:09
National subject category
Computer Graphics and Computer Vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-165680 (URN)
Public defence
2015-05-21, 10:00, F3, Lindstedtsvägen 26, KTH, Stockholm (English)
Note

QC 20150504

Available from: 2015-05-04. Created: 2015-04-29. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

fulltext (11742 kB), 700 downloads
File information
Filename: FULLTEXT01.pdf
File size: 11742 kB
Checksum (SHA-512): a3cc92c1e8f3e5292e6b1e4a4ba19090292fe3883384724f4fe562d45ff4ef59c0697c4b906e522d160fd4fcd2c724f9f8723c2a2e83d0a475450b77a1fc50b0
Type: fulltext
Mimetype: application/pdf

Other links
Publisher's full text | Scopus | IEEEXplore | Conference website

Person
Pieropan, Alessandro; Salvi, Giampiero; Pauwels, Karl; Kjellström, Hedvig

Search further in DiVA
By author/editor: Pieropan, Alessandro; Salvi, Giampiero; Pauwels, Karl; Kjellström, Hedvig
By organisation: Computer Vision and Robotics, CVAP; Centre for Autonomous Systems, CAS; Speech, Music and Hearing, TMH
In subject: Computer Graphics and Computer Vision

