Improving Visual Question Answering by Leveraging Depth and Adapting Explainability
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Robotics, Perception and Learning, RPL.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-1733-7019
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-2212-4325
2022 (English). In: 2022 31st IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN 2022), Institute of Electrical and Electronics Engineers (IEEE), 2022, pp. 252-259. Conference paper, Published paper (Refereed)
Abstract [en]

During human-robot conversation, it is critical for robots to be able to answer users' questions accurately and provide a suitable explanation for why they arrive at the answer they provide. Depth is a crucial component in producing more intelligent robots that can respond correctly as some questions might rely on spatial relations within the scene, for which 2D RGB data alone would be insufficient. Due to the lack of existing depth datasets for the task of VQA, we introduce a new dataset, VQA-SUNRGBD. When we compare our proposed model on this RGB-D dataset against the baseline VQN network on RGB data alone, we show that ours outperforms, particularly in questions relating to depth such as asking about the proximity of objects and relative positions of objects to one another. We also provide Grad-CAM activations to gain insight regarding the predictions on depth-related questions and find that our method produces better visual explanations compared to Grad-CAM on RGB data. To our knowledge, this work is the first of its kind to leverage depth and an explainability module to produce an explainable Visual Question Answering (VQA) system.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022. pp. 252-259
Keywords [en]
Visual Question Answering, Leveraging Depth, Explainability
Identifiers
URN: urn:nbn:se:kth:diva-322304
DOI: 10.1109/RO-MAN53752.2022.9900586
ISI: 000885903300037
Scopus ID: 2-s2.0-85140744461
OAI: oai:DiVA.org:kth-322304
DiVA, id: diva2:1718210
Conference
31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) - Social, Asocial, and Antisocial Robots, August 29 - September 2, 2022, Naples, Italy
Note

QC 20221212

Part of proceedings: ISBN 978-1-7281-8859-1

Available from: 2022-12-12. Created: 2022-12-12. Last updated: 2022-12-15. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authors

Panesar, Amrita; Dogan, Fethiye Irmak; Leite, Iolanda
