Improving Visual Question Answering by Leveraging Depth and Adapting Explainability
2022 (English). In: 2022 31st IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN 2022), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 252-259. Conference paper, Published paper (Refereed)
Abstract [en]
During human-robot conversation, it is critical for robots to be able to answer users' questions accurately and provide a suitable explanation for why they arrive at the answer they provide. Depth is a crucial component in producing more intelligent robots that can respond correctly, as some questions might rely on spatial relations within the scene, for which 2D RGB data alone would be insufficient. Due to the lack of existing depth datasets for the task of Visual Question Answering (VQA), we introduce a new dataset, VQA-SUNRGBD. When we compare our proposed model on this RGB-D dataset against the baseline VQN network on RGB data alone, we show that ours outperforms the baseline, particularly on questions relating to depth, such as those asking about the proximity of objects and the relative positions of objects to one another. We also provide Grad-CAM activations to gain insight into the predictions on depth-related questions and find that our method produces better visual explanations compared to Grad-CAM on RGB data. To our knowledge, this work is the first of its kind to leverage depth and an explainability module to produce an explainable VQA system.
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022. p. 252-259
Keywords [en]
Visual Question Answering, Leveraging Depth, Explainability
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-322304
DOI: 10.1109/RO-MAN53752.2022.9900586
ISI: 000885903300037
Scopus ID: 2-s2.0-85140744461
OAI: oai:DiVA.org:kth-322304
DiVA, id: diva2:1718210
Conference
31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) - Social, Asocial, and Antisocial Robots, August 29-September 2, 2022, Napoli, Italy
Note
QC 20221212
Part of proceedings: ISBN 978-1-7281-8859-1
2022-12-12, 2022-12-12, 2022-12-15. Bibliographically approved