A hierarchical grocery store image dataset with visual and semantic labels
KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, Perception and Learning, RPL.
KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-5750-9655
2019 (English). In: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 491-500, article id 8658240. Conference paper, Published paper (Refereed)
Abstract [en]

Image classification models built into visual support systems and other assistive devices need to provide accurate predictions about their environment. We focus on an application of assistive technology for people with visual impairments, for daily activities such as shopping or cooking. In this paper, we provide a new benchmark dataset for a challenging task in this application – classification of fruits, vegetables, and refrigerated products, e.g. milk packages and juice cartons, in grocery stores. To enable the learning process to utilize multiple sources of structured information, this dataset not only contains a large volume of natural images but also includes the corresponding information of the product from an online shopping website. Such information encompasses the hierarchical structure of the object classes, as well as an iconic image of each type of object. This dataset can be used to train and evaluate image classification models for helping visually impaired people in natural environments. Additionally, we provide benchmark results evaluated on pretrained convolutional neural networks often used for image understanding purposes, and also a multi-view variational autoencoder, which is capable of utilizing the rich product information in the dataset.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019. p. 491-500, article id 8658240
Keywords [en]
Benchmarking, Computer vision, Electronic commerce, Image classification, Large dataset, Learning systems, Neural networks, Semantics, Accurate prediction, Assistive technology, Classification models, Convolutional neural network, Hierarchical structures, Natural environments, Structured information, Visually impaired people, Classification (of information)
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:kth:diva-252223
DOI: 10.1109/WACV.2019.00058
ISI: 000469423400051
Scopus ID: 2-s2.0-85063566822
ISBN: 9781728119755 (print)
OAI: oai:DiVA.org:kth-252223
DiVA, id: diva2:1322857
Conference
19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019, 7 January 2019 through 11 January 2019
Note

QC 20190611

Available from: 2019-06-11. Created: 2019-06-11. Last updated: 2019-06-26. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records BETA

Klasson, Marcus; Kjellström, Hedvig
