Fine-Grained and Continual Visual Recognition for Assisting Visually Impaired People
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-8633-281X
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In recent years, computer vision-based assistive technologies have enabled visually impaired people to use automatic visual recognition on their mobile phones. These systems should be capable of recognizing objects at fine-grained levels to provide the user with accurate predictions. Additionally, the user should have the option to update the system continually to recognize new objects of interest. However, several challenges need to be tackled to enable such features in assistive vision systems in real and highly varying environments. For instance, fine-grained image recognition usually requires large amounts of labeled data to be robust. Moreover, image classifiers struggle to retain performance on previously learned tasks when they are adapted to new ones. This thesis is divided into two parts in which we address these challenges. First, we focus on the application of assistive vision systems to grocery shopping, where items are naturally structured by fine-grained details. We demonstrate how image classifiers can be trained on a combination of natural images and web-scraped information about the groceries to obtain more accurate classification than training on natural images alone. Thereafter, we introduce a new approach to continual learning called replay scheduling, in which we select which tasks to replay at different times to improve memory retention. Furthermore, we propose a novel framework for learning replay scheduling policies that generalize to new continual learning scenarios, mitigating the catastrophic forgetting effect in image classifiers. This thesis provides insights into the practical challenges that must be addressed to enhance the usefulness of computer vision for assisting the visually impaired in real-world scenarios.
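As a rough illustration of the replay scheduling idea described in the abstract (a minimal sketch with hypothetical names, not the thesis code): a schedule specifies, at each training step, how a small replay budget is split across previously seen tasks.

```python
def fill_replay_batch(proportions, memory_per_task, budget):
    """Allocate a replay batch of at most `budget` stored samples across
    earlier tasks, proportionally to the schedule's `proportions`.

    proportions: dict task_id -> fraction of the batch for that task.
    memory_per_task: dict task_id -> list of stored samples for that task.
    """
    batch = []
    for task_id in sorted(proportions):
        # Samples requested from this task, capped by what is stored.
        n = min(int(round(proportions[task_id] * budget)),
                len(memory_per_task[task_id]))
        batch.extend((task_id, s) for s in memory_per_task[task_id][:n])
    return batch[:budget]
```

For example, at some step a schedule might replay mostly the first task: `fill_replay_batch({0: 0.75, 1: 0.25}, {0: ["a", "b", "c", "d"], 1: ["x", "y"]}, 4)` draws three samples from task 0 and one from task 1.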

Abstract [sv]

In recent years, assistive technologies based on computer vision have enabled visually impaired people to use automatic visual recognition on their mobile phones. These systems should be able to recognize objects at fine-grained levels in order to provide the user with accurate predictions. The user should also have the option to continually update the system to recognize new objects of interest. However, several challenges must be overcome to enable these features in assistive vision systems in real and highly varying environments. For instance, fine-grained image recognition usually requires large amounts of labeled data to be robust. Moreover, image classifiers struggle to retain their performance on previously learned abilities when they are adapted to new tasks. This thesis is divided into two parts in which we address these challenges. First, we focus on the application of using assistive vision systems for grocery shopping, where the items are naturally structured by fine-grained details. We show how image classifiers can be trained with a combination of natural images and web-scraped information about the groceries to obtain more accurate classification performance compared to using only natural images for training. Thereafter, we put forward a new approach to continual learning called replay scheduling, where we select which tasks are to be rehearsed at different points in time to improve memory retention. We also propose a new framework for learning replay scheduling policies that can generalize to new continual learning scenarios in order to mitigate the effect of catastrophic forgetting in image classifiers. This thesis offers insight into practical challenges that must be solved to improve the usefulness of computer vision for helping visually impaired people in real-world scenarios.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2022, p. 89
Series
TRITA-EECS-AVL ; 2022:63
Keywords [en]
Fine-Grained Image Recognition; Continual Learning; Visually Impaired People; Image Classification; Replay Scheduling
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-320067
ISBN: 978-91-8040-377-1 (print)
OAI: oai:DiVA.org:kth-320067
DiVA, id: diva2:1703481
Public defence
2022-11-08, F3, Lindstedtsvägen 26, Stockholm, 09:00 (English)
Funder
Promobilia foundation, F-16500
Note

QC 20221014

Available from: 2022-10-14 Created: 2022-10-13 Last updated: 2025-02-07. Bibliographically approved
List of papers
1. A hierarchical grocery store image dataset with visual and semantic labels
2019 (English) In: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 491-500, article id 8658240. Conference paper, Published paper (Refereed)
Abstract [en]

Image classification models built into visual support systems and other assistive devices need to provide accurate predictions about their environment. We focus on an application of assistive technology for people with visual impairments, for daily activities such as shopping or cooking. In this paper, we provide a new benchmark dataset for a challenging task in this application – classification of fruits, vegetables, and refrigerated products, e.g., milk packages and juice cartons, in grocery stores. To enable the learning process to utilize multiple sources of structured information, this dataset not only contains a large volume of natural images but also includes the corresponding product information from an online shopping website. Such information encompasses the hierarchical structure of the object classes as well as an iconic image of each type of object. This dataset can be used to train and evaluate image classification models for helping visually impaired people in natural environments. Additionally, we provide benchmark results for pretrained convolutional neural networks often used for image understanding, as well as for a multi-view variational autoencoder, which is capable of utilizing the rich product information in the dataset.
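The hierarchical class structure mentioned above can be pictured with a small sketch (the class names here are hypothetical examples, not entries from the dataset): each fine-grained class maps to a coarser category, so a classifier's prediction can be scored at either level of granularity.

```python
# Hypothetical excerpt of such a hierarchy: fine-grained class -> coarse class.
HIERARCHY = {
    "Granny-Smith": "Apple",
    "Royal-Gala": "Apple",
    "Oatly-Oat-Milk": "Milk",
}

def coarse_label(fine_label):
    """Map a fine-grained prediction to its coarse-grained category."""
    return HIERARCHY[fine_label]
```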

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
Benchmarking, Computer vision, Electronic commerce, Image classification, Large dataset, Learning systems, Neural networks, Semantics, Accurate prediction, Assistive technology, Classification models, Convolutional neural network, Hierarchical structures, Natural environments, Structured information, Visually impaired people, Classification (of information)
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-252223 (URN)
10.1109/WACV.2019.00058 (DOI)
000469423400051 ()
2-s2.0-85063566822 (Scopus ID)
Conference
19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019, 7 January 2019 through 11 January 2019
Note

QC 20190611

Part of ISBN 9781728119755

Available from: 2019-06-11 Created: 2019-06-11 Last updated: 2025-02-07. Bibliographically approved
2. Using Variational Multi-view Learning for Classification of Grocery Items
2020 (English) In: Patterns, ISSN 2666-3899, Vol. 1, no 8. Article in journal (Refereed) Published
Abstract [en]

An essential task for computer vision-based assistive technologies is to help visually impaired people recognize objects in constrained environments, for instance, recognizing food items in grocery stores. In this paper, we introduce a novel dataset with natural images of groceries—fruits, vegetables, and packaged products—where all images have been taken inside grocery stores to resemble a shopping scenario. Additionally, we download iconic images and text descriptions for each item that can be utilized for better representation learning of groceries. We select a multi-view generative model that can combine the different item information into lower-dimensional representations. The experiments show that utilizing the additional information yields higher accuracy in classifying grocery items than using only the natural images. We observe that iconic images help construct representations separated by the visual differences of the items, while text descriptions enable the model to distinguish between visually similar items based on their ingredients.
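The multi-view intuition can be sketched roughly as follows (the paper uses a multi-view variational autoencoder; the elementwise-mean fusion below is only a stand-in, and all names are hypothetical): each view of an item, such as the natural image, the iconic image, and the text description, is encoded separately, and the per-view encodings are fused into one shared representation.

```python
def fuse_views(encodings):
    """Fuse equal-length per-view feature vectors by averaging elementwise.

    encodings: list of feature vectors (lists of floats), one per view.
    Returns one fused vector of the same dimension.
    """
    dim = len(encodings[0])
    assert all(len(e) == dim for e in encodings), "views must share dimension"
    return [sum(e[i] for e in encodings) / len(encodings) for i in range(dim)]
```

A real model would instead learn a joint latent distribution over the views, but the interface is the same: several per-view codes in, one shared code out.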

Place, publisher, year, edition, pages
Elsevier, 2020
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-292181 (URN)
Note

QC 20220426

Available from: 2021-03-25 Created: 2021-03-25 Last updated: 2025-02-07. Bibliographically approved
3. Learn the Time to Learn: Replay Scheduling in Continual Learning
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Replay-based continual learning has been shown to be successful in mitigating catastrophic forgetting despite limited access to historical data. However, although storing historical data is cheap in many real-world applications, replaying all seen data would be prohibitive due to processing-time constraints. In such settings, we propose learning the time to learn for a continual learning system, in which we learn replay schedules that determine which tasks to replay at different time steps. To demonstrate the importance of learning the time to learn, we use Monte Carlo tree search to find proper replay schedules in an ideal continual learning scenario. We perform extensive evaluations to show the benefits of replay scheduling in various memory settings and in combination with different replay methods. Moreover, the results indicate that the found schedules are consistent with insights from human learning. Our findings open up new research directions that can bring current continual learning research closer to real-world needs.
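The search problem behind the abstract can be made concrete with a toy stand-in (hypothetical, and using exhaustive enumeration where the paper uses Monte Carlo tree search): a schedule assigns one replay choice per time step, and we pick the schedule that scores best under some evaluation function.

```python
from itertools import product

def best_schedule(num_steps, choices, evaluate):
    """Exhaustively score every schedule (one replay choice per step) and
    return the one maximizing `evaluate`. MCTS, as used in the paper,
    would instead explore this exponentially large space selectively."""
    return max(product(choices, repeat=num_steps), key=evaluate)
```

For instance, with the per-step choices `(0, 1)` and `sum` as a toy score, the best three-step schedule is `(1, 1, 1)`.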

Keywords
Continual Learning; Replay Memory
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-320005 (URN)
Funder
Promobilia foundation, F-16500
Note

QC 20221018

Available from: 2022-10-12 Created: 2022-10-12 Last updated: 2025-02-07. Bibliographically approved
4. Policy Learning for Replay Scheduling in Continual Learning
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Scheduling which tasks to select for replay at different times has been demonstrated to be important in continual learning. However, a replay scheduling policy that can be applied in any continual learning scenario is currently missing, which makes replay scheduling infeasible in real-world scenarios. To this end, we propose using reinforcement learning to learn policies that can be applied in new continual learning scenarios without additional computational cost. In our experiments, we show that the learned policies can propose replay schedules that efficiently mitigate catastrophic forgetting in environments with previously unseen task orders and datasets. The proposed approach opens up new research directions in replay-based continual learning that align well with real-world needs.
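As a hedged illustration of what such a policy maps from and to (the paper learns the policy with reinforcement learning; the hand-written heuristic below, replaying the currently most-forgotten task, is only a stand-in): the state is the classifier's per-task validation accuracy and the action is which earlier task to replay next.

```python
def replay_policy(task_accuracies):
    """Given current validation accuracy per seen task, return the index of
    the task with the lowest accuracy, i.e. the most-forgotten task."""
    return min(range(len(task_accuracies)), key=task_accuracies.__getitem__)
```

A learned policy would replace this heuristic with a function whose parameters are trained so that following its replay choices maximizes accuracy over whole continual learning runs.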

Keywords
Continual Learning; Replay Memory; Replay Scheduling; Reinforcement Learning
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-320006 (URN)
Funder
Promobilia foundation, F-16500
Note

QC 20221018

Available from: 2022-10-12 Created: 2022-10-12 Last updated: 2025-02-07. Bibliographically approved

Open Access in DiVA

MarcusKlasson_thesis_kappa (2416 kB), 409 downloads
File information
File name: FULLTEXT01.pdf
File size: 2416 kB
Checksum: SHA-512
bc14249021e03d42d30785dc9c56a8086200eaaf81f5479c0c29abdfd6da0af550dbdde94694a8d75f42ae16906a5534356c52e9c3b7b35957b9fdda573eff13
Type: fulltext
Mimetype: application/pdf

Other links

zoom link for online defense

Search in DiVA

By author/editor
Klasson, Marcus
By organisation
Robotics, Perception and Learning, RPL
Computer graphics and computer vision

Search outside of DiVA

Google, Google Scholar
Total: 409 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are now no longer available.

Total: 993 hits