Variable Rate Allocation for Vector-Quantized Autoencoders
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0001-8152-767x
2023 (English). In: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Vector-quantized autoencoders have recently gained interest in image compression, generation and self-supervised learning. However, as a neural compression method, they lack the possibility to allocate a variable number of bits to each image location, e.g. according to the semantic content or local saliency. In this paper, we address this limitation in a simple yet effective way. We adopt a product quantizer (PQ) that produces a set of discrete codes for each image patch rather than a single index. This PQ-autoencoder is trained end-to-end with a structured dropout that selectively masks a variable number of codes at each location. These mechanisms force the decoder to reconstruct the original image based on partial information and allow us to control the local rate. The resulting model can compress images on a wide range of operating points of the rate-distortion curve and can be paired with any external method for saliency estimation to control the compression rate at a local level. We demonstrate the effectiveness of our approach on the popular Kodak and ImageNet datasets by measuring both distortion and perceptual quality metrics.
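The product quantizer and the code masking described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the array shapes, the choice to keep a prefix of the codes at each location, the zero-filling of masked codes, and the uniform sampling of the per-location budget are all assumptions made for illustration; the abstract only states that a variable number of codes is masked at each location.

```python
import numpy as np

def product_quantize(z, codebooks):
    """Quantize each of the M sub-vectors of z with its own codebook.

    z:         (N, M * d) array, one latent vector per image location.
    codebooks: (M, K, d) array, K centroids of dimension d per sub-quantizer.
    Returns integer codes of shape (N, M): a set of M indices per location.
    """
    M, K, d = codebooks.shape
    sub = z.reshape(z.shape[0], M, d)                       # (N, M, d)
    # Squared distance from every sub-vector to every centroid: (N, M, K)
    dist = ((sub[:, :, None, :] - codebooks[None]) ** 2).sum(axis=-1)
    return dist.argmin(axis=-1)

def decode_with_budget(codes, codebooks, keep):
    """Look up centroids and zero out the codes beyond each location's budget.

    keep: (N,) integers in [0, M]; location n keeps its first keep[n] codes
    (prefix masking is an assumption of this sketch).
    """
    M, K, d = codebooks.shape
    looked_up = codebooks[np.arange(M)[None, :], codes]     # (N, M, d)
    mask = np.arange(M)[None, :] < keep[:, None]            # (N, M)
    return (looked_up * mask[:, :, None]).reshape(codes.shape[0], M * d)

# Hypothetical sizes: 6 locations, 4 sub-quantizers, 256 centroids of dim 8.
rng = np.random.default_rng(0)
N, M, K, d = 6, 4, 256, 8
codebooks = rng.normal(size=(M, K, d))
z = rng.normal(size=(N, M * d))

codes = product_quantize(z, codebooks)
keep = rng.integers(0, M + 1, size=N)    # training: random budget per location
z_hat = decode_with_budget(codes, codebooks, keep)
bits = keep * np.log2(K)                 # local rate in bits per location
```

At inference time, `keep` would no longer be sampled at random. Under the same assumptions it could be derived from an external saliency map, so that salient locations transmit more codes and therefore more bits.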

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023.
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:kth:diva-326854
DOI: 10.1109/ICASSP49357.2023.10095451
Scopus ID: 2-s2.0-85168851171
OAI: oai:DiVA.org:kth-326854
DiVA, id: diva2:1756662
Conference
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Note

QC 20230516

Available from: 2023-05-12 Created: 2023-05-12 Last updated: 2025-02-07 Bibliographically approved
In thesis
1. Structured Representations for Explainable Deep Learning
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Deep learning has revolutionized scientific research and is being used to take decisions in increasingly complex scenarios. With growing power comes a growing demand for transparency and interpretability. The field of Explainable AI aims to provide explanations for the predictions of AI systems. The state of the art of AI explainability, however, is far from satisfactory. For example, in Computer Vision, the most prominent post-hoc explanation methods produce pixel-wise heatmaps over the input domain, which are meant to visualize the importance of individual pixels of an image or video. We argue that such dense attribution maps are poorly interpretable to non-expert users because of the domain in which explanations are formed - we may recognize shapes in a heatmap but they are just blobs of pixels. In fact, the input domain is closer to the raw data of digital cameras than to the interpretable structures that humans use to communicate, e.g. objects or concepts. In this thesis, we propose to move beyond dense feature attributions by adopting structured internal representations as a more interpretable explanation domain. Conceptually, our approach splits a Deep Learning model in two: the perception step that takes as input dense representations and the reasoning step that learns to perform the task at hand. At the interface between the two are structured representations that correspond to well-defined objects, entities, and concepts. These representations serve as the interpretable domain for explaining the predictions of the model, allowing us to move towards more meaningful and informative explanations. The proposed approach introduces several challenges, such as how to obtain structured representations, how to use them for downstream tasks, and how to evaluate the resulting explanations. The works included in this thesis address these questions, validating the approach and providing concrete contributions to the field. 
For the perception step, we investigate how to obtain structured representations from dense representations, whether by manually designing them using domain knowledge or by learning them from data without supervision. For the reasoning step, we investigate how to use structured representations for downstream tasks, from Biology to Computer Vision, and how to evaluate the learned representations. For the explanation step, we investigate how to explain the predictions of models that operate in a structured domain, and how to evaluate the resulting explanations. Overall, we hope that this work inspires further research in Explainable AI and helps bridge the gap between high-performing Deep Learning models and the need for transparency and interpretability in real-world applications.

Abstract [sv] (translated to English)

Deep Learning has revolutionized scientific research and is used to make decisions in increasingly complex scenarios. With growing power comes a growing demand for transparency and interpretability. The field of Explainable AI aims to provide explanations for the predictions of AI systems. The performance of existing solutions for AI explainability, however, is far from satisfactory. For example, in computer vision, the most prominent post-hoc explanation methods produce pixel-wise heatmaps, which are intended to visualize how important individual pixels are in an image or video. We argue that such methods are hard to interpret because of the domain in which explanations are formed: we may recognize shapes in a heatmap, but they are just pixels. In fact, the input domain is closer to the raw data of digital cameras than to the structures that humans use to communicate, e.g. objects or concepts. In this thesis, we propose to move beyond dense feature attributions by using structured internal representations as a more interpretable explanation domain. Conceptually, our approach splits a Deep Learning model in two: the perception step, which takes dense representations as input, and the reasoning step, which learns to perform the task. At the interface between the two are structured representations that correspond to well-defined objects, entities, and concepts. These representations serve as the interpretable domain for explaining the model's predictions, which lets us move towards more meaningful and informative explanations. The proposed approach introduces several challenges, such as how to create structured representations, how to use them for downstream tasks, and how to evaluate the resulting explanations. The research included in this thesis addresses these questions, validates the approach, and makes concrete contributions to the field.
For the perception step, we investigate how to obtain structured representations from dense representations, either by manually designing them using domain knowledge or by learning them from data without supervision. For the reasoning step, we investigate how to use structured representations for downstream tasks, from biology to computer vision, and how to evaluate the learned representations. For the explanation step, we investigate how to explain the predictions of models that operate in a structured domain, and how to evaluate the resulting explanations. Overall, we hope that this work inspires further research in Explainable AI and helps bridge the gap between high-performing Deep Learning models and the need for transparency and interpretability in real-world applications.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. p. xi, 103
Series
TRITA-EECS-AVL ; 2023:49
Keywords
Explainable AI, Deep Learning, Self-supervised Learning, Transformers, Graph Networks, Computer Vision
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-326958 (URN)
978-91-8040-606-2 (ISBN)
Public defence
2023-06-12, F3 https://kth-se.zoom.us/j/66725845533, Lindstedtsvägen 26, Stockholm, 14:00 (English)
Opponent
Supervisors
Funder
Swedish Research Council, 2017-04609
Note

QC 20230516

Available from: 2023-05-16 Created: 2023-05-16 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Baldassarre, Federico
