GMC - Geometric Multimodal Contrastive Representation Learning
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0001-6920-5109
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-3599-440x
2022 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address this, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method composed of two main components: i) a two-level architecture consisting of modality-specific base encoders, which process an arbitrary number of modalities into an intermediate representation of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems, including prediction and reinforcement learning tasks.
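The two-level architecture and the contrastive objective summarized above can be illustrated with a small plain-Python sketch. This is a hypothetical illustration, not the authors' implementation: the bias-free linear base encoders, the ReLU-then-linear projection head, the extra "joint" base encoder over all modalities concatenated, the temperature `tau`, and the simplified one-directional InfoNCE-style loss (each modality's latent vector is contrasted against the joint latent of the same sample, with the other samples in the batch acting as negatives) are all assumptions, as are the names `TwoLevelEncoder`, `info_nce`, and `multimodal_contrastive_loss`. Weights are random and untrained.

```python
import math
import random


def linear(weights, x):
    # Bias-free dense layer: returns W @ x for W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


def normalize(v):
    # L2-normalize so that dot products become cosine similarities.
    n = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / n for a in v]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]


class TwoLevelEncoder:
    """Hypothetical sketch: modality-specific base encoders + one shared projection head."""

    def __init__(self, modality_dims, inter_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.order = sorted(modality_dims)
        # Level 1 (assumed linear): one base encoder per modality, each mapping
        # its own input size to the same fixed intermediate dimensionality.
        self.base = {m: random_matrix(inter_dim, modality_dims[m], rng)
                     for m in self.order}
        # Assumption: a joint base encoder over all modalities concatenated.
        self.joint = random_matrix(inter_dim, sum(modality_dims.values()), rng)
        # Level 2: a single projection head shared by every base encoder.
        self.head = random_matrix(latent_dim, inter_dim, rng)

    def project(self, h):
        return normalize(linear(self.head, relu(h)))

    def encode(self, modality, x):
        # Needs only one modality, so it also covers the missing-modality case.
        return self.project(linear(self.base[modality], x))

    def encode_joint(self, sample):
        x = [a for m in self.order for a in sample[m]]
        return self.project(linear(self.joint, x))


def info_nce(anchors, positives, tau=0.1):
    # Mean over i of -log softmax_j(<anchors[i], positives[j]> / tau)[i].
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [sum(a * b for a, b in zip(anchors[i], positives[j])) / tau
                  for j in range(n)]
        mx = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = mx + math.log(sum(math.exp(v - mx) for v in logits))
        total += log_denom - logits[i]
    return total / n


def multimodal_contrastive_loss(model, batch, tau=0.1):
    # Pull each modality-specific latent towards the joint latent of its sample.
    joint = [model.encode_joint(s) for s in batch]
    return sum(info_nce([model.encode(m, s[m]) for s in batch], joint, tau)
               for m in model.order)


rng = random.Random(1)
dims = {"image": 8, "text": 4}  # hypothetical modalities and input sizes
model = TwoLevelEncoder(dims, inter_dim=6, latent_dim=3)
batch = [{m: [rng.gauss(0.0, 1.0) for _ in range(d)] for m, d in dims.items()}
         for _ in range(4)]
loss = multimodal_contrastive_loss(model, batch)
z_image_only = model.encode("image", batch[0]["image"])  # "text" is missing
print(loss, z_image_only)
```

Because every base encoder feeds the same projection head, a sample with missing modalities still maps into the shared, normalized latent space from whatever modality is available. A trainable version would replace the random weights with parameters optimized by gradient descent on the loss.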

Place, publisher, year, edition, pages
2022.
Keywords [en]
Representation Learning, Machine Learning, Multimodal, Contrastive Learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-312719
OAI: oai:DiVA.org:kth-312719
DiVA, id: diva2:1659731
Conference
International Conference on Machine Learning
Note

QC 20220614

Available from: 2022-05-20. Created: 2022-05-20. Last updated: 2022-06-25. Bibliographically approved.
In thesis
1. Learning and Evaluating the Geometric Structure of Representation Spaces
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Efficient representations of observed input data have been shown to significantly improve the performance of subsequent learning tasks in numerous domains. To obtain such representations automatically, we need to design both i) models that identify useful patterns in the input data and encode them into structured low-dimensional representations, and ii) evaluation measures that accurately assess the quality of the resulting representations. In this thesis, we present work that addresses both of these requirements, with an extensive focus on requirement ii), since the evaluation of representations has been largely unexplored in machine learning research. We begin with an overview of representation learning techniques and of different structures that can be imposed on representation spaces, thus first addressing i). In this regard, we present a representation learning model that identifies useful patterns in multimodal data, and describe an approach that promotes a structure on the representation space that is favourable for performing a robotics task. We then thoroughly study the problem of assessing the quality of learned representations and review the pitfalls of current practices. With this, we motivate evaluation based on analysing the geometric properties of representations and present two novel evaluation algorithms that constitute the core of this thesis. Finally, we present an application of the proposed evaluation algorithms to comparing large input graphs.

Abstract [sv] (translated from Swedish)

Efficient representations of observed input data have been shown to significantly increase performance on training problems in a number of domains. To obtain such representations automatically, we need both i) models that can identify useful patterns in the input data and encode them into structured low-dimensional representations, and ii) evaluation measures that reliably measure the quality of these representations. In this thesis, we present work that addresses both of these requirements, with the focus on ii), since the evaluation of representations has been a largely unexplored topic in the machine learning literature. We begin with an overview of representation learning techniques and of the types of structures that can be imposed on the representation space, which pertains to i). In this regard, we present a representation learning model that identifies useful patterns in multimodal data, and describe a method that promotes a structure on the representation space that is well suited to a robotics task. We then thoroughly study the problem of determining the quality of these learned representations and give an overview of common pitfalls of current methods. With this, we motivate evaluation based on the geometric properties of the representations and present two new evaluation algorithms, which make up the main part of the thesis. Finally, we present a practical application of the algorithms to comparing large input graphs.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2022. p. 54
Series
TRITA-EECS-AVL ; 2022:33
Keywords
Representation Learning, Machine Learning, Generative Models
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-312723 (URN)
978-91-8040-228-6 (ISBN)
Public defence
2022-06-13, https://kth-se.zoom.us/j/65953366981, F3, Lindstedtsvägen 26, Stockholm, 15:00 (English)
Note

QC 20220523

Available from: 2022-05-23. Created: 2022-05-20. Last updated: 2022-06-25. Bibliographically approved.

Open Access in DiVA

Full text: FULLTEXT01.pdf (application/pdf, 1168 kB)
SHA-512 checksum: 9b83e6a6b89436aa2a3020e61ef6efe3d48d6f89d0de319cf279b5bc1274302b4af9e9197508f3ab9c0c1460fad01ef6916633ca23c5d81854636f273e34995c

Authority records

Poklukar, Petra; Yin, Hang; Kragic, Danica

Organisations
Computational Science and Technology (CST); Robotics, Perception and Learning, RPL; Centre for Autonomous Systems, CAS