kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.ORCID iD: 0000-0002-8579-1790
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.ORCID iD: 0000-0003-2140-0612
2022 (English)In: The journal of artificial intelligence research, ISSN 1076-9757, E-ISSN 1943-5037, Vol. 74, p. 1201-1223Article in journal (Refereed) Published
Abstract [en]

This paper presents CoLLIE: a simple, yet effective model for continual learning of how language is grounded in vision. Given a pre-trained multimodal embedding model, where language and images are projected in the same semantic space (in this case CLIP by OpenAI), CoLLIE learns a transformation function that adjusts the language embeddings when needed to accommodate new language use. This is done by predicting the difference vector that needs to be applied, as well as a scaling factor for this vector, so that the adjustment is only applied when needed. Unlike traditional few-shot learning, the model does not just learn new classes and labels, but can also generalize to similar language use and leverage semantic compositionality. We verify the model's performance on two different tasks of identifying the targets of referring expressions, where it has to learn new language use. The results show that the model can efficiently learn and generalize from only a few examples, with little interference with the model's original zero-shot performance.

Place, publisher, year, edition, pages
AI Access Foundation , 2022. Vol. 74, p. 1201-1223
National Category
Software Engineering Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-315872DOI: 10.1613/JAIR.1.13689ISI: 000825139300002Scopus ID: 2-s2.0-85136141290OAI: oai:DiVA.org:kth-315872DiVA, id: diva2:1684836
Note

QC 20220728

Available from: 2022-07-28 Created: 2022-07-28 Last updated: 2023-06-14Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Skantze, GabrielWillemsen, Bram

Search in DiVA

By author/editor
Skantze, GabrielWillemsen, Bram
By organisation
Speech, Music and Hearing, TMH
In the same journal
The journal of artificial intelligence research
Software EngineeringLanguage Technology (Computational Linguistics)

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 76 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf