Understanding when spatial transformer networks do not support invariance, and what to do about it
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). (Computational Brain Science Lab) ORCID iD: 0000-0001-8548-5788
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). (Computational Brain Science Lab) ORCID iD: 0000-0003-0011-6444
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). (Computational Brain Science Lab) ORCID iD: 0000-0002-9081-2170
2021 (English). In: ICPR 2020: International Conference on Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 3427-3434. Conference paper, Published paper (Refereed)
Abstract [en]

Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image with those of its original. STNs are therefore unable to support invariance when transforming CNN feature maps. We present a simple proof for this and study the practical implications, showing that this inability is coupled with decreased classification accuracy. We therefore investigate alternative STN architectures that make use of complex features. We find that while deeper localization networks are difficult to train, localization networks that share parameters with the classification network remain stable as they grow deeper, which allows for higher classification accuracy on difficult datasets. Finally, we explore the interaction between localization network complexity and iterative image alignment.
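The central claim of the abstract can be illustrated with a small numerical check. The following is a minimal sketch, not the authors' proof or code: it assumes a single hand-written convolution layer, a 90-degree rotation as the image transformation, and a horizontal-gradient filter chosen here only because it is not rotation symmetric. The helper `correlate2d_valid` and all variable names are hypothetical. Under these assumptions, rotating the feature map of the rotated image back does not recover the feature map of the original image, whereas rotating the input back before the convolution does.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Plain 'valid' cross-correlation, the operation a CNN conv layer computes."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))

# Horizontal-gradient filter (illustrative choice): not symmetric under rotation.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

# The "transformed image": the original rotated by 90 degrees counter-clockwise.
rotated = np.rot90(image)

features_original = correlate2d_valid(image, kernel)
features_rotated = correlate2d_valid(rotated, kernel)

# Case 1: purely spatial transformation applied to the feature map.
# Rotating the feature map back does not reproduce the original features,
# because the filter responded to a different orientation of the content.
realigned_features = np.rot90(features_rotated, k=-1)
feature_level_aligned = np.allclose(realigned_features, features_original)

# Case 2: the same inverse rotation applied to the input image instead.
# The convolution then sees the original image, so the features match.
realigned_input = np.rot90(rotated, k=-1)
input_level_aligned = np.allclose(
    correlate2d_valid(realigned_input, kernel), features_original
)

print("aligned when transforming feature maps:", feature_level_aligned)
print("aligned when transforming the input:   ", input_level_aligned)
```

The sketch only covers one layer and one transformation; it is meant to make the distinction between transforming feature maps and transforming input images concrete, not to reproduce the paper's experiments.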

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021. p. 3427-3434
Keywords [en]
deep learning, convolutional neural networks, invariant neural networks, spatial transformer networks
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-288723
DOI: 10.1109/ICPR48806.2021.9412997
ISI: 000678409203071
Scopus ID: 2-s2.0-85106434896
OAI: oai:DiVA.org:kth-288723
DiVA, id: diva2:1516191
Conference
ICPR 2020: 25th International Conference on Pattern Recognition, Milan, Italy, January 10-15, 2021
Funder
Swedish Research Council, 2018-03586
Note

Not duplicate with DiVA 1428271. QC 20210831

Available from: 2021-01-11. Created: 2021-01-11. Last updated: 2025-02-07. Bibliographically approved

Open Access in DiVA

fulltext (1308 kB), 650 downloads
File information
File name: FULLTEXT01.pdf
File size: 1308 kB
Checksum SHA-512:
cb5495cbb89ac38ab33278afc0600007b57c93637fa616bbc5e54b20b407e63fbaf4bd87f41c87d5ef58df9717feafe9bd4c9f068044cb1d2f67c3e10053b3ea
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text
Scopus
ICPR 2020 home page
arXiv:2004.11678 (extended version)

Authority records

Finnveden, Lukas; Jansson, Ylva; Lindeberg, Tony
