Publications (10 of 14)
Jansson, Y. & Lindeberg, T. (2022). Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales. Journal of Mathematical Imaging and Vision, 64(5), 506-536
2022 (English) In: Journal of Mathematical Imaging and Vision, ISSN 0924-9907, E-ISSN 1573-7683, Vol. 64, no 5, p. 506-536. Article in journal (Refereed), Published
Abstract [en]

The ability to handle large scale variations is crucial for many real world visual tasks. A straightforward approach for handling scale in a deep network is to process an image at several scales simultaneously in a set of scale channels. Scale invariance can then, in principle, be achieved by using weight sharing between the scale channels together with max or average pooling over the outputs from the scale channels. The ability of such scale-channel networks to generalise to scales not present in the training set over significant scale ranges has, however, not previously been explored.

In this paper, we present a systematic study of this methodology by implementing different types of scale-channel networks and evaluating their ability to generalise to previously unseen scales. We develop a formalism for analysing the covariance and invariance properties of scale-channel networks, including exploring their relations to scale-space theory, and exploring how different design choices, unique to scaling transformations, affect the overall performance of scale-channel networks. We first show that two previously proposed scale-channel network designs, in one case, generalise no better than a standard CNN to scales not present in the training set, and in the second case, have limited scale generalisation ability. We explain theoretically and demonstrate experimentally why generalisation fails or is limited in these cases. We then propose a new type of foveated scale-channel architecture, where the scale channels process increasingly larger parts of the image with decreasing resolution. This new type of scale-channel network is shown to generalise extremely well, provided sufficient image resolution and the absence of boundary effects. Our proposed FovMax and FovAvg networks perform almost identically over a scale range of 8, also when training on single-scale training data, and do also give improved performance when learning from datasets with large scale variations in the small sample regime.
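As an illustration of the weight sharing and pooling over scale channels described in the abstract above, the following is a minimal sketch and not the authors' implementation. The names `rescale`, `base_network` and `scale_channel_outputs` are hypothetical, the "network" is a single shared filter, and nearest-neighbour resampling by powers of two is an assumption chosen so that the toy example is exact. It shows that rescaling the input moves the responses along the scale-channel axis, which is why pooling over channels can give invariance as long as the relevant channels exist.

```python
import numpy as np

def rescale(img, factor):
    """Nearest-neighbour rescaling by a power-of-two factor (toy stand-in for proper resampling)."""
    if factor >= 1:
        k = int(factor)
        return np.repeat(np.repeat(img, k, axis=0), k, axis=1)
    step = int(round(1 / factor))
    return img[::step, ::step]

def base_network(img, weights):
    """Shared 'network' (assumed for illustration): valid correlation, ReLU, global max."""
    windows = np.lib.stride_tricks.sliding_window_view(img, weights.shape)
    response = np.einsum("ijkl,kl->ij", windows, weights)
    return np.maximum(response, 0.0).max()

def scale_channel_outputs(img, weights, scales):
    """Apply the same weights to every rescaled copy of the input (weight sharing)."""
    return np.array([base_network(rescale(img, s), weights) for s in scales])

rng = np.random.default_rng(0)
image = rng.random((8, 8))
weights = rng.standard_normal((3, 3))
scales = (0.5, 1.0, 2.0)

out_small = scale_channel_outputs(image, weights, scales)
out_large = scale_channel_outputs(rescale(image, 2.0), weights, scales)

# Pooling over the scale-channel axis gives the final response.
max_pooled = out_small.max()
avg_pooled = out_small.mean()
```

In this toy setting, `out_large[0]` equals `out_small[1]` and `out_large[1]` equals `out_small[2]`: the doubled image produces the same responses one channel further along. The outermost channels have no counterpart, which is one way to picture the boundary effects that the abstract mentions.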

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2022
Keywords
Deep learning, Convolutional neural networks, Invariant neural networks, Scale covariance, Scale invariance, Scale generalisation, Scale space
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-309251 (URN); 10.1007/s10851-022-01082-2 (DOI); 000780653200001 (); 2-s2.0-85127949060 (Scopus ID)
Projects
Scale-space theory for covariant and invariant visual perception
Funder
Swedish Research Council, 2018-03586
Note

QC 20220530

Available from: 2022-02-24 Created: 2022-02-24 Last updated: 2025-02-07 Bibliographically approved
Jansson, Y. & Lindeberg, T. (2021). Exploring the ability of CNNs to generalise to previously unseen scales over wide scale ranges. In: ICPR 2020: International Conference on Pattern Recognition. Paper presented at ICPR 2020: 25th International Conference on Pattern Recognition, Milan, Italy, January 10-15, 2021 (pp. 1181-1188). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English) In: ICPR 2020: International Conference on Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 1181-1188. Conference paper, Published paper (Refereed)
Abstract [en]

The ability to handle large scale variations is crucial for many real world visual tasks. A straightforward approach for handling scale in a deep network is to process an image at several scales simultaneously in a set of scale channels. Scale invariance can then, in principle, be achieved by using weight sharing between the scale channels together with max or average pooling over the outputs from the scale channels. The ability of such scale channel networks to generalise to scales not present in the training set over significant scale ranges has, however, not previously been explored. We, therefore, present a theoretical analysis of invariance and covariance properties of scale channel networks and perform an experimental evaluation of the ability of different types of scale channel networks to generalise to previously unseen scales. We identify limitations of previous approaches and propose a new type of foveated scale channel architecture, where the scale channels process increasingly larger parts of the image with decreasing resolution. Our proposed FovMax and FovAvg networks perform almost identically over a scale range of 8, also when training on single scale training data, and do also give improvements in the small sample regime.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
deep learning, convolutional neural networks, invariant neural networks, scale invariance
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-288539 (URN); 10.1109/ICPR48806.2021.9413276 (DOI); 000678409201038 (); 2-s2.0-85103171938 (Scopus ID)
Conference
ICPR 2020: 25th International Conference on Pattern Recognition, Milan, Italy, January 10-15, 2021
Funder
Swedish Research Council, 2018-03586
Note

Part of proceedings: ISBN 978-1-7281-8808-9, Not duplicate with diva 1423788, QC 20220517

Available from: 2021-01-08 Created: 2021-01-08 Last updated: 2025-02-07 Bibliographically approved
Jansson, Y. & Lindeberg, T. (2021). Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales.
2021 (English) Report (Other academic)
Abstract [en]

The ability to handle large scale variations is crucial for many real world visual tasks. A straightforward approach for handling scale in a deep network is to process an image at several scales simultaneously in a set of scale channels. Scale invariance can then, in principle, be achieved by using weight sharing between the scale channels together with max or average pooling over the outputs from the scale channels. The ability of such scale channel networks to generalise to scales not present in the training set over significant scale ranges has, however, not previously been explored. 

In this paper, we present a systematic study of this methodology by implementing different types of scale channel networks and evaluating their ability to generalise to previously unseen scales. We develop a formalism for analysing the covariance and invariance properties of scale channel networks, and explore how different design choices, unique to scaling transformations, affect the overall performance of scale channel networks. We first show that two previously proposed scale channel network designs do not generalise well to scales not present in the training set. We explain theoretically and demonstrate experimentally why generalisation fails in these cases. 

We then propose a new type of foveated scale channel architecture, where the scale channels process increasingly larger parts of the image with decreasing resolution. This new type of scale channel network is shown to generalise extremely well, provided sufficient image resolution and the absence of boundary effects. Our proposed FovMax and FovAvg networks perform almost identically over a scale range of 8, also when training on single scale training data, and do also give improved performance when learning from datasets with large scale variations in the small sample regime.
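The foveated input described above, with larger image regions at lower resolution, can be pictured with the following sketch. It is an assumption-laden illustration and not the architecture from the report: the function name `foveated_channels`, the factor-of-two spacing between channels, the centre cropping and the block-average downsampling are all choices made here for simplicity.

```python
import numpy as np

def foveated_channels(img, base_size, num_channels):
    """Build inputs for hypothetical foveated scale channels.

    Channel k sees a centre crop of side base_size * 2**k, block-averaged
    down to base_size x base_size, so every channel has the same input size.
    """
    if base_size % 2 != 0:
        raise ValueError("base_size must be even for symmetric centre crops")
    h, w = img.shape
    cy, cx = h // 2, w // 2
    channels = []
    for k in range(num_channels):
        step = 2 ** k
        half = base_size * step // 2
        if half > cy or half > cx:
            raise ValueError("image too small for the requested channels")
        crop = img[cy - half:cy + half, cx - half:cx + half]
        # Block averaging: a simple (assumed) way to reduce resolution.
        crop = crop.reshape(base_size, step, base_size, step).mean(axis=(1, 3))
        channels.append(crop)
    return np.stack(channels)

image = np.arange(64 * 64, dtype=float).reshape(64, 64)
inputs = foveated_channels(image, base_size=16, num_channels=3)
```

Each entry of `inputs` would then be passed through the same shared-weight network, and the outputs combined by max or average pooling over the channel axis. Because every channel has the same input size, the cost per channel stays constant while the covered image area grows.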

Keywords
Deep learning, Convolutional neural networks, Invariant neural networks, Scale covariance, Scale invariance, Scale generalisation, Scale space
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-297340 (URN)
Projects
Scale-space theory for covariant and invariant visual perception
Funder
Swedish Research Council, 2018-03586
Note

QC 20210617

Available from: 2021-06-14 Created: 2021-06-14 Last updated: 2025-02-07 Bibliographically approved
Finnveden, L., Jansson, Y. & Lindeberg, T. (2021). Understanding when spatial transformer networks do not support invariance, and what to do about it. In: ICPR 2020: International Conference on Pattern Recognition. Paper presented at ICPR 2020: 25th International Conference on Pattern Recognition, Milan, Italy, January 10-15, 2021 (pp. 3427-3434). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English) In: ICPR 2020: International Conference on Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 3427-3434. Conference paper, Published paper (Refereed)
Abstract [en]

Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image with those of its original. STNs are therefore unable to support invariance when transforming CNN feature maps. We present a simple proof for this and study the practical implications, showing that this inability is coupled with decreased classification accuracy. We therefore investigate alternative STN architectures that make use of complex features. We find that while deeper localization networks are difficult to train, localization networks that share parameters with the classification network remain stable as they grow deeper, which allows for higher classification accuracy on difficult datasets. Finally, we explore the interaction between localization network complexity and iterative image alignment.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
deep learning, convolutional neural networks, invariant neural networks, spatial transformer networks
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-288723 (URN); 10.1109/ICPR48806.2021.9412997 (DOI); 000678409203071 (); 2-s2.0-85106434896 (Scopus ID)
Conference
ICPR 2020: 25th International Conference on Pattern Recognition, Milan, Italy, January 10-15, 2021
Funder
Swedish Research Council, 2018-03586
Note

Not duplicate with DiVA 1428271. QC 20210831

Available from: 2021-01-11 Created: 2021-01-11 Last updated: 2025-02-07 Bibliographically approved
Jansson, Y. & Lindeberg, T. (2020). Exploring the ability of CNNs to generalise to previously unseen scales over wide scale ranges.
2020 (English) Report (Other academic)
Abstract [en]

The ability to handle large scale variations is crucial for many real world visual tasks. A straightforward approach for handling scale in a deep network is to process an image at several scales simultaneously in a set of scale channels. Scale invariance can then, in principle, be achieved by using weight sharing between the scale channels together with max or average pooling over the outputs from the scale channels. The ability of such scale channel networks to generalise to scales not present in the training set over significant scale ranges has, however, not previously been explored. We, therefore, present a theoretical analysis of invariance and covariance properties of scale channel networks and perform an experimental evaluation of the ability of different types of scale channel networks to generalise to previously unseen scales. We identify limitations of previous approaches and propose a new type of foveated scale channel architecture, where the scale channels process increasingly larger parts of the image with decreasing resolution. Our proposed FovMax and FovAvg networks perform almost identically over a scale range of 8 also when training on single scale training data and give improvements in the small sample regime.

Keywords
deep learning, convolutional neural networks, invariant neural networks, scale invariance
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-272013 (URN)
Funder
Swedish Research Council, 2018-03586
Note

Not duplicate with 1515273

QC 20220517

Available from: 2020-04-15 Created: 2020-04-15 Last updated: 2025-02-07 Bibliographically approved
Jansson, Y., Maydanskiy, M., Finnveden, L. & Lindeberg, T. (2020). Inability of spatial transformations of CNN feature maps to support invariant recognition.
2020 (English) Report (Other academic)
Abstract [en]

A large number of deep learning architectures use spatial transformations of CNN feature maps or filters to better deal with variability in object appearance caused by natural image transformations. In this paper, we prove that spatial transformations of CNN feature maps cannot align the feature maps of a transformed image to match those of its original, for general affine transformations, unless the extracted features are themselves invariant. Our proof is based on elementary analysis for both the single- and multi-layer network case. The results imply that methods based on spatial transformations of CNN feature maps or filters cannot replace image alignment of the input and cannot enable invariant recognition for general affine transformations, specifically not for scaling transformations or shear transformations. For rotations and reflections, spatially transforming feature maps or filters can enable invariance, but only for networks with learnt or hardcoded rotation- or reflection-invariant features.
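To give some intuition for the scaling case, the following one-dimensional, single-layer calculation is a sketch under assumed notation (a filter g, a signal f and a scaling factor s > 0, none of which are taken from the report) and is not the proof given in the report. It follows from a change of variables u = s v in the convolution integral.

```latex
% 1-D sketch; g is a filter, f a signal, s > 0 a scaling factor (notation assumed here).
f_s(x) = f(x/s), \qquad
(g * f_s)(x) = \int g(u)\, f\!\left(\frac{x-u}{s}\right) du
             = \int s\, g(s v)\, f\!\left(\frac{x}{s} - v\right) dv
             = (g_s * f)(x/s), \qquad g_s(v) = s\, g(s v).
```

The feature map of the rescaled signal is therefore a spatially rescaled feature map of the original signal computed with a different, rescaled filter g_s. A purely spatial transformation can undo the change of coordinates x/s, but it cannot turn the response to g_s into the response to g unless those responses already coincide, that is, unless the feature is itself scale invariant.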

Publisher
p. 22
Keywords
deep learning, convolutional neural networks, invariant neural networks, spatial transformer networks
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-272970 (URN)
Funder
Swedish Research Council, 2018-03586
Note

QC 20200507

Available from: 2020-05-05 Created: 2020-05-05 Last updated: 2025-02-07 Bibliographically approved
Jansson, Y. & Lindeberg, T. (2020). MNIST Large Scale data set.
2020 (English) Data set
Keywords
computer vision, classification, scale invariance, large scale variations, MNIST
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-273715 (URN); 10.5281/zenodo.3820247 (DOI)
Funder
Swedish Research Council, 2018-03586
Available from: 2020-05-26 Created: 2020-05-26 Last updated: 2025-02-07
Jansson, Y., Maydanskiy, M., Finnveden, L. & Lindeberg, T. (2020). Spatial transformations in convolutional networks and invariant recognition. Paper presented at DeepMath2020 Conference on the Mathematical Theory of Deep Neural Networks, Nov 5 - Nov 6, 2020.
2020 (English) Conference paper, Oral presentation only (Refereed)
Abstract [en]

We show that spatial transformations of CNN feature maps cannot align the feature maps of a transformed image to match those of its original for general affine transformations. This implies that methods that spatially transform CNN feature maps, such as spatial transformer networks, dilated or deformable convolutions, or spatial pyramid pooling, cannot enable true invariance. Our proof is based on elementary analysis for both the single- and multi-layer network cases.

National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-287481 (URN)
Conference
DeepMath2020 Conference on the Mathematical Theory of Deep Neural Networks Nov 5 - Nov 6, 2020
Funder
Swedish Research Council, 2018-03586
Note

QC 20201224

Available from: 2020-12-21 Created: 2020-12-21 Last updated: 2025-02-07 Bibliographically approved
Finnveden, L., Jansson, Y. & Lindeberg, T. (2020). The problems with using STNs to align CNN feature maps. Paper presented at Northern Lights Deep Learning Workshop 2020, Tromsø, Norway, 20-21 Jan 2020.
2020 (English) Conference paper, Oral presentation with published abstract (Other academic)
Abstract [en]

Spatial transformer networks (STNs) were designed to enable CNNs to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image and its original. We present a theoretical argument for this and investigate the practical implications, showing that this inability is coupled with decreased classification accuracy. We advocate taking advantage of more complex features in deeper layers by instead sharing parameters between the classification and the localisation network.

Keywords
deep learning, convolutional neural networks, spatial transformer networks, invariant neural networks
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-266471 (URN)
Conference
Northern Lights Deep Learning Workshop 2020, Tromsø, Norway, 20-21 Jan 2020
Funder
Swedish Research Council, 2018-03586
Note

QC 20200123

Available from: 2020-01-23 Created: 2020-01-23 Last updated: 2025-02-07 Bibliographically approved
Finnveden, L., Jansson, Y. & Lindeberg, T. (2020). The problems with using STNs to align CNN feature maps.
2020 (English) Report (Other academic)
Abstract [en]

Spatial transformer networks (STNs) were designed to enable CNNs to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image and its original. We present a theoretical argument for this and investigate the practical implications, showing that this inability is coupled with decreased classification accuracy. We advocate taking advantage of more complex features in deeper layers by instead sharing parameters between the classification and the localisation network.

Publisher
p. 2
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-363943 (URN); 10.48550/arXiv.2001.05858 (DOI)
Funder
Swedish Research Council, 2018-03586
Note

QC 20250814

Available from: 2025-05-28 Created: 2025-05-28 Last updated: 2025-08-14
Identifiers
ORCID iD: orcid.org/0000-0003-0011-6444
