2022 (English) In: Journal of Mathematical Imaging and Vision, ISSN 0924-9907, E-ISSN 1573-7683, Vol. 64, no 5, p. 506-536. Article in journal (Refereed) Published
Abstract [en]
The ability to handle large scale variations is crucial for many real world visual tasks. A straightforward approach for handling scale in a deep network is to process an image at several scales simultaneously in a set of scale channels. Scale invariance can then, in principle, be achieved by using weight sharing between the scale channels together with max or average pooling over the outputs from the scale channels. The ability of such scale-channel networks to generalise to scales not present in the training set over significant scale ranges has, however, not previously been explored.
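The scale-channel idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes 1D signals instead of images, a single hand-chosen filter in place of a learned network, and linear interpolation for rescaling. The names `resample`, `shared_feature`, `scale_channel_output` and `mean` are hypothetical.

```python
def resample(signal, factor):
    """Rescale a 1D signal by `factor` using linear interpolation (assumed scheme)."""
    n_out = max(2, round(len(signal) * factor))
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        x = i * last / (n_out - 1)      # position in the input for output sample i
        lo = min(int(x), last)
        hi = min(lo + 1, last)
        t = x - lo
        out.append((1 - t) * signal[lo] + t * signal[hi])
    return out


def shared_feature(signal, kernel):
    """Correlate with `kernel`, apply a ReLU, then take the global spatial maximum."""
    k = len(kernel)
    responses = [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]
    return max(0.0, max(responses))


def mean(values):
    """Average pooling over the scale channels."""
    return sum(values) / len(values)


def scale_channel_output(signal, kernel, scales, pool=max):
    """Apply the SAME kernel (weight sharing) to each rescaled copy, then pool."""
    channel_outputs = [shared_feature(resample(signal, s), kernel) for s in scales]
    return pool(channel_outputs), channel_outputs


# Hypothetical toy input: a single peak, and a peak detector as the shared filter.
signal = [0.0] * 4 + [1.0] + [0.0] * 4
kernel = [-1.0, 2.0, -1.0]
pooled_max, per_channel = scale_channel_output(signal, kernel, [0.5, 1.0, 2.0])
pooled_avg, _ = scale_channel_output(signal, kernel, [0.5, 1.0, 2.0], pool=mean)
```

Passing `pool=max` or `pool=mean` switches between the two pooling variants mentioned in the abstract. Whether the pooled output is actually invariant depends on the scale range covered by the channels and on boundary and resolution effects, which this sketch does not address.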
In this paper, we present a systematic study of this methodology by implementing different types of scale-channel networks and evaluating their ability to generalise to previously unseen scales. We develop a formalism for analysing the covariance and invariance properties of scale-channel networks, including exploring their relations to scale-space theory, and exploring how different design choices, unique to scaling transformations, affect the overall performance of scale-channel networks. We first show that, of two previously proposed scale-channel network designs, one generalises no better than a standard CNN to scales not present in the training set, and the other has limited scale generalisation ability. We explain theoretically and demonstrate experimentally why generalisation fails or is limited in these cases. We then propose a new type of foveated scale-channel architecture, where the scale channels process increasingly larger parts of the image with decreasing resolution. This new type of scale-channel network is shown to generalise extremely well, provided sufficient image resolution and the absence of boundary effects. Our proposed FovMax and FovAvg networks perform almost identically over a scale range of 8, even when trained on single-scale training data, and also give improved performance when learning from datasets with large scale variations in the small sample regime.
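The foveated construction, in which each scale channel sees a larger part of the input at a lower resolution, can be sketched as follows. This is an illustration under stated assumptions rather than the architecture from the paper: it uses 1D signals, integer scale factors, centred crops, and block averaging for downsampling. The names `foveated_views` and `pooled_response` are hypothetical.

```python
def foveated_views(signal, base_width, scales):
    """Return one view per scale: wider centred crops at coarser resolution.

    Every view has exactly `base_width` samples, so the same (weight-shared)
    network could be applied to each of them.
    """
    center = len(signal) // 2
    views = []
    for s in scales:
        width = base_width * s
        start = center - width // 2
        if start < 0 or start + width > len(signal):
            # Boundary handling is deliberately left out of this sketch.
            raise ValueError("crop exceeds the signal")
        crop = signal[start:start + width]
        # Block-average by a factor s so that the view has base_width samples.
        view = [sum(crop[i * s:(i + 1) * s]) / s for i in range(base_width)]
        views.append(view)
    return views


def pooled_response(views, feature, pool):
    """Apply the same feature function to every view and pool over the channels."""
    return pool([feature(v) for v in views])


# Hypothetical toy input: a ramp of 16 samples, three channels of 4 samples each.
signal = [float(v) for v in range(16)]
views = foveated_views(signal, base_width=4, scales=[1, 2, 4])
max_pooled = pooled_response(views, max, max)
avg_pooled = pooled_response(views, max, lambda xs: sum(xs) / len(xs))
```

Because all views share one size, the cost per channel stays constant as the covered region grows. The explicit error for crops that exceed the signal reflects the abstract's caveat about boundary effects.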
Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2022
Keywords
Deep learning, Convolutional neural networks, Invariant neural networks, Scale covariance, Scale invariance, Scale generalisation, Scale space
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-309251 (URN)
10.1007/s10851-022-01082-2 (DOI)
000780653200001 ()
2-s2.0-85127949060 (Scopus ID)
Projects
Scale-space theory for covariant and invariant visual perception
Funder
Swedish Research Council, 2018-03586
Note
QC 20220530
2022-02-24 2022-02-24 2025-02-07 Bibliographically approved