Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations
Perzanowski, Andrzej
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST) (Computational Brain Science Lab). ORCID iD: 0009-0004-7143-5447
Lindeberg, Tony
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST) (Computational Brain Science Lab). ORCID iD: 0000-0002-9081-2170
2025 (English). Report (Other academic)
Abstract [en]

Due to the variabilities in image structures caused by perspective scaling transformations, it is essential for deep networks to have an ability to generalise to scales not seen during training. This paper presents an in-depth analysis of the scale generalisation properties of the scale-covariant and scale-invariant Gaussian derivative networks, complemented with both conceptual and algorithmic extensions. For this purpose, Gaussian derivative networks (GaussDerNets) are evaluated on new rescaled versions of the Fashion-MNIST and the CIFAR-10 datasets, with spatial scaling variations over a factor of 4 in the testing data that are not present in the training data. Additionally, evaluations on the previously existing STIR datasets show that the GaussDerNets achieve better scale generalisation than previously reported for other types of deep networks on these datasets.
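
As a concrete illustration of the basic construction, the following is a minimal sketch, not the authors' implementation, of scale-normalised Gaussian derivative responses computed over a set of logarithmically distributed scale levels; the kernel truncation radius, the scale ratio and the use of sigma^(order) scale normalisation are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import convolve1d

def gaussian_derivative_kernel(sigma, order, radius=None):
    # Sampled 1-D Gaussian derivative of the given order at scale sigma.
    if radius is None:
        radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    if order == 0:
        return g
    if order == 1:
        return -x / sigma**2 * g
    if order == 2:
        return (x**2 - sigma**2) / sigma**4 * g
    if order == 3:
        return -(x**3 - 3 * x * sigma**2) / sigma**6 * g
    raise ValueError("orders above three not needed for this sketch")

def scale_channel_responses(image, sigmas, order_x, order_y):
    # Apply the same separable Gaussian derivative operator at every scale
    # level, giving one scale channel per level (scale covariance), and
    # scale-normalise by sigma^(total order) so that responses remain
    # comparable across scales.
    responses = []
    for sigma in sigmas:
        out = convolve1d(image, gaussian_derivative_kernel(sigma, order_x), axis=1)
        out = convolve1d(out, gaussian_derivative_kernel(sigma, order_y), axis=0)
        responses.append(sigma ** (order_x + order_y) * out)
    return np.stack(responses)  # shape: (num_scales, H, W)

# Logarithmically distributed scale levels with ratio sqrt(2) between them:
sigmas = 1.0 * np.sqrt(2.0) ** np.arange(8)
L_x = scale_channel_responses(np.random.rand(64, 64), sigmas, order_x=1, order_y=0)
print(L_x.shape)  # (8, 64, 64)
```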

We first experimentally demonstrate that the GaussDerNets have quite good scale generalisation properties on the new datasets, and that average pooling of feature responses over scales may sometimes lead to better results than the previously used approach of max pooling over scales. Then, we demonstrate that using a spatial max pooling mechanism after the final layer enables localisation of non-centred objects in the image domain, with maintained scale generalisation properties. We also show that regularisation during training, by applying dropout across the scale channels, referred to as scale-channel dropout, improves both performance and scale generalisation.
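
The two pooling mechanisms over scales and the scale-channel dropout mechanism described above could be sketched as follows in PyTorch; the tensor layout (batch, scale, channel, height, width) and the dropout rate are assumptions for illustration, not the authors' exact design.

```python
import torch

def pool_over_scales(x, mode="max"):
    # Collapse the scale dimension (dim=1): max pooling selects the
    # strongest scale channel per position, average pooling mixes them.
    if mode == "max":
        return x.max(dim=1).values
    return x.mean(dim=1)

def scale_channel_dropout(x, p=0.2, training=True):
    # Zero out entire scale channels at random during training and
    # rescale the survivors, as in ordinary dropout, so that the network
    # does not come to rely on any single scale channel.
    if not training or p == 0.0:
        return x
    keep = (torch.rand(x.size(0), x.size(1), 1, 1, 1, device=x.device) > p).float()
    return x * keep / (1.0 - p)

x = torch.randn(4, 8, 16, 32, 32)  # batch, scale levels, channels, H, W
y = pool_over_scales(scale_channel_dropout(x), mode="avg")
print(y.shape)  # torch.Size([4, 16, 32, 32])
```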

In additional ablation studies, we show that, for the rescaled CIFAR-10 dataset, basing the layers in the GaussDerNets on derivatives up to order three leads to better performance and scale generalisation for coarser scales, whereas networks based on derivatives up to order two achieve better scale generalisation for finer scales. Moreover, we demonstrate that discretisations of GaussDerNets based on the discrete analogue of the Gaussian kernel in combination with central difference operators perform best, or among the best, compared to a set of other discrete approximations of the Gaussian derivative kernels. Furthermore, we show that the improvement in performance obtained by learning the scale values of the Gaussian derivatives, as opposed to using the previously proposed choice of a fixed logarithmic distribution of the scale levels, is usually only minor, thus supporting the previously postulated choice of using a logarithmic distribution as a very reasonable prior.
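
The best-performing discretisation mentioned above can be illustrated with a short sketch; the kernel truncation radius and the demo scale are assumptions. The discrete analogue of the Gaussian kernel is T(n, t) = e^(-t) I_n(t), with I_n the modified Bessel functions of integer order, available in SciPy in exponentially scaled form as ive.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function

def discrete_gaussian_kernel(t, radius):
    # Discrete analogue of the Gaussian kernel at scale t = sigma^2:
    # T(n, t) = exp(-t) * I_n(t), computed here as ive(|n|, t).
    n = np.arange(-radius, radius + 1)
    return ive(np.abs(n), t)

def central_difference(f, order=1):
    # First-order central difference (f(x+1) - f(x-1)) / 2, applied
    # 'order' times; a simple illustrative choice of difference operator.
    d = np.array([0.5, 0.0, -0.5])
    for _ in range(order):
        f = np.convolve(f, d, mode="same")
    return f

# First-order Gaussian derivative approximation at scale sigma = 2 (t = 4):
kernel = central_difference(discrete_gaussian_kernel(t=4.0, radius=12), order=1)
print(kernel.sum())  # close to 0, as expected for an odd-order derivative kernel
```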

Finally, by visualising the activation maps and the learned receptive fields, we demonstrate that the GaussDerNets have very good explainability properties.

Place, publisher, year, edition, pages
2025, p. 52
Keywords [en]
Scale covariance, Scale invariance, Scale generalisation, Scale selection, Gaussian derivative, Scale space, Deep learning, Receptive fields
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-354182
DOI: 10.48550/arXiv.2409.11140
OAI: oai:DiVA.org:kth-354182
DiVA, id: diva2:1902537
Projects
Covariant and invariant deep networks
Funder
Swedish Research Council, 2018-03586, 2022-02969
Note

QC 20241001

Available from: 2024-10-01. Created: 2024-10-01. Last updated: 2025-03-27. Bibliographically approved.

Open Access in DiVA

fulltext (4264 kB), 26 downloads
File information
File name: FULLTEXT02.pdf
File size: 4264 kB
Checksum: SHA-512
4b759820f560e14d039a08470ad83b2cbf86985b87bfd1879b2054b65677cacb2266d24ce66a78b181ba18ff8c88ed7e11c8c837c1d91010eb6a5735f5934444
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text [v2]

Authority records

Lindeberg, Tony
