  • 1. Almansa, A.
    et al.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Fingerprint enhancement by shape adaptation of scale-space operators with automatic scale selection (2000). In: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 9, No. 12, pp. 2027-2042. Article in journal (Refereed)
    Abstract [en]

    This work presents two mechanisms for processing fingerprint images: shape-adapted smoothing based on second moment descriptors and automatic scale selection based on normalized derivatives. The shape adaptation procedure adapts the smoothing operation to the local ridge structures, which allows interrupted ridges to be joined without destroying essential singularities such as branching points and enforces continuity of their directional fields. The scale selection procedure estimates local ridge width and adapts the amount of smoothing to the local amount of noise. In addition, a ridgeness measure is defined, which reflects how well the local image structure agrees with a qualitative ridge model, and is used for spreading the results of shape adaptation into noisy areas. The combined approach makes it possible to resolve fine scale structures in clear areas while reducing the risk of enhancing noise in blurred or fragmented areas. The result is a reliable and adaptively detailed estimate of the ridge orientation field and ridge width, as well as a smoothed grey-level version of the input image. We propose that these general techniques should be of interest to developers of automatic fingerprint identification systems as well as in other applications of processing related types of imagery.

    Full text (pdf)
    fulltext
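The automatic scale selection mechanism that this abstract builds on (maximising scale-normalised derivative responses over scales) can be illustrated compactly. A minimal sketch, not the paper's fingerprint pipeline: it selects, per pixel, the scale maximising the magnitude of the scale-normalised Laplacian (gamma = 1) using SciPy; the test blob and the candidate scale levels are illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def select_scale(image, scales):
    """Per pixel, return the scale t (variance) maximising the magnitude of
    the scale-normalised Laplacian t * |Lxx + Lyy| (gamma = 1)."""
    responses = np.stack([t * np.abs(gaussian_laplace(image, sigma=np.sqrt(t)))
                          for t in scales])
    return np.asarray(scales)[np.argmax(responses, axis=0)]

# Toy check: for a Gaussian blob of variance t0, the normalised Laplacian
# response at the centre is maximised at t = t0.
x, y = np.meshgrid(np.arange(64), np.arange(64), indexing='ij')
t0 = 16.0
blob = np.exp(-((x - 32.0) ** 2 + (y - 32.0) ** 2) / (2.0 * t0))
t_map = select_scale(blob, scales=[2.0, 4.0, 8.0, 16.0, 32.0, 64.0])
print(t_map[32, 32])  # expected: 16.0
```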
  • 2. Almansa, Andrés
    et al.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Enhancement of Fingerprint Images by Shape-Adapted Scale-Space Operators (1996). In: Gaussian Scale-Space Theory. Part I: Proceedings of PhD School on Scale-Space Theory (Copenhagen, Denmark), May 1996 / [ed] J. Sporring, M. Nielsen, L. Florack, and P. Johansen, Springer Science+Business Media B.V., 1996, pp. 21-30. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    This work presents a novel technique for preprocessing fingerprint images. The method is based on the measurements of second moment descriptors and shape adaptation of scale-space operators with automatic scale selection (Lindeberg 1994). This procedure, which has been successfully used in the context of shape-from-texture and shape from disparity gradients, has several advantages when applied to fingerprint image enhancement, as observed by Weickert (1995). For example, it is capable of joining interrupted ridges, and enforces continuity of their directional fields.

    In this work, the abovementioned general ideas are applied and extended in the following ways: Two methods for estimating local ridge width are explored and tuned to the problem of fingerprint enhancement. A ridgeness measure is defined, which reflects how well the local image structure agrees with a qualitative ridge model. This information is used for guiding a scale-selection mechanism, and for spreading the results of shape adaptation into noisy areas.

    The combined approach makes it possible to resolve fine scale structures in clear areas while reducing the risk of enhancing noise in blurred or fragmented areas. To a large extent, the scheme has the desirable property of joining interrupted lines without destroying essential singularities such as branching points. Thus, the result is a reliable and adaptively detailed estimate of the ridge orientation field and ridge width, as well as a smoothed grey-level version of the input image.

    A detailed experimental evaluation is presented, including a comparison with other techniques. We propose that the techniques presented provide mechanisms of interest to developers of automatic fingerprint identification systems.

    Full text (pdf)
    fulltext
  • 3. Björkman, Eva
    et al.
    Zagal, Juan Cristobal
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Roland, Per E.
    Evaluation of design options for the scale-space primal sketch analysis of brain activation images (2000). In: HBM'00, published in NeuroImage, Vol. 11, No. 5, 2000, pp. 656-656. Conference paper (Refereed)
    Abstract [en]

    A key issue in brain imaging concerns how to detect the functionally activated regions from PET and fMRI images. In earlier work, it has been shown that the scale-space primal sketch provides a useful tool for such analysis [1]. The method includes presmoothing with different filter widths and automatic estimation of the spatial extent of the activated regions (blobs).

    The purpose is to present two modifications of the scale-space primal sketch, as well as a quantitative evaluation which shows that these modifications improve the performance, measured as the separation between blob descriptors extracted from PET images and from noise images. This separation is essential for future work of associating a statistical p-value with the scale-space blob descriptors.

    Full text (pdf)
    fulltext
  • 4.
    Bretzner, Lars
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Laptev, Ivan
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Hand-gesture recognition using multi-scale colour features, hierarchical features and particle filtering (2002). In: Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002, Proceedings, IEEE conference proceedings, 2002, pp. 63-74. Conference paper (Refereed)
    Abstract [en]

    This paper presents algorithms and a prototype system for hand tracking and hand posture recognition. Hand postures are represented in terms of hierarchies of multi-scale colour image features at different scales, with qualitative inter-relations in terms of scale, position and orientation. In each image, detection of multi-scale colour features is performed. Hand states are then simultaneously detected and tracked using particle filtering, with an extension of layered sampling referred to as hierarchical layered sampling. Experiments are presented showing that the performance of the system is substantially improved by performing feature detection in colour space and including a prior with respect to skin colour. These components have been integrated into a real-time prototype system, applied to a test problem of controlling consumer electronics using hand gestures. In a simplified demo scenario, this system has been successfully tested by participants at two fairs during 2001.

    Full text (pdf)
    fulltext
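The particle-filtering step underlying the tracker described above can be sketched generically. This is the plain bootstrap filter, not the paper's hierarchical layered sampling extension; the state dimensionality, motion model and likelihood below are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, likelihood, motion_std=2.0):
    """One predict-update-resample cycle of a bootstrap particle filter.
    particles: (N, d) state hypotheses; likelihood: maps states to
    observation likelihoods (here a stand-in for an image likelihood)."""
    particles = particles + rng.normal(0.0, motion_std, particles.shape)  # predict
    weights = weights * likelihood(particles)                             # update
    weights = weights / weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)      # resample
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy usage: track a static 2-D position with true value (10, 20).
true_state = np.array([10.0, 20.0])
lik = lambda s: np.exp(-0.5 * np.sum((s - true_state) ** 2, axis=1) / 5.0 ** 2)
p = rng.normal(0.0, 20.0, (500, 2))
w = np.full(500, 1.0 / 500)
for _ in range(10):
    p, w = particle_filter_step(p, w, lik)
print(p.mean(axis=0))  # posterior mean, approaches (10, 20)
```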
  • 5.
    Bretzner, Lars
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Laptev, Ivan
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lenman, S.
    Sundblad, Y.
    A Prototype System for Computer Vision Based Human Computer Interaction (2001). Report (Other academic)
    Full text (pdf)
    fulltext
  • 6.
    Bretzner, Lars
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Feature Tracking with Automatic Selection of Spatial Scales (1998). In: Computer Vision and Image Understanding, ISSN 1077-3142, E-ISSN 1090-235X, Vol. 71, No. 3, pp. 385-393. Article in journal (Refereed)
    Abstract [en]

    When observing a dynamic world, the size of image structures may vary over time. This article emphasizes the need for including explicit mechanisms for automatic scale selection in feature tracking algorithms in order to: (i) adapt the local scale of processing to the local image structure, and (ii) adapt to the size variations that may occur over time. The problems of corner detection and blob detection are treated in detail, and a combined framework for feature tracking is presented. The integrated tracking algorithm overcomes some of the inherent limitations of exposing fixed-scale tracking methods to image sequences in which the size variations are large. It is also shown how the stability over time of scale descriptors can be used as a part of a multi-cue similarity measure for matching. Experiments on real-world sequences are presented showing the performance of the algorithm when applied to (individual) tracking of corners and blobs.

    Full text (pdf)
    fulltext
  • 7.
    Bretzner, Lars
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Feature tracking with automatic selection of spatial scales (1998). Report (Other academic)
    Abstract [en]

    When observing a dynamic world, the size of image structures may vary over time. This article emphasizes the need for including explicit mechanisms for automatic scale selection in feature tracking algorithms in order to: (i) adapt the local scale of processing to the local image structure, and (ii) adapt to the size variations that may occur over time.

    The problems of corner detection and blob detection are treated in detail, and a combined framework for feature tracking is presented in which the image features at every time moment are detected at locally determined and automatically selected scales. A useful property of the scale selection method is that the scale levels selected in the feature detection step reflect the spatial extent of the image structures. Thereby, the integrated tracking algorithm has the ability to adapt to spatial as well as temporal size variations, and can in this way overcome some of the inherent limitations of exposing fixed-scale tracking methods to image sequences in which the size variations are large.

    In the composed tracking procedure, the scale information is used for two additional major purposes: (i) for defining local regions of interest for searching for matching candidates as well as setting the window size for correlation when evaluating matching candidates, and (ii) stability over time of the scale and significance descriptors produced by the scale selection procedure are used for formulating a multi-cue similarity measure for matching.

    Experiments on real-world sequences are presented showing the performance of the algorithm when applied to (individual) tracking of corners and blobs. Specifically, comparisons with fixed-scale tracking methods are included as well as illustrations of the increase in performance obtained by using multiple cues in the feature matching step.

    Full text (pdf)
    fulltext
  • 8.
    Bretzner, Lars
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    On the handling of spatial and temporal scales in feature tracking (1997). In: Scale-Space Theory in Computer Vision: First International Conference, Scale-Space'97, Utrecht, The Netherlands, July 2–4, 1997, Proceedings, Springer Berlin/Heidelberg, 1997, pp. 128-139. Conference paper (Refereed)
    Full text (pdf)
    fulltext
  • 9.
    Bretzner, Lars
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Qualitative multi-scale feature hierarchies for object tracking (1999). In: Proc. Scale-Space Theories in Computer Vision, Elsevier, 1999, pp. 117-128. Conference paper (Refereed)
    Abstract [en]

    This paper shows how the performance of feature trackers can be improved by building a view-based object representation consisting of qualitative relations between image structures at different scales. The idea is to track all image features individually, and to use the qualitative feature relations for resolving ambiguous matches and for introducing feature hypotheses whenever image features are mismatched or lost. Compared to more traditional work on view-based object tracking, this methodology has the ability to handle semi-rigid objects and partial occlusions. Compared to trackers based on three-dimensional object models, this approach is much simpler and of a more generic nature. A hands-on example is presented showing how an integrated application system can be constructed from conceptually very simple operations.

    Full text (pdf)
    fulltext
  • 10.
    Bretzner, Lars
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Qualitative Multi-Scale Feature Hierarchies for Object Tracking (2000). In: Journal of Visual Communication and Image Representation, ISSN 1047-3203, E-ISSN 1095-9076, Vol. 11, pp. 115-129. Article in journal (Refereed)
    Abstract [en]

    This paper shows how the performance of feature trackers can be improved by building a view-based object representation consisting of qualitative relations between image structures at different scales. The idea is to track all image features individually, and to use the qualitative feature relations for resolving ambiguous matches and for introducing feature hypotheses whenever image features are mismatched or lost. Compared to more traditional work on view-based object tracking, this methodology has the ability to handle semi-rigid objects and partial occlusions. Compared to trackers based on three-dimensional object models, this approach is much simpler and of a more generic nature. A hands-on example is presented showing how an integrated application system can be constructed from conceptually very simple operations.

    Full text (pdf)
    fulltext
  • 11.
    Bretzner, Lars
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Qualitative multiscale feature hierarchies for object tracking (2000). Report (Refereed)
    Abstract [en]

    This paper shows how the performance of feature trackers can be improved by building a hierarchical view-based object representation consisting of qualitative relations between image structures at different scales. The idea is to track all image features individually and to use the qualitative feature relations for avoiding mismatches, for resolving ambiguous matches, and for introducing feature hypotheses whenever image features are lost. Compared to more traditional work on view-based object tracking, this methodology has the ability to handle semirigid objects and partial occlusions. Compared to trackers based on three-dimensional object models, this approach is much simpler and of a more generic nature. A hands-on example is presented showing how an integrated application system can be constructed from conceptually very simple operations.

    Full text (pdf)
    fulltext
  • 12.
    Bretzner, Lars
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Structure and Motion Estimation using Sparse Point and Line Correspondences in Multiple Affine Views (1999). Report (Other academic)
    Abstract [en]

    This paper addresses the problem of computing three-dimensional structure and motion from an unknown rigid configuration of points and lines viewed by an affine projection model. An algebraic structure, analogous to the trilinear tensor for three perspective cameras, is defined for configurations of three centered affine cameras. This centered affine trifocal tensor contains 12 non-zero coefficients and involves linear relations between point correspondences and trilinear relations between line correspondences. It is shown how the affine trifocal tensor relates to the perspective trilinear tensor, and how three-dimensional motion can be computed from this tensor in a straightforward manner. A factorization approach is developed to handle point features and line features simultaneously in image sequences, and degenerate feature configurations are analysed. This theory is applied to a specific problem in human-computer interaction of capturing three-dimensional rotations from gestures of a human hand. This application to quantitative gesture analysis illustrates the usefulness of the affine trifocal tensor in a situation where sufficient information is not available to compute the perspective trilinear tensor, while the geometry requires point correspondences as well as line correspondences over at least three views.

    Full text (pdf)
    fulltext
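The factorization approach mentioned in the abstract can be illustrated for the classical points-only case; the line features and the centered affine trifocal tensor itself are beyond a short sketch. A minimal Tomasi-Kanade-style factorization under an affine camera model, with illustrative random data:

```python
import numpy as np

def affine_factorization(W):
    """Factorise a measurement matrix W (2F x P; P points in F affine views)
    into motion M (2F x 3) and shape S (3 x P), up to an affine ambiguity,
    by centring each row and truncating the SVD to rank 3."""
    W_centred = W - W.mean(axis=1, keepdims=True)   # removes the translations
    U, s, Vt = np.linalg.svd(W_centred, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])
    S = np.sqrt(s[:3])[:, None] * Vt[:3]
    return M, S

# Toy usage: noise-free rigid points under random affine cameras.
rng = np.random.default_rng(1)
S_true = rng.normal(size=(3, 20))                    # 20 3-D points
M_true = rng.normal(size=(6, 3))                     # 3 views, 2 rows each
W = M_true @ S_true + rng.normal(size=(6, 1))        # plus per-row translation
M, S = affine_factorization(W)
print(np.linalg.norm(M @ S - (W - W.mean(axis=1, keepdims=True))))  # ~ 0
```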
  • 13.
    Bretzner, Lars
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Use your hand as a 3-D mouse, or, relative orientation from extended sequences of sparse point and line correspondences using the affine trifocal tensor (1998). In: Computer Vision — ECCV'98: 5th European Conference on Computer Vision, Freiburg, Germany, June 2–6, 1998, Proceedings, Volume I, Springer Berlin/Heidelberg, 1998, Vol. 1406, pp. 141-157. Conference paper (Refereed)
    Abstract [en]

    This paper addresses the problem of computing three-dimensional structure and motion from an unknown rigid configuration of points and lines viewed by an affine projection model. An algebraic structure, analogous to the trilinear tensor for three perspective cameras, is defined for configurations of three centered affine cameras. This centered affine trifocal tensor contains 12 coefficients and involves linear relations between point correspondences and trilinear relations between line correspondences. It is shown how the affine trifocal tensor relates to the perspective trilinear tensor, and how three-dimensional motion can be computed from this tensor in a straightforward manner. A factorization approach is also developed to handle point features and line features simultaneously in image sequences.

    This theory is applied to a specific problem of human-computer interaction of capturing three-dimensional rotations from gestures of a human hand. A qualitative model is presented, in which three fingers are represented by their position and orientation, and it is shown how three point correspondences (blobs at the finger tips) and three line correspondences (ridge features at the fingers) allow the affine trifocal tensor to be determined, from which the rotation is computed. Besides the obvious application, this test problem illustrates the usefulness of the affine trifocal tensor in a situation where sufficient information is not available to compute the perspective trilinear tensor, while the geometry requires point correspondences as well as line correspondences over at least three views.

    Full text (pdf)
    fulltext
  • 14.
    Brunnström, Kjell
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Eklundh, Jan-Olof
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    On Scale and Resolution in the Analysis of Local Image Structure (1990). In: Proc. 1st European Conf. on Computer Vision, 1990, Vol. 427, pp. 3-12. Conference paper (Refereed)
    Abstract [en]

    Focus-of-attention is extremely important in human visual perception. If computer vision systems are to perform tasks in a complex, dynamic world they will have to be able to control processing in a way that is analogous to visual attention in humans.

    In this paper we will investigate problems in connection with foveation, that is, examining selected regions of the world at high resolution. We will especially consider the problem of finding and classifying junctions from this aspect. We will show that foveation as simulated by controlled, active zooming in conjunction with scale-space techniques allows robust detection and classification of junctions.

  • 15.
    Brunnström, Kjell
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Eklundh, Jan-Olof
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Scale and Resolution in Active Analysis of Local Image Structure (1990). In: Image and Vision Computing, Vol. 8, pp. 289-296. Article in journal (Refereed)
    Abstract [en]

    Focus-of-attention is extremely important in human visual perception. If computer vision systems are to perform tasks in a complex, dynamic world they will have to be able to control processing in a way that is analogous to visual attention in humans. Problems connected to foveation (examination of selected regions of the world at high resolution) are examined. In particular, the problem of finding and classifying junctions from this aspect is considered. It is shown that foveation as simulated by controlled, active zooming in conjunction with scale-space techniques allows for robust detection and classification of junctions.

  • 16.
    Brunnström, Kjell
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Eklundh, Jan-Olof
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Active detection and classification of junctions by foveation with a head-eye system guided by the scale-space primal sketch (1992). In: Computer Vision — ECCV'92: Second European Conference on Computer Vision, Santa Margherita Ligure, Italy, May 19–22, 1992, Proceedings / [ed] Giulio Sandini, Springer Berlin/Heidelberg, 1992, pp. 701-709. Conference paper (Refereed)
    Abstract [en]

    We consider how junction detection and classification can be performed in an active visual system. This is to exemplify that feature detection and classification in general can be done by both simple and robust methods, if the vision system is allowed to look at the world rather than at prerecorded images. We address issues on how to attract the attention to salient local image structures, as well as on how to characterize those.

    Full text (pdf)
    fulltext
  • 17.
    Ekeberg, Örjan
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Fransén, Erik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Hellgren Kotaleski, Jeanette
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Herman, Pawel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Kumar, Arvind
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Lansner, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Computational Brain Science at CST, CSC, KTH (2016). Other (Other academic)
    Abstract [en]

    Mission and Vision - Computational Brain Science Lab at CST, CSC, KTH

    The scientific mission of the Computational Brain Science Lab at CSC is to be at the forefront of mathematical modelling, quantitative analysis and mechanistic understanding of brain function. We perform research on (i) computational modelling of biological brain function and on (ii) developing theory, algorithms and software for building computer systems that can perform brain-like functions. Our research answers scientific questions and develops methods in these fields. We integrate results from our science-driven brain research into our work on brain-like algorithms and likewise use theoretical results about artificial brain-like functions as hypotheses for biological brain research.

    Our research on biological brain function includes sensory perception (vision, hearing, olfaction, pain), cognition (action selection, memory, learning) and motor control at different levels of biological detail (molecular, cellular, network) and mathematical/functional description. Methods development for investigating biological brain function and its dynamics as well as dysfunction comprises biomechanical simulation engines for locomotion and voice, machine learning methods for analysing functional brain images, craniofacial morphology and neuronal multi-scale simulations. Projects are conducted in close collaborations with Karolinska Institutet and Karolinska Hospital in Sweden as well as other laboratories in Europe, U.S., Japan and India.

    Our research on brain-like computing concerns methods development for perceptual systems that extract information from sensory signals (images, video and audio), analysis of functional brain images and EEG data, learning for autonomous agents as well as development of computational architectures (both software and hardware) for neural information processing. Our brain-inspired approach to computing also applies more generically to other computer science problems such as pattern recognition, data analysis and intelligent systems. Recent industrial collaborations include analysis of patient brain data with MentisCura and the startup company 13 Lab bought by Facebook.

    Our long term vision is to contribute to (i) deeper understanding of the computational mechanisms underlying biological brain function and (ii) better theories, methods and algorithms for perceptual and intelligent systems that perform artificial brain-like functions by (iii) performing interdisciplinary and cross-fertilizing research on both biological and artificial brain-like functions. 

    On one hand, biological brains provide existence proofs for guiding our research on artificial perceptual and intelligent systems. On the other hand, applying Richard Feynman’s famous statement ”What I cannot create I do not understand” to brain science implies that we can only claim to fully understand the computational mechanisms underlying biological brain function if we can build and implement corresponding computational mechanisms on a computerized system that performs similar brain-like functions.

    Full text (pdf)
    fulltext
  • 18.
    Finnveden, Lukas
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Jansson, Ylva
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    The problems with using STNs to align CNN feature maps (2020). Conference paper (Other academic)
    Abstract [en]

    Spatial transformer networks (STNs) were designed to enable CNNs to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image and its original. We present a theoretical argument for this and investigate the practical implications, showing that this inability is coupled with decreased classification accuracy. We advocate taking advantage of more complex features in deeper layers by instead sharing parameters between the classification and the localisation network.

    Full text (pdf)
    STN_problems_to_align_CNN_feature_maps
  • 19.
    Finnveden, Lukas
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Jansson, Ylva
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Understanding when spatial transformer networks do not support invariance, and what to do about it (2021). In: ICPR 2020: International Conference on Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE), 2021, pp. 3427-3434. Conference paper (Refereed)
    Abstract [en]

    Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image with those of its original. STNs are therefore unable to support invariance when transforming CNN feature maps. We present a simple proof for this and study the practical implications, showing that this inability is coupled with decreased classification accuracy. We therefore investigate alternative STN architectures that make use of complex features. We find that while deeper localization networks are difficult to train, localization networks that share parameters with the classification network remain stable as they grow deeper, which allows for higher classification accuracy on difficult datasets. Finally, we explore the interaction between localization network complexity and iterative image alignment.

    Full text (pdf)
    fulltext
  • 20.
    Finnveden, Lukas
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Jansson, Ylva
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Understanding when spatial transformer networks do not support invariance, and what to do about it (2020). Report (Other academic)
    Abstract [en]

    Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image with those of its original. STNs are therefore unable to support invariance when transforming CNN feature maps. We present a simple proof for this and study the practical implications, showing that this inability is coupled with decreased classification accuracy. We therefore investigate alternative STN architectures that make use of complex features. We find that while deeper localization networks are difficult to train, localization networks that share parameters with the classification network remain stable as they grow deeper, which allows for higher classification accuracy on difficult datasets. Finally, we explore the interaction between localization network complexity and iterative image alignment.

    Full text (pdf)
    fulltext
  • 21.
    Friberg, Anders
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Hellwagner, Martin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Helgason, Pétur
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Salomão, Gláucia Laís
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Elowsson, Anders
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Lemaitre, Guillaume
    Institute for Research and Coordination in Acoustics and Music, Paris, France.
    Ternström, Sten
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Intelligenta system, Tal, musik och hörsel, TMH.
    Prediction of three articulatory categories in vocal sound imitations using models for auditory receptive fields (2018). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 144, No. 3, pp. 1467-1483. Article in journal (Refereed)
    Abstract [en]

    Vocal sound imitations provide a new challenge for understanding the coupling between articulatory mechanisms and the resulting audio. In this study, we have modeled the classification of three articulatory categories, phonation, supraglottal myoelastic vibrations, and turbulence from audio recordings. Two data sets were assembled, consisting of different vocal imitations by four professional imitators and four non-professional speakers in two different experiments. The audio data were manually annotated by two experienced phoneticians using a detailed articulatory description scheme. A separate set of audio features was developed specifically for each category using both time-domain and spectral methods. For all time-frequency transformations, and for some secondary processing, the recently developed Auditory Receptive Fields Toolbox was used. Three different machine learning methods were applied for predicting the final articulatory categories. The result with the best generalization was found using an ensemble of multilayer perceptrons. The cross-validated classification accuracy was 96.8 % for phonation, 90.8 % for supraglottal myoelastic vibrations, and 89.0 % for turbulence using all the 84 developed features. A final feature reduction to 22 features yielded similar results.

    Full text (pdf)
    fulltext
  • 22.
    Gårding, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    CanApp: The Candela Application Library (1989). Report (Other academic)
    Abstract [en]

    This paper describes CanApp, the Candela Application Library. CanApp is a software package for image processing and image analysis. Most of the subroutines in CanApp are available both as stand-alone programs and C subroutines.

    CanApp currently comprises some 50 programs and 75 subroutines, and these numbers are expected to grow continuously as a result of joint efforts of the members of the CVAP group at the Royal Institute of Technology in Stockholm.

    CanApp is currently installed and running under UNIX on Sun workstations.

    Full text (pdf)
    fulltext
  • 23.
    Gårding, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Direct computation of shape cues using scale-adapted spatial derivative operators (1996). In: International Journal of Computer Vision, ISSN 0920-5691, E-ISSN 1573-1405, Vol. 17, No. 2, pp. 163-191. Article in journal (Refereed)
    Abstract [en]

    This paper addresses the problem of computing cues to the three-dimensional structure of surfaces in the world directly from the local structure of the brightness pattern of either a single monocular image or a binocular image pair. It is shown that starting from Gaussian derivatives of order up to two at a range of scales in scale-space, local estimates of (i) surface orientation from monocular texture foreshortening, (ii) surface orientation from monocular texture gradients, and (iii) surface orientation from the binocular disparity gradient can be computed without iteration or search, and by using essentially the same basic mechanism. The methodology is based on a multi-scale descriptor of image structure called the windowed second moment matrix, which is computed with adaptive selection of both scale levels and spatial positions. Notably, this descriptor comprises two scale parameters: a local scale parameter describing the amount of smoothing used in derivative computations, and an integration scale parameter determining over how large a region in space the statistics of regional descriptors is accumulated. Experimental results for both synthetic and natural images are presented, and the relation with models of biological vision is briefly discussed.

    Full text (pdf)
    fulltext
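The windowed second moment matrix that this paper's shape cues are computed from is easy to write down. A minimal sketch with SciPy, assuming fixed local scale (sigma_d, for derivatives) and integration scale (sigma_i, for the Gaussian window) rather than the paper's adaptive selection of both:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def second_moment_matrix(image, sigma_d=1.0, sigma_i=4.0):
    """Windowed second moment matrix: gradients at local scale sigma_d,
    with their outer products accumulated over a Gaussian window at
    integration scale sigma_i. Returns the three distinct components."""
    L0 = gaussian_filter(image, sigma_d, order=(1, 0))  # derivative along axis 0
    L1 = gaussian_filter(image, sigma_d, order=(0, 1))  # derivative along axis 1
    mu_00 = gaussian_filter(L0 * L0, sigma_i)
    mu_01 = gaussian_filter(L0 * L1, sigma_i)
    mu_11 = gaussian_filter(L1 * L1, sigma_i)
    return mu_00, mu_01, mu_11

# The eigenvalues and eigenvectors of [[mu_00, mu_01], [mu_01, mu_11]] at
# each pixel encode the local anisotropy from which shape cues are read off.
```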
  • 24.
    Gårding, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Direct estimation of local surface shape in a fixating binocular vision system (1994). In: Computer Vision — ECCV '94: Third European Conference on Computer Vision, Stockholm, Sweden, May 2–6, 1994, Proceedings, Volume I, Springer Berlin/Heidelberg, 1994, pp. 365-376. Conference paper (Refereed)
    Abstract [en]

    This paper addresses the problem of computing cues to the three-dimensional structure of surfaces in the world directly from the local structure of the brightness pattern of a binocular image pair. The geometric information content of the gradient of binocular disparity is analyzed for the general case of a fixating vision system with symmetric or asymmetric vergence, and with either known or unknown viewing geometry. A computationally inexpensive technique which exploits this analysis is proposed. This technique allows a local estimate of surface orientation to be computed directly from the local statistics of the left and right image brightness gradients, without iterations or search. The viability of the approach is demonstrated with experimental results for both synthetic and natural gray-level images.

    Full text (pdf)
    fulltext
  • 25.
    Jansson, Ylva
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields (2017). Report (Other academic)
    Abstract [en]

    This work presents a first evaluation of using spatiotemporal receptive fields from a recently proposed time-causal spatio-temporal scale-space framework as primitives for video analysis. We propose a new family of video descriptors based on regional statistics of spatio-temporal receptive field responses and evaluate this approach on the problem of dynamic texture recognition. Our approach generalises a previously used method, based on joint histograms of receptive field responses, from the spatial to the spatio-temporal domain and from object recognition to dynamic texture recognition. The time-recursive formulation enables computationally efficient time-causal recognition.

    The experimental evaluation demonstrates competitive performance compared to the state of the art. In particular, it is shown that binary versions of our dynamic texture descriptors achieve improved performance compared to a large range of similar methods using different primitives either handcrafted or learned from data. Further, our qualitative and quantitative investigation into parameter choices and the use of different sets of receptive fields highlights the robustness and flexibility of our approach. Together, these results support the descriptive power of this family of time-causal spatio-temporal receptive fields, validate our approach for dynamic texture recognition and point towards the possibility of designing a range of video analysis methods based on these new time-causal spatio-temporal primitives.

    Full text (pdf)
    Dynamic_texture_recognition_JanssonLindeberg_arXiv2017
  • 26.
    Jansson, Ylva
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields (2018). In: Journal of Mathematical Imaging and Vision, ISSN 0924-9907, E-ISSN 1573-7683, Vol. 60, No. 9, pp. 1369-1398. Article in journal (Refereed)
    Abstract [en]

    This work presents a first evaluation of using spatio-temporal receptive fields from a recently proposed time-causal spatiotemporal scale-space framework as primitives for video analysis. We propose a new family of video descriptors based on regional statistics of spatio-temporal receptive field responses and evaluate this approach on the problem of dynamic texture recognition. Our approach generalises a previously used method, based on joint histograms of receptive field responses, from the spatial to the spatio-temporal domain and from object recognition to dynamic texture recognition. The time-recursive formulation enables computationally efficient time-causal recognition. The experimental evaluation demonstrates competitive performance compared to state of the art. In particular, it is shown that binary versions of our dynamic texture descriptors achieve improved performance compared to a large range of similar methods using different primitives either handcrafted or learned from data. Further, our qualitative and quantitative investigation into parameter choices and the use of different sets of receptive fields highlights the robustness and flexibility of our approach. Together, these results support the descriptive power of this family of time-causal spatio-temporal receptive fields, validate our approach for dynamic texture recognition and point towards the possibility of designing a range of video analysis methods based on these new time-causal spatio-temporal primitives.

    Full text (pdf)
    fulltext
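The joint histogram descriptors evaluated in this paper can be sketched in their binary form. The sketch below uses purely spatial Gaussian derivative responses as the receptive fields; the paper's actual primitives are time-causal spatio-temporal receptive fields, so this is only the spatial analogue of the construction:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def binary_joint_histogram(image, sigma=2.0):
    """Joint binary histogram descriptor: binarise each receptive field
    response by its sign and histogram the joint binary patterns."""
    image = image - image.mean()  # zero mean, so the sign of smoothing is informative
    orders = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]  # derivative orders
    bits = [gaussian_filter(image, sigma, order=o) > 0 for o in orders]
    codes = np.zeros(image.shape, dtype=np.int64)
    for k, b in enumerate(bits):              # pack the sign bits into one code
        codes |= b.astype(np.int64) << k
    hist = np.bincount(codes.ravel(), minlength=2 ** len(orders))
    return hist / hist.sum()  # normalised; compared with e.g. chi-squared distance
```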
  • 27.
    Jansson, Ylva
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Dynamic texture recognition using time-causal spatio-temporal scale-space filters (2017). In: Scale Space and Variational Methods in Computer Vision, Springer, 2017, Vol. 10302, pp. 16-28. Conference paper (Refereed)
    Abstract [en]

    This work presents an evaluation of using time-causal scale-space filters as primitives for video analysis. For this purpose, we present a new family of video descriptors based on regional statistics of spatiotemporal scale-space filter responses and evaluate this approach on the problem of dynamic texture recognition. Our approach generalises a previously used method, based on joint histograms of receptive field responses, from the spatial to the spatio-temporal domain. We evaluate one member in this family, constituting a joint binary histogram, on two widely used dynamic texture databases. The experimental evaluation shows competitive performance compared to previous methods for dynamic texture recognition, especially on the more complex DynTex database. These results support the descriptive power of time-causal spatio-temporal scale-space filters as primitives for video analysis.

    Full text (pdf)
    DTRecognSpatioTemporalScSpFilters_JanssonLindeberg_SSVM2017
  • 28.
    Jansson, Ylva
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Exploring the ability of CNNs to generalise to previously unseen scales over wide scale ranges (2021). In: ICPR 2020: International Conference on Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE), 2021, pp. 1181-1188. Conference paper (Refereed)
    Abstract [en]

    The ability to handle large scale variations is crucial for many real world visual tasks. A straightforward approach for handling scale in a deep network is to process an image at several scales simultaneously in a set of scale channels. Scale invariance can then, in principle, be achieved by using weight sharing between the scale channels together with max or average pooling over the outputs from the scale channels. The ability of such scale channel networks to generalise to scales not present in the training set over significant scale ranges has, however, not previously been explored. We, therefore, present a theoretical analysis of invariance and covariance properties of scale channel networks and perform an experimental evaluation of the ability of different types of scale channel networks to generalise to previously unseen scales. We identify limitations of previous approaches and propose a new type of foveated scale channel architecture, where the scale channels process increasingly larger parts of the image with decreasing resolution. Our proposed FovMax and FovAvg networks perform almost identically over a scale range of 8, also when training on single scale training data, and also give improvements in the small sample regime.

    Full text (pdf)
    fulltext
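The scale-channel construction analysed here (one shared network applied to rescaled copies of the input, with pooling over the channel outputs) can be sketched generically. In this sketch `shared_net` is an illustrative stand-in for the shared CNN, and the channel outputs are assumed to have a fixed size (e.g. after global pooling) so that max pooling over channels is well defined:

```python
import numpy as np
from scipy.ndimage import zoom

def scale_channel_predict(image, shared_net, scale_factors=(0.5, 0.71, 1.0, 1.41, 2.0)):
    """Apply one shared network to rescaled copies of the image (the scale
    channels) and max-pool the outputs over the channels. With a
    scale-covariant shared_net this approximates scale invariance."""
    outputs = [shared_net(zoom(image, s, order=1))   # shared weights per channel
               for s in scale_factors]
    return np.max(np.stack(outputs), axis=0)         # max pooling over channels

# Illustrative stand-in for the shared network: a fixed-size global
# descriptor (mean absolute gradient along each image axis).
toy_net = lambda im: np.array([np.mean(np.abs(g)) for g in np.gradient(im)])
image = np.zeros((64, 64)); image[24:40, 24:40] = 1.0
print(scale_channel_predict(image, toy_net))
```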
  • 29.
    Jansson, Ylva
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Exploring the ability of CNNs to generalise to previously unseen scales over wide scale ranges (2020). Report (Other academic)
    Abstract [en]

    The ability to handle large scale variations is crucial for many real world visual tasks. A straightforward approach for handling scale in a deep network is to process an image at several scales simultaneously in a set of scale channels. Scale invariance can then, in principle, be achieved by using weight sharing between the scale channels together with max or average pooling over the outputs from the scale channels. The ability of such scale channel networks to generalise to scales not present in the training set over significant scale ranges has, however, not previously been explored. We, therefore, present a theoretical analysis of invariance and covariance properties of scale channel networks and perform an experimental evaluation of the ability of different types of scale channel networks to generalise to previously unseen scales. We identify limitations of previous approaches and propose a new type of foveated scale channel architecture, where the scale channels process increasingly larger parts of the image with decreasing resolution. Our proposed FovMax and FovAvg networks perform almost identically over a scale range of 8, also when training on single scale training data, and give improvements in the small sample regime.

    Full text (pdf)
    fulltext
  • 30.
    Jansson, Ylva
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    MNIST Large Scale data set (2020). Dataset
  • 31.
    Jansson, Ylva
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales (2022). In: Journal of Mathematical Imaging and Vision, ISSN 0924-9907, E-ISSN 1573-7683, Vol. 64, No. 5, pp. 506-536. Article in journal (Refereed)
    Abstract [en]

    The ability to handle large scale variations is crucial for many real world visual tasks. A straightforward approach for handling scale in a deep network is to process an image at several scales simultaneously in a set of scale channels. Scale invariance can then, in principle, be achieved by using weight sharing between the scale channels together with max or average pooling over the outputs from the scale channels. The ability of such scale-channel networks to generalise to scales not present in the training set over significant scale ranges has, however, not previously been explored.

    In this paper, we present a systematic study of this methodology by implementing different types of scale-channel networks and evaluating their ability to generalise to previously unseen scales. We develop a formalism for analysing the covariance and invariance properties of scale-channel networks, including exploring their relations to scale-space theory, and exploring how different design choices, unique to scaling transformations, affect the overall performance of scale-channel networks. We first show that two previously proposed scale-channel network designs, in one case, generalise no better than a standard CNN to scales not present in the training set, and in the second case, have limited scale generalisation ability. We explain theoretically and demonstrate experimentally why generalisation fails or is limited in these cases. We then propose a new type of foveated scale-channel architecture, where the scale channels process increasingly larger parts of the image with decreasing resolution. This new type of scale-channel network is shown to generalise extremely well, provided sufficient image resolution and the absence of boundary effects. Our proposed FovMax and FovAvg networks perform almost identically over a scale range of 8, also when training on single-scale training data, and also give improved performance when learning from datasets with large scale variations in the small sample regime.

    Full text (pdf)
    fulltext
  • 32.
    Jansson, Ylva
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales (2021). Report (Other academic)
    Abstract [en]

    The ability to handle large scale variations is crucial for many real world visual tasks. A straightforward approach for handling scale in a deep network is to process an image at several scales simultaneously in a set of scale channels. Scale invariance can then, in principle, be achieved by using weight sharing between the scale channels together with max or average pooling over the outputs from the scale channels. The ability of such scale channel networks to generalise to scales not present in the training set over significant scale ranges has, however, not previously been explored. 

    In this paper, we present a systematic study of this methodology by implementing different types of scale channel networks and evaluating their ability to generalise to previously unseen scales. We develop a formalism for analysing the covariance and invariance properties of scale channel networks, and explore how different design choices, unique to scaling transformations, affect the overall performance of scale channel networks. We first show that two previously proposed scale channel network designs do not generalise well to scales not present in the training set. We explain theoretically and demonstrate experimentally why generalisation fails in these cases. 

    We then propose a new type of foveated scale channel architecture, where the scale channels process increasingly larger parts of the image with decreasing resolution. This new type of scale channel network is shown to generalise extremely well, provided sufficient image resolution and the absence of boundary effects. Our proposed FovMax and FovAvg networks perform almost identically over a scale range of 8, also when training on single scale training data, and also give improved performance when learning from datasets with large scale variations in the small sample regime.

    Full text (pdf)
    fulltext
  • 33.
    Jansson, Ylva
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Maydanskiy, Maksim
    Finnveden, Lukas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Inability of spatial transformations of CNN feature maps to support invariant recognition (2020). Report (Other academic)
    Abstract [en]

    A large number of deep learning architectures use spatial transformations of CNN feature maps or filters to better deal with variability in object appearance caused by natural image transformations. In this paper, we prove that spatial transformations of CNN feature maps cannot align the feature maps of a transformed image to match those of its original, for general affine transformations, unless the extracted features are themselves invariant. Our proof is based on elementary analysis for both the single- and multi-layer network case. The results imply that methods based on spatial transformations of CNN feature maps or filters cannot replace image alignment of the input and cannot enable invariant recognition for general affine transformations, specifically not for scaling transformations or shear transformations. For rotations and reflections, spatially transforming feature maps or filters can enable invariance but only for networks with learnt or hardcoded rotation- or reflection-invariant features.

    Full text (pdf)
    fulltext
  • 34.
    Jansson, Ylva
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Maydanskiy, Maksim
    Finnveden, Lukas
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Beräkningsvetenskap och beräkningsteknik (CST).
    Spatial transformations in convolutional networks and invariant recognition, 2020. Conference paper (Peer-reviewed)
    Abstract [en]

    We show that spatial transformations of CNN feature maps cannot align the feature maps of a transformed image to match those of its original for general affine transformations. This implies that methods that spatially transform CNN feature maps, such as spatial transformer networks, dilated or deformable convolutions, or spatial pyramid pooling, cannot enable true invariance. Our proof is based on elementary analysis for both the single- and multi-layer network cases.

    Full text (pdf)
    fulltext
  • 35. Laptev, I.
    et al.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    A distance measure and a feature likelihood map concept for scale-invariant model matching, 2003. Report (Peer-reviewed)
    Abstract [en]

    This paper presents two approaches for evaluating multi-scale feature-based object models. Within the first approach, a scale-invariant distance measure is proposed for comparing two image representations in terms of multi-scale features. Based on this measure, the maximisation of the likelihood of parameterised feature models allows for simultaneous model selection and parameter estimation. The idea of the second approach is to avoid an explicit feature extraction step and to evaluate models using a function defined directly from the image data. For this purpose, we propose the concept of a feature likelihood map, which is a function normalised to the interval [0, 1], and that approximates the likelihood of image features at all points in scale-space. To illustrate the applicability of both methods, we consider the area of hand gesture analysis and show how the proposed evaluation schemes can be integrated within a particle filtering approach for performing simultaneous tracking and recognition of hand models under variations in the position, orientation, size and posture of the hand. The experiments demonstrate the feasibility of the approach, and that real time performance can be obtained by pyramid implementations of the proposed concepts.

  • 36.
    Laptev, Ivan
    et al.
    IRISA/INRIA.
    Caputo, Barbara
    Schüldt, Christian
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Local velocity-adapted motion events for spatio-temporal recognition, 2007. In: Computer Vision and Image Understanding, ISSN 1077-3142, E-ISSN 1090-235X, Vol. 108, No. 3, pp. 207-229. Journal article (Peer-reviewed)
    Abstract [en]

    In this paper, we address the problem of motion recognition using event-based local motion representations. We assume that similar patterns of motion contain similar events with consistent motion across image sequences. Using this assumption, we formulate the problem of motion recognition as a matching of corresponding events in image sequences. To enable the matching, we present and evaluate a set of motion descriptors that exploit the spatial and the temporal coherence of motion measurements between corresponding events in image sequences. As the motion measurements may depend on the relative motion of the camera, we also present a mechanism for local velocity adaptation of events and evaluate its influence when recognizing image sequences subjected to different camera motions. When recognizing motion patterns, we compare the performance of a nearest neighbor (NN) classifier with the performance of a support vector machine (SVM). We also compare event-based motion representations to motion representations in terms of global histograms. A systematic experimental evaluation on a large video database with human actions demonstrates that (i) local spatio-temporal image descriptors can be defined to carry important information of space-time events for subsequent recognition, and that (ii) local velocity adaptation is an important mechanism in situations when the relative motion between the camera and the interesting events in the scene is unknown. The particular advantage of event-based representations and velocity adaptation is further emphasized when recognizing human actions in unconstrained scenes with complex and non-stationary backgrounds.

    Full text (pdf)
    fulltext
  • 37.
    Laptev, Ivan
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    A Distance Measure and a Feature Likelihood Map Concept for Scale-Invariant Model Matching, 2003. In: International Journal of Computer Vision, ISSN 0920-5691, E-ISSN 1573-1405, Vol. 52, No. 2, pp. 97-120. Journal article (Peer-reviewed)
    Abstract [en]

    This paper presents two approaches for evaluating multi-scale feature-based object models. Within the first approach, a scale-invariant distance measure is proposed for comparing two image representations in terms of multi-scale features. Based on this measure, the maximisation of the likelihood of parameterised feature models allows for simultaneous model selection and parameter estimation.

    The idea of the second approach is to avoid an explicit feature extraction step and to evaluate models using a function defined directly from the image data. For this purpose, we propose the concept of a feature likelihood map, which is a function normalised to the interval [0, 1], and that approximates the likelihood of image features at all points in scale-space.

    To illustrate the applicability of both methods, we consider the area of hand gesture analysis and show how the proposed evaluation schemes can be integrated within a particle filtering approach for performing simultaneous tracking and recognition of hand models under variations in the position, orientation, size and posture of the hand. The experiments demonstrate the feasibility of the approach, and that real time performance can be obtained by pyramid implementations of the proposed concepts.

    Full text (pdf)
    fulltext
  • 38.
    Laptev, Ivan
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    A multi-scale feature likelihood map for direct evaluation of object hypotheses, 2001. In: Proc. Scale-Space and Morphology in Computer Vision, Springer Berlin/Heidelberg, 2001, Vol. 2106, pp. 98-110. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper develops and investigates a new approach for evaluating feature based object hypotheses in a direct way. The idea is to compute a feature likelihood map (FLM), which is a function normalized to the interval [0, 1], and which approximates the likelihood of image features at all points in scale-space. In our case, the FLM is defined from Gaussian derivative operators and in such a way that it assumes its strongest responses near the centers of symmetric blob-like or elongated ridge-like structures and at scales that reflect the size of these structures in the image domain. While the FLM inherits several advantages of feature based image representations, it also (i) avoids the need for explicit search when matching features in object models to image data, and (ii) eliminates the need for thresholds present in most traditional feature based approaches. In an application presented in this paper, the FLM is applied to simultaneous tracking and recognition of hand models based on particle filtering. The experiments demonstrate the feasibility of the approach, and that real time performance can be obtained by a pyramid implementation of the proposed concept.
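
    As a hedged illustration of the ingredients involved (our sketch; the paper's actual FLM is defined differently and normalised more carefully), scale-normalised Laplacian-of-Gaussian responses rescaled into [0, 1] respond most strongly near blob centres, at scales reflecting blob size:

        import numpy as np
        from scipy.ndimage import gaussian_laplace

        def blob_likelihood_map(image, scales):
            """Stack of scale-normalised |Laplacian| responses in [0, 1];
            one slice per scale level t = sigma^2."""
            out = []
            for t in scales:
                resp = t * np.abs(gaussian_laplace(image, np.sqrt(t)))
                out.append(resp / (resp.max() + 1e-12))
            return np.stack(out)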

    Full text (pdf)
    fulltext
  • 39.
    Laptev, Ivan
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Interest point detection and scale selection in space-time, 2003. In: Scale Space Methods in Computer Vision: 4th International Conference, Scale Space 2003, Isle of Skye, UK, June 10–12, 2003, Proceedings, Springer Berlin/Heidelberg, 2003, Vol. 2695, pp. 372-387. Conference paper (Peer-reviewed)
    Abstract [en]

    Several types of interest point detectors have been proposed for spatial images. This paper investigates how this notion can be generalised to the detection of interesting events in space-time data. Moreover, we develop a mechanism for spatio-temporal scale selection and detect events at scales corresponding to their extent in both space and time. To detect spatio-temporal events, we build on the idea of the Harris and Forstner interest point operators and detect regions in space-time where the image structures have significant local variations in both space and time. In this way, events that correspond to curved space-time structures are emphasised, while structures with locally constant motion are disregarded. To construct this operator, we start from a multi-scale windowed second moment matrix in space-time, and combine the determinant and the trace in a similar way as for the spatial Harris operator. All space-time maxima of this operator are then adapted to characteristic scales by maximising a scale-normalised space-time Laplacian operator over both spatial scales and temporal scales. The motivation for performing temporal scale selection as a complement to previous approaches of spatial scale selection is to be able to robustly capture spatio-temporal events of different temporal extent. It is shown that the resulting approach is truly scale invariant with respect to both spatial scales and temporal scales. The proposed concept is tested on synthetic and real image sequences. It is shown that the operator responds to distinct and stable points in space-time that often correspond to interesting events. The potential applications of the method are discussed.
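
    A schematic rendering of the operator described above (constants and windowing details follow the paper): with (L_x, L_y, L_t) the spatio-temporal gradient and \mu the windowed second moment matrix

        \mu = g(\cdot;\, \sigma^2, \tau^2) * \begin{pmatrix} L_x^2 & L_x L_y & L_x L_t \\ L_x L_y & L_y^2 & L_y L_t \\ L_x L_t & L_y L_t & L_t^2 \end{pmatrix},

    the determinant and the trace are combined as H = \det\mu - k\,(\operatorname{trace}\mu)^3, where the third power (rather than the square used in the spatial 2-D Harris operator) matches the 3 x 3 dimensionality, and interest points are taken at the space-time maxima of H.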

    Full text (pdf)
    fulltext
  • 40.
    Laptev, Ivan
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Local descriptors for spatio-temporal recognition, 2006. In: Spatial Coherence for Visual Motion Analysis: First International Workshop, SCVMA 2004, Prague, Czech Republic, May 15, 2004, Revised Papers / [ed] MacLean, WJ, Springer Berlin/Heidelberg, 2006, Vol. 3667, pp. 91-103. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper presents and investigates a set of local space-time descriptors for representing and recognizing motion patterns in video. Following the idea of local features in the spatial domain, we use the notion of space-time interest points and represent video data in terms of local space-time events. To describe such events, we define several types of image descriptors over local spatio-temporal neighborhoods and evaluate these descriptors in the context of recognizing human activities. In particular, we compare motion representations in terms of spatio-temporal jets, position dependent histograms, position independent histograms, and principal component analysis computed for either spatio-temporal gradients or optic flow. An experimental evaluation on a video database with human actions shows that high classification performance can be achieved, and that there is a clear advantage of using local position dependent histograms, consistent with previously reported findings regarding spatial recognition.
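
    For reference, the spatio-temporal jets mentioned above collect Gaussian derivative responses up to some order N at a space-time point; in our condensed notation (the paper specifies the exact scale normalisation and ordering), a second-order jet is the vector

        j = (L_x,\; L_y,\; L_t,\; L_{xx},\; L_{xy},\; L_{yy},\; L_{xt},\; L_{yt},\; L_{tt}),

    computed at the detected position and at the selected spatio-temporal scale.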

    Full text (pdf)
    fulltext
  • 41.
    Laptev, Ivan
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    On Space-Time Interest Points, 2003. Report (Other academic)
    Abstract [en]

    Local image features or interest points provide compact and abstract representations of patterns in an image. In this paper, we extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features capture interesting events in video and can be used for a compact representation and for interpretation of video data.

    To detect spatio-temporal events, we build on the idea of the Harris and Forstner interest point operators and detect local structures in space-time where the image values have significant local variations in both space and time. We estimate the spatio-temporal extents of the detected events by maximizing a normalized spatio-temporal Laplacian operator over spatial and temporal scales. To represent the detected events we then compute local, spatio-temporal, scale-invariant N-jets and classify each event with respect to its jet descriptor. For the problem of human motion analysis, we illustrate how video representation in terms of local space-time features allows for detection of walking people in scenes with occlusions and dynamic cluttered backgrounds.

    Full text (pdf)
    fulltext
  • 42.
    Laptev, Ivan
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Space-time interest points, 2003. In: Proceedings of Ninth IEEE International Conference on Computer Vision, 2003: ICCV'03, IEEE conference proceedings, 2003, pp. 432-439. Conference paper (Peer-reviewed)
    Abstract [en]

    Local image features or interest points provide compact and abstract representations of patterns in an image. We propose to extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features often reflect interesting events that can be used for a compact representation of video data as well as for its interpretation. To detect spatio-temporal events, we build on the idea of the Harris and Forstner interest point operators and detect local structures in space-time where the image values have significant local variations in both space and time. We then estimate the spatio-temporal extents of the detected events and compute their scale-invariant spatio-temporal descriptors. Using such descriptors, we classify events and construct video representation in terms of labeled space-time points. For the problem of human motion analysis, we illustrate how the proposed method allows for detection of walking people in scenes with occlusions and dynamic backgrounds.

    Full text (pdf)
    fulltext
  • 43.
    Laptev, Ivan
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Tracking of multi-state hand models using particle filtering and a hierarchy of multi-scale image features, 2001. Report (Peer-reviewed)
    Abstract [en]

    This paper presents an approach for simultaneous tracking and recognition of hierarchical object representations in terms of multiscale image features. A scale-invariant dissimilarity measure is proposed for comparing scale-space features at different positions and scales. Based on this measure, the likelihood of hierarchical, parameterized models can be evaluated in such a way that maximization of the measure over different models and their parameters allows for both model selection and parameter estimation. Then, within the framework of particle filtering, we consider the area of hand gesture analysis, and present a method for simultaneous tracking and recognition of hand models under variations in the position, orientation, size and posture of the hand. In this way, qualitative hand states and quantitative hand motions can be captured, and be used for controlling different types of computerised equipment.
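
    The particle filtering machinery referred to above can be sketched generically as follows (our minimal sequential importance resampling loop; the paper's hand-state dynamics and feature-based likelihood are far richer, and the names dynamics and likelihood are placeholders for those models):

        import numpy as np

        rng = np.random.default_rng(0)

        def particle_filter_step(particles, weights, dynamics, likelihood, observation):
            # 1. Predict: propagate each state hypothesis through the dynamics
            particles = np.array([dynamics(p, rng) for p in particles])
            # 2. Weight: score each hypothesis against the new observation
            weights = weights * np.array([likelihood(p, observation) for p in particles])
            weights = weights / weights.sum()
            # 3. Resample: concentrate particles on high-likelihood states
            idx = rng.choice(len(particles), size=len(particles), p=weights)
            return particles[idx], np.full(len(particles), 1.0 / len(particles))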

    Full text (pdf)
    fulltext
  • 44.
    Laptev, Ivan
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Tracking of multi-state hand models using particle filtering and a hierarchy of multi-scale image features, 2001. In: Scale-Space and Morphology in Computer Vision: Third International Conference, Scale-Space 2001, Vancouver, Canada, July 7–8, 2001, Proceedings, Springer Berlin/Heidelberg, 2001, Vol. 2106, pp. 63-74. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper presents an approach for simultaneous tracking and recognition of hierarchical object representations in terms of multiscale image features. A scale-invariant dissimilarity measure is proposed for comparing scale-space features at different positions and scales. Based on this measure, the likelihood of hierarchical, parameterized models can be evaluated in such a way that maximization of the measure over different models and their parameters allows for both model selection and parameter estimation. Then, within the framework of particle filtering, we consider the area of hand gesture analysis, and present a method for simultaneous tracking and recognition of hand models under variations in the position, orientation, size and posture of the hand. In this way, qualitative hand states and quantitative hand motions can be captured, and be used for controlling different types of computerised equipment.

    Full text (pdf)
    fulltext
  • 45.
    Laptev, Ivan
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Velocity adaptation of space-time interest points, 2004. In: Proceedings of the 17th International Conference on Pattern Recognition, 2004, ICPR 2004 / [ed] Kittler, J; Petrou, M; Nixon, M, IEEE conference proceedings, 2004, pp. 52-56. Conference paper (Peer-reviewed)
    Abstract [en]

    The notion of local features in space-time has recently been proposed to capture and describe local events in video. When computing space-time descriptors, however, the result may strongly depend on the relative motion between the object and the camera. To compensate for this variation, we present a method that automatically adapts the features to the local velocity of the image pattern and, hence, results in a video representation that is stable with respect to different amounts of camera motion. Experimentally, we show that the use of velocity adaptation substantially increases the repeatability of interest points as well as the stability of their associated descriptors. Moreover, for an application to human action recognition, we demonstrate how velocity-adapted features enable recognition of human actions in situations with unknown camera motion and complex, nonstationary backgrounds.

    Full text (pdf)
    fulltext
  • 46.
    Laptev, Ivan
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Velocity adaptation of spatio-temporal receptive fields for direct recognition of activities: an experimental study, 2004. In: Image and Vision Computing, ISSN 0262-8856, E-ISSN 1872-8138, Vol. 22, No. 2, pp. 105-116. Journal article (Peer-reviewed)
    Abstract [en]

    This article presents an experimental study of the influence of velocity adaptation when recognizing spatio-temporal patterns using a histogram-based statistical framework. The basic idea consists of adapting the shapes of the filter kernels to the local direction of motion, so as to allow the computation of image descriptors that are invariant to the relative motion in the image plane between the camera and the objects or events that are studied. Based on a framework of recursive spatio-temporal scale-space, we first outline how a straightforward mechanism for local velocity adaptation can be expressed. Then, for a test problem of recognizing activities, we present an experimental evaluation, which shows the advantages of using velocity-adapted spatio-temporal receptive fields, compared to directional derivatives or regular partial derivatives for which the filter kernels have not been adapted to the local image motion.
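
    Schematically, in a 1+1-dimensional space-time (our simplified notation; the paper works with a recursive scale-space formulation), velocity adaptation skews the filter kernel along the local image velocity v, e.g.

        g(x, t;\, \sigma_x, \sigma_t, v) = \frac{1}{2\pi \sigma_x \sigma_t} \exp\!\left( -\frac{(x - v t)^2}{2\sigma_x^2} - \frac{t^2}{2\sigma_t^2} \right),

    so that receptive field responses transform predictably under a Galilean motion x' = x - vt, and descriptors computed from them become invariant to the relative motion between the camera and the pattern.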

    Full text (pdf)
    fulltext
  • 47.
    Laptev, Ivan
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Velocity-adapted spatio-temporal receptive fields for direct recognition of activities, 2002. In: Proc. ECCV’02 Workshop on Statistical Methods in Video Processing, 2002, pp. 61-66. Conference paper (Peer-reviewed)
    Abstract [en]

    This article presents an experimental study of the influence of velocity adaptation when recognizing spatio-temporal patterns using a histogram-based statistical framework. The basic idea consists of adapting the shapes of the filter kernels to the local direction of motion, so as to allow the computation of image descriptors that are invariant to the relative motion in the image plane between the camera and the objects or events that are studied. Based on a framework of recursive spatio-temporal scale-space, we first outline how a straightforward mechanism for local velocity adaptation can be expressed. Then, for a test problem of recognizing activities, we present an experimental evaluation, which shows the advantages of using velocity-adapted spatio-temporal receptive fields, compared to directional derivatives or regular partial derivatives for which the filter kernels have not been adapted to the local image motion.

    Full text (pdf)
    fulltext
  • 48.
    Laptev, Ivan
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Mayer, H.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Eckstein, W.
    Steger, C.
    Baumgartner, A.
    Automatic extraction of roads from aerial images based on scale space and snakes, 2000. In: Machine Vision and Applications, ISSN 0932-8092, E-ISSN 1432-1769, Vol. 12, No. 1, pp. 23-31. Journal article (Peer-reviewed)
    Abstract [en]

    We propose a new approach for automatic road extraction from aerial imagery with a model and a strategy mainly based on the multi-scale detection of roads in combination with geometry-constrained edge extraction using snakes. A main advantage of our approach is that it allows, for the first time, bridging of shadows and partially occluded areas using the heavily disturbed evidence in the image. Additionally, it has only a few parameters to be adjusted. The road network is constructed after extracting crossings with varying shape and topology. We show the feasibility of the approach not only by presenting reasonable results but also by evaluating them quantitatively based on ground truth.
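
    The snakes referred to above are classical active contours; in their standard form (not specific to this paper), a curve \mathbf{c}(s) is driven towards edges by minimising

        E(\mathbf{c}) = \int_0^1 \left( \alpha\, \|\mathbf{c}'(s)\|^2 + \beta\, \|\mathbf{c}''(s)\|^2 + E_{\text{image}}(\mathbf{c}(s)) \right) ds,

    with the geometric road constraints mentioned in the abstract entering through the internal (\alpha, \beta) terms and the image energy.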

    Full text (pdf)
    fulltext
  • 49.
    Linde, Oskar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Lindeberg, Tony
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Composed Complex-Cue Histograms: An Investigation of the Information Content in Receptive Field Based Image Descriptors for Object Recognition, 2012. In: Computer Vision and Image Understanding, ISSN 1077-3142, E-ISSN 1090-235X, Vol. 116, No. 4, pp. 538-560. Journal article (Peer-reviewed)
    Abstract [en]

    Recent work has shown that effective methods for recognizing objects and spatio-temporal events can be constructed based on histograms of receptive field like image operations.

    This paper presents the results of an extensive study of the performance of different types of receptive field like image descriptors for histogram-based object recognition, based on different combinations of image cues in terms of Gaussian derivatives or differential invariants applied to either intensity information, colour-opponent channels or both. A rich set of composed complex-cue image descriptors is introduced and evaluated with respect to the problems of (i) recognizing previously seen object instances from previously unseen views, and (ii) classifying previously unseen objects into visual categories.

    It is shown that there exist novel histogram descriptors with significantly better recognition performance compared to previously used histogram features within the same class. Specifically, the experiments show that it is possible to obtain more discriminative features by combining lower-dimensional scale-space features into composed complex-cue histograms. Furthermore, different types of image descriptors have different relative advantages with respect to the problems of object instance recognition vs. object category classification. These conclusions are obtained from extensive experimental evaluations on two mutually independent data sets.

    For the task of recognizing specific object instances, combined histograms of spatial and spatio-chromatic derivatives are highly discriminative, and several image descriptors in terms of rotationally invariant (intensity and spatio-chromatic) differential invariants up to order two lead to very high recognition rates.

    For the task of category classification, primary information is contained in both first- and second-order derivatives, where second-order partial derivatives constitute the most discriminative cue.

    Dimensionality reduction by principal component analysis and variance normalization prior to training and recognition can in many cases lead to a significant increase in recognition or classification performance. Surprisingly high recognition rates can even be obtained with binary histograms that reveal the polarity of local scale-space features, and which can be expected to be particularly robust to illumination variations.

    An overall conclusion from this study is that compared to previously used lower-dimensional histograms, the use of composed complex-cue histograms of higher dimensionality reveals the co-variation of multiple cues and enables much better recognition performance, both with regard to the problems of recognizing previously seen objects from novel views and for classifying previously unseen objects into visual categories.

    Full text (pdf)
    fulltext
  • 50.
    Linde, Oskar
    et al.
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Lindeberg, Tony
    KTH, Tidigare Institutioner (före 2005), Numerisk analys och datalogi, NADA.
    Object recognition using composed receptive field histograms of higher dimensionality, 2004. In: Proceedings of the 17th International Conference on Pattern Recognition / [ed] Kittler, J; Petrou, M; Nixon, M, IEEE conference proceedings, 2004, pp. 1-6. Conference paper (Peer-reviewed)
    Abstract [en]

    Recent work has shown that effective methods for recognising objects or spatio-temporal events can be constructed based on receptive field responses summarised into histograms or other histogram-like image descriptors. This paper presents a set of composed histogram features of higher dimensionality, which give significantly better recognition performance compared to the histogram descriptors of lower dimensionality that were used in the original papers by Swain & Ballard (1991) or Schiele & Crowley (2000). The use of histograms of higher dimensionality is made possible by a sparse representation for efficient computation and handling of higher-dimensional histograms. Results of extensive experiments are reported, showing how the performance of histogram-based recognition schemes depends upon different combinations of cues, in terms of Gaussian derivatives or differential invariants applied to either intensity information, chromatic information or both. It is shown that there exist composed higher-dimensional histogram descriptors with much better performance for recognising known objects than previously used histogram features. Experiments are also reported on classifying unknown objects into visual categories.
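
    The sparse representation mentioned above can be illustrated as follows (our sketch, not the authors' implementation): only the occupied bins of the joint quantised cue space are stored, so memory grows with the number of distinct cue combinations present in the image rather than with the number of bins raised to the number of cues:

        from collections import Counter
        import numpy as np

        def sparse_composed_histogram(cue_maps, n_bins=16):
            """cue_maps: list of 2-D filter response arrays (one per cue).
            Returns a normalised sparse histogram over the joint bin space."""
            quantised = []
            for m in cue_maps:
                lo, hi = m.min(), m.max()
                q = np.floor((m - lo) / (hi - lo + 1e-12) * n_bins).astype(int)
                quantised.append(np.clip(q, 0, n_bins - 1).ravel())
            counts = Counter(zip(*quantised))   # only occupied bins are stored
            total = sum(counts.values())
            return {bin_id: c / total for bin_id, c in counts.items()}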

    Full text (pdf)
    fulltext