Carlsson, Stefan
Publications (10 of 33)
Maki, A., Kragic, D., Kjellström, H., Azizpour, H., Sullivan, J., Björkman, M., . . . Sundblad, Y. (2022). In Memoriam: Jan-Olof Eklundh. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 4488-4489
2022 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 44, no 9, p. 4488-4489. Article in journal (Refereed), Published
Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2022
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-316696 (URN); 10.1109/TPAMI.2022.3183266 (DOI); 000836666600005 ()
Note

QC 20220905

Available from: 2022-09-05 Created: 2022-09-05 Last updated: 2022-09-05 Bibliographically approved
Gamba, M., Azizpour, H., Carlsson, S. & Björkman, M. (2019). On the geometry of rectifier convolutional neural networks. In: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019. Paper presented at 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019, 27 October 2019 through 28 October 2019 (pp. 793-797). Institute of Electrical and Electronics Engineers Inc.
2019 (English). In: Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019, Institute of Electrical and Electronics Engineers Inc., 2019, p. 793-797. Conference paper, Published paper (Refereed)
Abstract [en]

While recent studies have shed light on the expressivity, complexity and compositionality of convolutional networks, the real inductive bias of the family of functions reachable by gradient descent on natural data is still unknown. By exploiting symmetries in the preactivation space of convolutional layers, we present preliminary empirical evidence of regularities in the preimage of trained rectifier networks, in terms of arrangements of polytopes, and relate it to the nonlinear transformations applied by the network to its input.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2019
Keywords
Convolutional networks, Deep learning, Geometry, Preimage, Understanding, Computer vision, Convolution, Gradient methods, Rectifying circuits, Compositionality, Gradient descent, Inductive bias, Non-linear transformations, Pre images, Convolutional neural networks
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-274163 (URN); 10.1109/ICCVW.2019.00106 (DOI); 000554591600099 (); 2-s2.0-85082492932 (Scopus ID)
Conference
17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019, 27 October 2019 through 28 October 2019
Note

QC 20200622

Part of ISBN 9781728150239

Available from: 2020-06-22 Created: 2020-06-22 Last updated: 2025-02-09 Bibliographically approved
Razavian, A. S., Sullivan, J., Carlsson, S. & Maki, A. (2019). Visual Instance Retrieval with Deep Convolutional Networks. Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers, 73(5), 956-964
2019 (English). In: Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers, ISSN 1342-6907, Vol. 73, no 5, p. 956-964. Article in journal (Refereed), Published
Abstract [en]

This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient pipeline exploiting multi-scale schemes to extract local features, in particular, by taking geometric invariance into explicit account, i.e. positions, scales and spatial consistency. In our experiments using five standard image retrieval datasets, we demonstrate that generic ConvNet image representations can outperform other state-of-the-art methods if they are extracted appropriately.

Place, publisher, year, edition, pages
Institute of Image Information and Television Engineers, 2019
Keywords
Convolutional network, Learning representation, Multi-resolution search, Visual instance retrieval, Convolution, Image retrieval, Convolutional networks, Image representations, Instance retrieval, Local feature, Multi-scales, Spatial consistency, Standard images, Image representation
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-328992 (URN); 10.3169/ITEJ.73.956 (DOI); 2-s2.0-85142357643 (Scopus ID)
Note

QC 20230614

Available from: 2023-06-14 Created: 2023-06-14 Last updated: 2025-02-07 Bibliographically approved
Carlsson, S., Azizpour, H., Razavian, A. S., Sullivan, J. & Smith, K. (2017). The Preimage of Rectifier Network Activities. In: International Conference on Learning Representations (ICLR). Paper presented at 5th International Conference on Learning Representations, ICLR 2017, 24-26 April 2017, Toulon, France. International Conference on Learning Representations, ICLR
2017 (English). In: International Conference on Learning Representations (ICLR), International Conference on Learning Representations, ICLR, 2017. Conference paper, Published paper (Refereed)
Abstract [en]

The preimage of the activity at a certain level of a deep network is the set of inputs that result in the same node activity. For fully connected multi-layer rectifier networks we demonstrate how to compute the preimages of activities at arbitrary levels from knowledge of the parameters in a deep rectifying network. If the preimage set of a certain activity in the network contains elements from more than one class it means that these classes are irreversibly mixed. This implies that preimage sets which are piecewise linear manifolds are building blocks for describing the input manifolds of specific classes, i.e. all preimages should ideally be from the same class. We believe that the knowledge of how to compute preimages will be valuable in understanding the efficiency displayed by deep learning networks and could potentially be used in designing more efficient training algorithms.

Place, publisher, year, edition, pages
International Conference on Learning Representations, ICLR, 2017
Keywords
Heuristic algorithms, Piecewise linear techniques, General structures, Input space, Network activities, Optimisations, Piecewise linear, Preimages, Regularization algorithms, Rectifying circuits
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-259164 (URN); 2-s2.0-85093029593 (Scopus ID)
Conference
5th International Conference on Learning Representations, ICLR 2017, 24-26 April 2017, Toulon, France
Note

QC 20230609

Available from: 2019-09-11 Created: 2019-09-11 Last updated: 2025-02-07 Bibliographically approved
Azizpour, H., Sharif Razavian, A., Sullivan, J., Maki, A. & Carlsson, S. (2016). Factors of Transferability for a Generic ConvNet Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9), 1790-1802, Article ID 7328311.
2016 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 38, no 9, p. 1790-1802, article id 7328311. Article in journal (Refereed), Published
Abstract [en]

Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the feed-forward units activation of the trained network, at a certain layer of the network, is used as a generic representation of an input image for a task with relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. It includes parameters for training of the source ConvNet such as its architecture, distribution of the training data, etc. and also the parameters of feature extraction such as layer of the trained ConvNet, dimensionality reduction, etc. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed.

Place, publisher, year, edition, pages
IEEE Computer Society Digital Library, 2016
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-177033 (URN); 10.1109/TPAMI.2015.2500224 (DOI); 000381432700006 (); 26584488 (PubMedID); 2-s2.0-84981266620 (Scopus ID)
Note

QC 20161208

Available from: 2015-11-13 Created: 2015-11-13 Last updated: 2025-02-07 Bibliographically approved
Razavian, A. S., Sullivan, J., Carlsson, S. & Maki, A. (2016). Visual instance retrieval with deep convolutional networks. ITE Transactions on Media Technology and Applications, 4(3), 251-258
2016 (English). In: ITE Transactions on Media Technology and Applications, ISSN 2186-7364, Vol. 4, no 3, p. 251-258. Article in journal (Refereed), Published
Abstract [en]

This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient pipeline exploiting multi-scale schemes to extract local features, in particular, by taking geometric invariance into explicit account, i.e. positions, scales and spatial consistency. In our experiments using five standard image retrieval datasets, we demonstrate that generic ConvNet image representations can outperform other state-of-the-art methods if they are extracted appropriately.

Place, publisher, year, edition, pages
Institute of Image Information and Television Engineers, 2016
Keywords
Convolutional network, Learning representation, Multi-resolution search, Visual instance retrieval
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-195472 (URN); 10.3169/mta.4.251 (DOI); 2-s2.0-84979503481 (Scopus ID)
Note

QC 20211129

Available from: 2016-11-25 Created: 2016-11-03 Last updated: 2025-02-07 Bibliographically approved
Sharif Razavian, A., Sullivan, J., Maki, A. & Carlsson, S. (2015). A Baseline for Visual Instance Retrieval with Deep Convolutional Networks. Paper presented at International Conference on Learning Representations, May 7 - 9, 2015, San Diego, CA. San Diego, US: ICLR
2015 (English). Conference paper, Poster (with or without abstract) (Refereed)
Place, publisher, year, edition, pages
San Diego, US: ICLR, 2015
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-165765 (URN)
Conference
International Conference on Learning Representations, May 7 - 9, 2015, San Diego, CA
Note

QC 20150522

Available from: 2015-04-29 Created: 2015-04-29 Last updated: 2024-03-15 Bibliographically approved
Azizpour, H., Razavian, A. S., Sullivan, J., Maki, A. & Carlsson, S. (2015). From Generic to Specific Deep Representations for Visual Recognition. In: Proceedings of CVPR 2015. Paper presented at CVPRW DeepVision Workshop, 7-12 June 2015, Boston, MA, USA. IEEE conference proceedings
2015 (English). In: Proceedings of CVPR 2015, IEEE conference proceedings, 2015. Conference paper, Published paper (Refereed)
Abstract [en]

Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset and the feed-forward units activation, at a certain layer of the network, is used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors. It includes parameters for training the network such as its architecture and parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show interesting results indicating a clear correlation between the performance of tasks and their distance from the source task conditioned on proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performances on 16 visual recognition tasks.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2015
Series
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, ISSN 2160-7508
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-164527 (URN); 10.1109/CVPRW.2015.7301270 (DOI); 000378887900005 (); 2-s2.0-84951960494 (Scopus ID); 978-1-4673-6759-2 (ISBN)
Conference
CVPRW DeepVision Workshop, 7-12 June 2015, Boston, MA, USA
Note

QC 20150507. QC 20200701

Available from: 2015-04-17 Created: 2015-04-17 Last updated: 2025-02-07 Bibliographically approved
Sharif Razavian, A., Azizpour, H., Maki, A., Sullivan, J., Ek, C. H. & Carlsson, S. (2015). Persistent Evidence of Local Image Properties in Generic ConvNets. In: Paulsen, Rasmus R., Pedersen, Kim S. (Ed.), Image Analysis: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015. Proceedings. Paper presented at Scandinavian Conference on Image Analysis, Copenhagen, Denmark, 15-17 June, 2015 (pp. 249-262). Springer Publishing Company
2015 (English). In: Image Analysis: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015. Proceedings / [ed] Paulsen, Rasmus R., Pedersen, Kim S., Springer Publishing Company, 2015, p. 249-262. Conference paper, Published paper (Refereed)
Abstract [en]

Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper addresses this, in particular, exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has been recently shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for the finding in the contexts of four different tasks: 2d landmark detection, 2d object keypoints prediction, estimation of the RGB values of input image, and recovery of semantic label of each pixel. We base our investigation on a simple framework with ridge regression commonly across these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to a good accuracy, but should potentially be useful for improving the training of the convolutional nets for classification purposes.

Place, publisher, year, edition, pages
Springer Publishing Company, 2015
Series
Image Processing, Computer Vision, Pattern Recognition, and Graphics ; 9127
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-172140 (URN); 10.1007/978-3-319-19665-7_21 (DOI); 2-s2.0-84947982864 (Scopus ID)
Conference
Scandinavian Conference on Image Analysis, Copenhagen, Denmark, 15-17 June, 2015
Note

QC 20150828

Available from: 2015-08-13 Created: 2015-08-13 Last updated: 2024-03-15 Bibliographically approved
Azizpour, H., Arefiyan, M., Naderi Parizi, S. & Carlsson, S. (2015). Spotlight the Negatives: A Generalized Discriminative Latent Model. Paper presented at British Machine Vision Conference (BMVC), 7-10 September, Swansea, UK, 2015.
2015 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Discriminative latent variable models (LVM) are frequently applied to various visual recognition tasks. In these systems the latent (hidden) variables provide a formalism for modeling structured variation of visual features. Conventionally, latent variables are defined on the variation of the foreground (positive) class. In this work we augment LVMs to include negative latent variables corresponding to the background class. We formalize the scoring function of such a generalized LVM (GLVM). Then we discuss a framework for learning a model based on the GLVM scoring function. We theoretically showcase how some of the current visual recognition methods can benefit from this generalization. Finally, we experiment on a generalized form of Deformable Part Models with negative latent variables and show significant improvements on two different detection tasks.

National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-172138 (URN)
Conference
British Machine Vision Conference (BMVC), 7-10 September, Swansea, UK, 2015
Note

QC 20150828

Available from: 2015-08-13 Created: 2015-08-13 Last updated: 2024-03-15 Bibliographically approved