1 - 43 of 43
  • 1.
    Afkham, Heydar Maboudi
    et al.
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Ek, Carl Henrik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    A topological framework for training latent variable models (2014). In: Proceedings - International Conference on Pattern Recognition, 2014, pp. 2471-2476. Conference paper (Refereed)
    Abstract [en]

    We discuss the properties of a class of latent variable models that assumes each labeled sample is associated with a set of different features, with no prior knowledge of which feature is the most relevant. Deformable-Part Models (DPM) can be seen as good examples of such models. These models are usually considered to be expensive to train and very sensitive to the initialization. In this paper, we focus on the learning of such models by introducing a topological framework and show how it is possible to both reduce the learning complexity and produce more robust decision boundaries. We will also argue how our framework can be used for producing robust decision boundaries without exploiting the dataset bias or relying on accurate annotations. To experimentally evaluate our method and compare with previously published frameworks, we focus on the problem of image classification with object localization. In this problem, the correct location of the objects is unknown, during both training and testing stages, and is considered as a latent variable.

  • 2.
    Afkham, Heydar Maboudi
    et al.
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Ek, Carl Henrik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Gradual improvement of image descriptor quality (2014). In: ICPRAM 2014 - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods, 2014, pp. 233-238. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a framework for gradually improving the quality of an already existing image descriptor. The descriptor used in this paper (Afkham et al., 2013) uses the response of a series of discriminative components for summarizing each image. As we will show, this descriptor has an ideal form in which all categories become linearly separable. While reaching this form is not feasible, we argue that by replacing a small fraction of these components it is possible to obtain a descriptor which is, on average, closer to this ideal form. To do so, we initially identify which components do not contribute to the quality of the descriptor and replace them with more robust components. Here, a joint feature selection method is used to find improved components. As our experiments show, this change directly reflects in the capability of the resulting descriptor in discriminating between different categories.

  • 3.
    Afkham, Heydar Maboudi
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Ek, Carl Henrik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Initialization framework for latent variable models (2014). In: ICPRAM 2014 - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods, 2014, pp. 227-232. Conference paper (Refereed)
    Abstract [en]

    In this paper, we discuss the properties of a class of latent variable models that assumes each labeled sample is associated with a set of different features, with no prior knowledge of which feature is the most relevant. Deformable-Part Models (DPM) can be seen as a good example of such models. While the Latent SVM framework (LSVM) has proven to be an efficient tool for solving these models, we will argue that the solution found by this tool is very sensitive to the initialization. To decrease this dependency, we propose a novel clustering procedure for these problems, to find cluster centers that are shared by several sample sets while ignoring the rest of the cluster centers. As we will show, these cluster centers provide a robust initialization for the LSVM framework.

  • 4.
    Aghazadeh, Omid
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Azizpour, Hossein
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Mixture component identification and learning for visual recognition (2012). In: Computer Vision – ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI, Springer, 2012, pp. 115-128. Conference paper (Refereed)
    Abstract [en]

    The non-linear decision boundary between object and background classes - due to large intra-class variations - needs to be modelled by any classifier wishing to achieve good results. While a mixture of linear classifiers is capable of modelling this non-linearity, learning this mixture from weakly annotated data is non-trivial and is the paper's focus. Our approach is to identify the modes in the distribution of our positive examples by clustering, and to utilize this clustering in a latent SVM formulation to learn the mixture model. The clustering relies on a robust measure of visual similarity which suppresses uninformative clutter by using a novel representation based on the exemplar SVM. This subtle clustering of the data leads to learning better mixture models, as is demonstrated via extensive evaluations on Pascal VOC 2007. The final classifier, using a HOG representation of the global image patch, achieves performance comparable to the state-of-the-art while being more efficient at detection time.

  • 5.
    Aghazadeh, Omid
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Large Scale, Large Margin Classification using Indefinite Similarity Measures. Manuscript (preprint) (Other academic)
  • 6.
    Aghazadeh, Omid
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Properties of Datasets Predict the Performance of Classifiers (2013). In: BMVC 2013 - Electronic Proceedings of the British Machine Vision Conference 2013, British Machine Vision Association, BMVA, 2013. Conference paper (Refereed)
    Abstract [en]

    It has been shown that the performance of classifiers depends not only on the number of training samples, but also on the quality of the training set [10, 12]. The purpose of this paper is to 1) provide quantitative measures that determine the quality of the training set and 2) provide the relation between the test performance and the proposed measures. The measures are derived from pairwise affinities between training exemplars of the positive class and they have a generative nature. We show that the performance of the state of the art methods, on the test set, can be reasonably predicted based on the values of the proposed measures on the training set. These measures open up a wide range of applications to the recognition community enabling us to analyze the behavior of the learning algorithms w.r.t the properties of the training data. This will in turn enable us to devise rules for the automatic selection of training data that maximize the quantified quality of the training set and thereby improve recognition performance.
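The pairwise-affinity idea in the abstract above can be sketched in a few lines. This is a toy illustration, not the paper's actual measures: the Gaussian affinity, the mean-off-diagonal aggregation, and all names here are our own assumptions.

```python
import math

def pairwise_affinities(samples):
    """Gaussian affinity between every pair of feature vectors.
    Illustrative choice of kernel; the paper derives its own
    generative measures from such pairwise affinities."""
    n = len(samples)
    aff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            aff[i][j] = math.exp(-d2)
    return aff

def compactness(samples):
    """One crude 'training-set quality' score: the mean off-diagonal
    affinity of the positive class (higher = tighter, more homogeneous
    set). Hypothetical stand-in for the paper's measures."""
    aff = pairwise_affinities(samples)
    n = len(samples)
    off = [aff[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(off) / len(off)
```

A score like this could be compared against test performance across datasets, in the spirit of the quantitative relation the paper proposes.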

  • 7.
    Aghazadeh, Omid
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Properties of Datasets Predict the Performance of Classifiers (2013). Manuscript (preprint) (Other academic)
  • 8.
    Aghazadeh, Omid
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Multi view registration for novelty/background separation (2012). In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE Computer Society, 2012, pp. 757-764. Conference paper (Refereed)
    Abstract [en]

    We propose a system for the automatic segmentation of novelties from the background in scenarios where multiple images of the same environment are available, e.g. obtained by wearable visual cameras. Our method finds the pixels in a query image corresponding to the underlying background environment by comparing it to reference images of the same scene. This is achieved despite the fact that all the images may have different viewpoints, significantly different illumination conditions and contain different objects (cars, people, bicycles, etc.) occluding the background. We estimate the probability of each pixel in the query image belonging to the background by computing its appearance inconsistency to the multiple reference images. We then produce multiple segmentations of the query image using an iterated graph cuts algorithm, initializing from these estimated probabilities, and consecutively combine these segmentations to come up with a final segmentation of the background. Detection of the background in turn highlights the novel pixels. We demonstrate the effectiveness of our approach on a challenging outdoors data set.
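The per-pixel background probability described above can be sketched with a minimal one-dimensional intensity model. This is our own hedged simplification: the paper uses richer appearance features and multiple views, and `sigma` is a parameter we introduce for illustration.

```python
import math

def background_probability(query, references, sigma=10.0):
    """Probability that a query pixel belongs to the background,
    scored by how consistent its appearance is with the same pixel
    in several reference images of the scene. Toy 1-D intensity
    model; the paper's appearance measure is more elaborate."""
    best = min(abs(query - r) for r in references)  # most consistent reference view
    return math.exp(-best ** 2 / (2.0 * sigma ** 2))
```

Probabilities of this kind would then seed the iterated graph-cut segmentation; pixels with low background probability are the novelty candidates.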

  • 9.
    Aghazadeh, Omid
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Novelty Detection from an Ego-Centric perspective (2011). In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011, pp. 3297-3304. Conference paper (Refereed)
    Abstract [en]

    This paper demonstrates a system for the automatic extraction of novelty in images captured from a small video camera attached to a subject's chest, replicating his visual perspective, while performing activities which are repeated daily. Novelty is detected when a (sub)sequence cannot be registered to previously stored sequences captured while performing the same daily activity. Sequence registration is performed by measuring appearance and geometric similarity of individual frames and exploiting the invariant temporal order of the activity. Experimental results demonstrate that this is a robust way to detect novelties induced by variations in the wearer's ego-motion such as stopping and talking to a person. This is an essentially new and generic way of automatically extracting information of interest to the camera wearer and can be used as input to a system for life logging or memory support.

  • 10.
    Azizpour, Hossein
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Arefiyan, Mostafa
    Naderi Parizi, Sobhan
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Spotlight the Negatives: A Generalized Discriminative Latent Model (2015). Conference paper (Refereed)
    Abstract [en]

    Discriminative latent variable models (LVM) are frequently applied to various visual recognition tasks. In these systems the latent (hidden) variables provide a formalism for modeling structured variation of visual features. Conventionally, latent variables are defined on the variation of the foreground (positive) class. In this work we augment LVMs to include negative latent variables corresponding to the background class. We formalize the scoring function of such a generalized LVM (GLVM). Then we discuss a framework for learning a model based on the GLVM scoring function. We theoretically showcase how some of the current visual recognition methods can benefit from this generalization. Finally, we experiment on a generalized form of Deformable Part Models with negative latent variables and show significant improvements on two different detection tasks.

  • 11.
    Azizpour, Hossein
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Self-tuned Visual Subclass Learning with Shared Samples: An Incremental Approach (2013). Article, research review (Other academic)
    Abstract [en]

    Computer vision tasks are traditionally defined and evaluated using semantic categories. However, it is known to the field that semantic classes do not necessarily correspond to a unique visual class (e.g. inside and outside of a car). Furthermore, many of the feasible learning techniques at hand cannot model a visual class which appears consistent to the human eye. These problems have motivated the use of 1) unsupervised or supervised clustering as a preprocessing step to identify the visual subclasses to be used in a mixture-of-experts learning regime; 2) latent variables, as in the Felzenszwalb et al. part model and other works, to model mixture assignment, optimized during learning; 3) highly non-linear classifiers which are inherently capable of modelling a multi-modal input space but are inefficient at test time. In this work, we promote an incremental view over the recognition of semantic classes with varied appearances. We propose an optimization technique which incrementally finds maximal visual subclasses in a regularized risk minimization framework. Our proposed approach unifies the clustering and classification steps in a single algorithm. The importance of this approach is its compliance with the classification via the fact that it does not need to know a priori the number of clusters, or the representation and similarity measures used in pre-processing clustering methods. Following this approach we show both qualitatively and quantitatively significant results. We show that the visual subclasses demonstrate a long-tail distribution. Finally, we show that state-of-the-art object detection methods (e.g. DPM) are unable to use the tails of this distribution, comprising 50% of the training samples. In fact we show that DPM performance slightly increases on average by the removal of this half of the data.

  • 12.
    Azizpour, Hossein
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Razavian, Ali Sharif
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Maki, Atsuto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    From Generic to Specific Deep Representations for Visual Recognition (2015). In: Proceedings of CVPR 2015, IEEE conference proceedings, 2015. Conference paper (Refereed)
    Abstract [en]

    Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset and the feed-forward unit activations, at a certain layer of the network, are used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors. It includes parameters for training the network such as its architecture and parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show interesting results indicating a clear correlation between the performance of tasks and their distance from the source task conditioned on proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performance on 16 visual recognition tasks.

  • 13.
    Azizpour, Hossein
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sharif Razavian, Ali
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Maki, Atsuto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Factors of Transferability for a Generic ConvNet Representation (2016). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 38, no. 9, pp. 1790-1802, article id 7328311. Journal article (Refereed)
    Abstract [en]

    Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the feed-forward units activation of the trained network, at a certain layer of the network, is used as a generic representation of an input image for a task with relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. It includes parameters for training of the source ConvNet such as its architecture, distribution of the training data, etc. and also the parameters of feature extraction such as layer of the trained ConvNet, dimensionality reduction, etc. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed.

  • 14.
    Burenius, Magnus
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    3D pictorial structures for multiple view articulated pose estimation (2013). In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2013, pp. 3618-3625. Conference paper (Refereed)
    Abstract [en]

    We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.

  • 15.
    Burenius, Magnus
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Motion Capture from Dynamic Orthographic Cameras (2011). In: 4DMOD - 1st IEEE Workshop on Dynamic Shape Capture and Analysis, 2011. Conference paper (Refereed)
    Abstract [en]

    We present an extension to the scaled orthographic camera model. It deals with dynamic cameras looking at faraway objects. The camera is allowed to change focal length and translate and rotate in 3D. The model we derive says that this motion can be treated as scaling, translation and rotation in a 2D image plane. It is valid if the camera and its target move around in two separate regions that are small compared to the distance between them. We show two applications of this model to motion capture at large distances, i.e. outside a studio, using the affine factorization algorithm. The model is used to motivate theoretically why the factorization can be carried out in a single batch step, when having both dynamic cameras and a dynamic object. Furthermore, the model is used to motivate how the position of the object can be reconstructed by measuring the virtual 2D motion of the cameras. For testing we use videos from a real football game and reconstruct the 3D motion of a footballer as he scores a goal.
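The scaled orthographic projection that the abstract above extends can be written down in a few lines: depth is dropped, then the image plane is rotated, scaled and translated in 2D. A minimal sketch under that standard model; the parameter names are ours, and the paper's contribution (dynamic cameras) is not reproduced here.

```python
import math

def scaled_orthographic(point, scale=1.0, tx=0.0, ty=0.0, theta=0.0):
    """Scaled orthographic projection of a 3D point: ignore depth,
    then apply a 2D rotation by theta, a uniform scale, and a
    translation (tx, ty) in the image plane."""
    x, y, _z = point                      # depth is discarded entirely
    c, s = math.cos(theta), math.sin(theta)
    xr, yr = c * x - s * y, s * x + c * y  # 2D in-plane rotation
    return (scale * xr + tx, scale * yr + ty)
```

Because the depth coordinate never enters the projection, faraway objects at different depths map identically, which is what makes the model valid only when object and camera each stay in a small region relative to their separation.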

  • 16.
    Burenius, Magnus
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Halvorsen, Kjartan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Human 3D Motion Computation from a varying Number of Cameras (2011). In: Image Analysis, Springer Berlin / Heidelberg, 2011, pp. 24-35. Conference paper (Refereed)
    Abstract [en]

    This paper focuses on how the accuracy of marker-less human motion capture is affected by the number of camera views used. Specifically, we compare the 3D reconstructions calculated from single and multiple cameras. We perform our experiments on data consisting of video from multiple cameras synchronized with ground truth 3D motion, obtained from a motion capture session with a professional footballer. The error is compared for the 3D reconstructions, of diverse motions, estimated using the manually located image joint positions from one, two or three cameras. We also present a new bundle adjustment procedure using regression splines to impose weak prior assumptions about human motion, temporal smoothness and joint angle limits, on the 3D reconstruction. The results show that even under close to ideal circumstances the monocular 3D reconstructions contain visual artifacts not present in the multiple view case, indicating accurate and efficient marker-less human motion capture requires multiple cameras.

  • 17.
    Carlsson, Stefan
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Recognizing walking people (2003). In: The International Journal of Robotics Research, ISSN 0278-3649, E-ISSN 1741-3176, Vol. 22, no. 6, pp. 359-369. Journal article (Refereed)
    Abstract [en]

    We present a method for the recognition of walking people in monocular image sequences based on the extraction of coordinates of specific point locations on the body. The method works by a comparison of sequences of recorded coordinates with a library of sequences from different individuals. The comparison is based on the evaluation of view invariant and calibration independent view consistency constraints. These constraints are functions of corresponding image coordinates in two views and are satisfied whenever the two views are projected from the same three-dimensional (3D) object. By evaluating the view consistency constraints for each pair of frames in a sequence of a walking person and a stored sequence, we obtain a matrix of consistency values that ideally are zero whenever the pair of images depict the same 3D posture. The method is virtually parameter free and computes a consistency residual between a pair of sequences that can be used as a distance for clustering and classification. Using interactively extracted data we present experimental results that are superior to those of previously published algorithms both in terms of performance and generality.
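The matrix of consistency values described above can be sketched generically. This is a hedged illustration: `residual` stands in for the paper's view-consistency constraints (which we do not reproduce), and the min-per-row aggregation into a single distance is our own choice.

```python
def consistency_matrix(seq_a, seq_b, residual):
    """Residuals between every frame pair of two sequences; near-zero
    entries mark pairs that plausibly depict the same 3D posture.
    `residual` is any pairwise score function standing in for the
    paper's view-consistency constraints."""
    return [[residual(a, b) for b in seq_b] for a in seq_a]

def sequence_distance(matrix):
    """Collapse the matrix to one distance usable for clustering and
    classification: mean of the best (smallest) residual per row.
    Illustrative aggregation, not necessarily the paper's."""
    return sum(min(row) for row in matrix) / len(matrix)
```

With frames encoded as extracted body-point coordinates, a distance like this could drive nearest-neighbour classification over a library of recorded sequences, as in the method described.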

  • 18.
    Carlsson, Stefan
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Hayman, Eric
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Method and device for generating wide image sequences (2004). Patent (Other (popular science, debate, etc.))
    Abstract [en]

    The invention relates to a video recording apparatus comprising: a microprocessor (130); a memory means (120) for storing a program for generating a set of calibration parameters related to a device having at least two video cameras which are arranged in a predetermined relationship to each other, said parameters being unique for the at least two cameras and their current location as related to the object being recorded, said memory means (120) also storing a program for recording of wide image video sequences; read and write memory means (140) for storing data relating to recorded video sequences from at least two video cameras; input means (300) for manual input of parameters and input of recorded video sequences; and output means (300) for output of a wide image video sequence. The invention also relates to a method for generating a wide image video sequence, said method comprising the steps of generating a set of calibration parameters related to a device having at least two video cameras which are arranged in a predetermined relationship to each other, said parameters being unique for the at least two cameras and their current location as related to the object being recorded; recording synchronously video sequences using each of said at least two video cameras; and generating a wide image video sequence from each of said synchronously recorded video sequences.

  • 19.
    Damangir, Soheil
    et al.
    Manzouri, Amirhossein
    Oppedal, Ketil
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Firbank, Michael J.
    Sonnesyn, Hogne
    Tysnes, Ole-Bjorn
    O'Brien, John T.
    Beyer, Mona K.
    Westman, Eric
    Aarsland, Dag
    Wahlund, Lars-Olof
    Spulber, Gabriela
    Multispectral MRI segmentation of age related white matter changes using a cascade of support vector machines (2012). In: Journal of the Neurological Sciences, ISSN 0022-510X, E-ISSN 1878-5883, Vol. 322, no. 1-2, pp. 211-216. Journal article (Refereed)
    Abstract [en]

    White matter changes (WMC) are the focus of intensive research and have been linked to cognitive impairment and depression in the elderly. Cumbersome manual outlining procedures make research on WMC labor intensive and prone to subjective bias. We present a fast, fully automated method for WMC segmentation using a cascade of reduced support vector machines (SVMs) with active learning. Data from 102 subjects was used in this study. Two MRI sequences (T1-weighted and FLAIR) and masks of manually outlined WMC from each subject were used for the image analysis. The segmentation framework comprises pre-processing, classification (training and core segmentation) and post-processing. After pre-processing, the model was trained on two subjects and tested on the remaining 100 subjects. The effectiveness and robustness of the classification was assessed using the receiver operating curve technique. The cascade of SVMs segmentation framework produced accurate results with high sensitivity (90%) and specificity (99.5%) values, with the manually outlined WMC as reference. An algorithm for the segmentation of WMC is proposed. The result is a competitive and fast automatic segmentation framework, capable of using different input sequences without changes or restrictions of the image analysis algorithm.
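The cascade structure used in the segmentation framework above can be sketched generically: each stage only sees what the previous stages accepted, so cheap early stages discard most of the volume before the expensive final classifier runs. This is an illustrative stand-in, not the paper's reduced-SVM implementation; the voxel encoding and stage rules are our assumptions.

```python
def cascade_classify(voxels, stages):
    """Run a classifier cascade over voxels: a voxel is labelled
    positive only if every stage in order accepts it. `stages` is a
    list of predicates, ordered cheapest first."""
    accepted = list(voxels)
    for stage in stages:
        accepted = [v for v in accepted if stage(v)]
    return accepted
```

For instance, with voxels as dicts of per-sequence intensities, a crude two-stage cascade might first require a bright FLAIR signal and then a plausible T1 value, in loose analogy with the multispectral input the paper describes.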

  • 20.
    Danielsson, Oscar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Generic Object Class Detection using Boosted Configurations of Oriented Edges (2010). In: Computer Vision – ACCV 2010 / [ed] Kimmel, R; Klette, R; Sugimoto, A, Springer Berlin/Heidelberg, 2010, pp. 1-14. Conference paper (Refereed)
    Abstract [en]

    In this paper we introduce a new representation for shape-based object class detection. This representation is based on very sparse and slightly flexible configurations of oriented edges. An ensemble of such configurations is learnt in a boosting framework. Each edge configuration can capture some local or global shape property of the target class, and the representation is thus not limited to representing and detecting visual classes that have distinctive local structures. The representation is also able to handle significant intra-class variation. The representation allows for very efficient detection and can be learnt automatically from weakly labelled training images of the target class. The main drawback of the method is that, since its inductive bias is rather weak, it needs a comparatively large training set. We evaluate on a standard database [1] and, when using a slightly extended training set, our method outperforms the state of the art [2] on four out of five classes.

  • 21.
    Danielsson, Oscar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Generic Object Class Detection using Feature Maps2011Inngår i: Proceedings of Scandinavian Conference on Image Analysis, 2011, s. 348-359Konferansepaper (Fagfellevurdert)
    Abstract [en]

    In this paper we describe an object class model and a detection scheme based on feature maps, i.e. binary images indicating occurrences of various local features. Any type of local feature and any number of features can be used to generate feature maps. The choice of which features to use can thus be adapted to the task at hand, without changing the general framework. An object class is represented by a boosted decision tree classifier (which may be cascaded) based on normalized distances to feature occurrences. The resulting object class model is essentially a linear combination of a set of flexible configurations of the features used. Within this framework we present an efficient detection scheme that uses a hierarchical search strategy. We demonstrate experimentally that this detection scheme yields a significant speedup compared to sliding window search. We evaluate the detection performance on a standard dataset [7], showing state-of-the-art results. Features used in this paper include edges, corners, blobs and interest points.
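The "normalized distance to feature occurrences" measurement that drives the weak classifiers can be sketched with a standard distance transform. The code below is an illustration only, not the paper's implementation; the feature locations, object scale and threshold are all invented for the example.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Sketch: distance from every pixel to the nearest feature occurrence
# in a binary feature map (illustrative, not the paper's code).
feature_map = np.zeros((8, 8), dtype=bool)
feature_map[2, 3] = True      # e.g. a detected corner
feature_map[6, 6] = True      # e.g. a detected blob

# distance_transform_edt measures distance to the nearest zero, so we
# invert the map: features become zeros, everything else is nonzero.
dist = distance_transform_edt(~feature_map)

# Normalize by a hypothesized object scale so the test is scale-invariant.
scale = 8.0
norm_dist = dist / scale
fires = norm_dist[2, 4] < 0.2   # weak test: is a feature near the expected position?
```

A boosted tree over many such thresholded distances then forms the flexible feature configurations the abstract describes.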

  • 22.
    Danielsson, Oscar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Projectable Classifiers for Multi-View Object Class Recognition2011Inngår i: 3rd International IEEE Workshop on 3D Representation and Recognition, 2011Konferansepaper (Fagfellevurdert)
    Abstract [en]

    We propose a multi-view object class modeling framework based on a simplified camera model and surfels (defined by a location and normal direction in a normalized 3D coordinate system) that mediate coarse correspondences between different views. Weak classifiers are learnt relative to the reference frames provided by the surfels. We describe a weak classifier that uses contour information when its corresponding surfel projects to a contour element in the image and color information when the face of the surfel is visible in the image. We emphasize that these weak classifiers can possibly take many different forms and use many different image features. Weak classifiers are combined using AdaBoost. We evaluate the method on a public dataset [8], showing promising results on categorization, recognition/detection, pose estimation and image synthesis.

  • 23.
    Danielsson, Oscar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Automatic Learning and Extraction of Multi-Local Features2009Inngår i: Proceedings of the IEEE International Conference on Computer Vision, 2009, s. 917-924Konferansepaper (Fagfellevurdert)
    Abstract [en]

    In this paper we introduce a new kind of feature - the multi-local feature, so named as each one is a collection of local features, such as oriented edgels, in a very specific spatial arrangement. A multi-local feature has the ability to capture underlying constant shape properties of exemplars from an object class. Thus it is particularly suited to representing and detecting visual classes that lack distinctive local structures and are mainly defined by their global shape. We present algorithms to automatically learn an ensemble of these features to represent an object class from weakly labelled training images of that class, as well as procedures to detect these features efficiently in novel images. The power of multi-local features is demonstrated by using the ensemble in a simple voting scheme to perform object category detection on a standard database. Despite its simplicity, this scheme yields detection rates matching state-of-the-art object detection systems.

  • 24.
    Danielsson, Oscar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Object Detection using Multi-Local Feature Manifolds2008Inngår i: Proceedings - Digital Image Computing: Techniques and Applications, DICTA 2008, 2008, s. 612-618Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Many object categories are better characterized by the shape of their contour than by local appearance properties like texture or color. Multi-local features are designed to capture the global discriminative structure of an object while avoiding the drawbacks of traditional global descriptors, such as sensitivity to irrelevant image properties. The specific structure of multi-local features allows us to generate new feature exemplars by linear combinations, which effectively increases the set of stored training exemplars. We demonstrate that a multi-local feature is a good "weak detector" of shape-based object categories and that it can accurately estimate the bounding box of objects in an image. Using just a single multi-local feature descriptor we obtain detection results comparable to those of more complex and elaborate systems. It is our opinion that multi-local features have great potential as generic object descriptors, with very interesting possibilities of feature sharing within and between classes.

  • 25.
    Danielsson, Oscar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Rasolzadeh, Babak
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Gated Classifiers: Boosting under high intra-class variation2011Inngår i: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, 2011, s. 2673-2680Konferansepaper (Fagfellevurdert)
    Abstract [en]

    In this paper we address the problem of using boosting (e.g. AdaBoost [7]) to classify a target class with significant intra-class variation against a large background class. This situation occurs for example when we want to recognize a visual object class against all other image patches. The boosting algorithm produces a strong classifier, which is a linear combination of weak classifiers. We observe that we often have sets of weak classifiers that individually fire on many examples of the target class but never fire together on those examples (i.e. their outputs are anti-correlated on the target class). Motivated by this observation we suggest a family of derived weak classifiers, termed gated classifiers, that suppress such combinations of weak classifiers. Gated classifiers can be used on top of any original weak learner. We run experiments on two popular datasets, showing that our method reduces the required number of weak classifiers by almost an order of magnitude, which in turn yields faster detectors. Experiments on synthetic data show that gated classifiers enable more complex distributions to be represented. We hope that gated classifiers will extend the usefulness of boosted classifier cascades [29].
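The representational point can be illustrated with a toy XOR example: a linear combination of two decision stumps cannot separate an XOR pattern, but adding a derived term that gates on the stumps firing together makes it separable. This is only a minimal sketch of the idea, not the paper's algorithm or learning procedure.

```python
# Toy illustration (not the paper's algorithm): suppressing the co-firing
# combination of two stumps yields an XOR-like decision function.
def stump_a(x, y):       # weak classifier A: right half-plane
    return 1 if x > 0.5 else 0

def stump_b(x, y):       # weak classifier B: upper half-plane
    return 1 if y > 0.5 else 0

def gated_strong(x, y):
    a, b = stump_a(x, y), stump_b(x, y)
    gate = a & b                      # derived "gated" classifier: A AND B
    # Linear combination plus a negatively weighted gate term:
    return a + b - 2 * gate

# Target class occupies (1,0) and (0,1); background occupies (0,0) and (1,1).
scores = {p: gated_strong(*p) for p in [(0, 0), (1, 0), (0, 1), (1, 1)]}
```

No weighting of `a` and `b` alone can score the XOR corners above the other two; the gate term is what makes the distribution representable.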

  • 26.
    Eriksson, Martin
    et al.
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Maximizing validity in 2D motion analysis2004Inngår i: PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2 / [ed] Kittler, J; Petrou, M; Nixon, M, 2004, s. 179-183Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Classifying and analyzing human motion from video is relatively common in many areas. Since the motion is carried out in 3D space, the 2D projection provided by a video is somewhat limiting. The question we investigate in this article is how much information is actually lost when going from 3D to 2D, and how this information loss depends on factors such as viewpoint and the tracking errors that will inevitably occur if the 2D sequences are analysed automatically.

  • 27.
    Eriksson, Martin
    et al.
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Monocular reconstruction of human motion by qualitative selection2004Inngår i: SIXTH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, PROCEEDINGS, LOS ALAMITOS: IEEE COMPUTER SOC , 2004, s. 863-868Konferansepaper (Fagfellevurdert)
    Abstract [en]

    One of the main difficulties when reconstructing human motion from monocular video is the depth ambiguity. Achieving a reconstruction, given the projection of the joints, can be regarded as a search problem, where the objective is to find the most likely configuration. One inherent problem in such a formulation is the definition of "most likely". In this work we pick the configuration that best complies with a set of training data in a qualitative sense. The reason for doing this is to allow for large individual variation within the class of motions, and to avoid an extreme bias towards the training data. In order to capture the qualitative constraints, we have used a set of 3D motion capture data of walking people. The method is tested on orthographic projections of motion capture data, in order to compare the achieved reconstruction with the original motion.

  • 28. Liebowitz, D.
    et al.
    Carlsson, Stefan
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Uncalibrated motion capture exploiting articulated structure constraints2003Inngår i: International Journal of Computer Vision, ISSN 0920-5691, E-ISSN 1573-1405, Vol. 51, nr 3, s. 171-187Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    We present an algorithm for 3D reconstruction of dynamic articulated structures, such as humans, from uncalibrated multiple views. The reconstruction exploits constraints associated with a dynamic articulated structure, specifically the conservation over time of length between rotational joints. These constraints admit reconstruction of metric structure from at least two different images in each of two uncalibrated parallel projection cameras. As a by-product, the calibration of the cameras can also be computed. The algorithm is based on a stratified approach, starting with affine reconstruction from factorization, followed by rectification to metric structure using the articulated structure constraints. The exploitation of these specific constraints admits reconstruction and self-calibration with fewer feature points and views compared to standard self-calibration. The method is extended to pairs of cameras that are zooming, where calibration of the cameras allows compensation for the changing scale factor in a scaled orthographic camera. Results are presented in the form of stick figures and animated 3D reconstructions using pairs of sequences from broadcast television. The technique shows promise as a means of creating 3D animations of dynamic activities such as sports events.

  • 29.
    Loy, Gareth
    et al.
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Eriksson, Martin
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Sullivan, Josephine
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Monocular 3D reconstruction of human motion in long action sequences2004Inngår i: COMPUTER VISION: ECCV 2004, PT 4, BERLIN: SPRINGER , 2004, Vol. 2034, s. 442-455Konferansepaper (Fagfellevurdert)
    Abstract [en]

    A novel algorithm is presented for the 3D reconstruction of human action in long (> 30 second) monocular image sequences. A sequence is represented by a small set of automatically found representative keyframes. The skeletal joint positions are manually located in each keyframe and mapped to all other frames in the sequence. For each keyframe a 3D key pose is created, and interpolation between these 3D body poses, together with the incorporation of limb length and symmetry constraints, provides a smooth initial approximation of the 3D motion. This is then fitted to the image data to generate a realistic 3D reconstruction. The degree of manual input required is controlled by the diversity of the sequence's content. Sports footage is ideally suited to this approach as it frequently contains a limited number of repeated actions. Our method is demonstrated on a long (36 second) sequence of a woman playing tennis filmed with a non-stationary camera. This sequence required manual initialisation on < 1.5% of the frames, and demonstrates that the system can deal with very rapid motion, severe self-occlusions, motion blur and clutter occurring over several consecutive frames. The monocular 3D reconstruction is verified by synthesising a view from the perspective of a 'ground truth' reference camera, and the result is seen to provide a qualitatively accurate 3D reconstruction of the motion.

  • 30.
    Maboudi Afkham, Heydar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Improving feature level likelihoods using cloud features2012Inngår i: ICPRAM - Proc. Int. Conf. Pattern Recogn. Appl. Methods, 2012, s. 431-437Konferansepaper (Fagfellevurdert)
    Abstract [en]

    The performance of many computer vision methods depends on the quality of the local features extracted from the images. For most methods the local features are extracted independently of the task and remain constant throughout the whole process. To make features more dynamic and give models a choice in the features they can use, this work introduces a set of intermediate features referred to as cloud features. These features take advantage of part-based models at the feature level by combining each extracted local feature with its nearby local features, creating a cloud of different representations for each local feature. These representations capture the local variations around the feature. At classification time, the best possible representation is pulled out of the cloud and used in the calculations. This selection is based on several latent variables encoded within the cloud features. The goal of this paper is to test how cloud features can improve feature level likelihoods. The experiments focus on feature level inference and show how replacing single features with equivalent cloud features improves the likelihoods obtained from them. They are conducted on several classes of the MSRCv1 dataset.

  • 31.
    Maboudi Afkham, Heydar
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Ek, Carl Henrik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Qualitative vocabulary based descriptor2013Inngår i: ICPRAM 2013: Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods, 2013, s. 188-193Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Creating a single feature descriptor from a collection of feature responses is a frequently occurring task. Bag-of-words descriptors have been very successful at this and have been applied to data from a large range of domains. Central to the approach is the association of features to words. In this paper we present a novel approach to the feature-to-word association problem. The proposed method creates a more robust representation when data is noisy and requires fewer words than traditional methods while retaining similar performance. We experimentally evaluate the method on a challenging image classification data-set and show significant improvement over the state of the art.

  • 32.
    Madry, Marianna
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Autonoma System, CAS.
    Maboudi Afkham, Heydar
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Autonoma System, CAS.
    Ek, Carl Henrik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Autonoma System, CAS.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Autonoma System, CAS.
    Kragic, Danica
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Autonoma System, CAS.
    Extracting essential local object characteristics for 3D object categorization2013Inngår i: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE conference proceedings, 2013, s. 2240-2247Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Most object classes share a considerable amount of local appearance and often only a small number of features are discriminative. The traditional approach to representing an object is based on a summarization of the local characteristics by counting the number of feature occurrences. In this paper we propose the use of a recently developed summarization technique that, rather than looking at the quantity of features, encodes their quality to learn a description of an object. Our approach is based on extracting and aggregating only the essential characteristics of an object class for a task. We show how the proposed method significantly improves on previous work in 3D object categorization. We discuss the benefits of the method in other scenarios such as robot grasping. We provide extensive quantitative and qualitative experiments comparing our approach to the state of the art to justify the described approach.

  • 33.
    Nillius, Peter
    et al.
    Institute of Computer Science - FORTH.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Multi-Target Tracking -- Linking Identities using Bayesian Network Inference2006Inngår i: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Los Alamitos, CA, USA: IEEE Computer Society, 2006, s. 2187-2194Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Multi-target tracking requires locating the targets and labeling their identities. The latter is a challenge when many targets, with indistinct appearances, frequently occlude one another, as in football and surveillance tracking. We present an approach to solving this labeling problem.

    When isolated, a target can be tracked and its identity maintained. If targets interact, however, this is not always the case. This paper assumes a track graph exists, denoting when targets are isolated and describing how they interact. Measures of similarity between isolated tracks are defined. The goal is to associate the identities of the isolated tracks by exploiting the graph constraints and similarity measures.

    We formulate this as a Bayesian network inference problem, allowing us to use standard message propagation to find the most probable set of paths in an efficient way. The high complexity inevitable in large problems is gracefully reduced by removing dependency links between tracks. We apply the method to a 10 min sequence of an international football game and compare results to ground truth.

  • 34. Rother, C.
    et al.
    Carlsson, Stefan
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Linear multi view reconstruction and camera recovery using a reference plane2002Inngår i: International Journal of Computer Vision, ISSN 0920-5691, E-ISSN 1573-1405, Vol. 49, nr 2-3, s. 117-141Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    This paper presents a linear algorithm for simultaneous computation of 3D points and camera positions from multiple perspective views, based on having a reference plane visible in all views. The reconstruction and camera recovery are achieved in a single step by finding the null-space of a matrix built from image data using Singular Value Decomposition. Contrary to factorization algorithms, this approach does not require all points to be visible in all views. This paper investigates two reference plane configurations: finite reference planes defined by four coplanar points, and infinite reference planes defined by vanishing points. A further contribution of this paper is the study of critical configurations for the case of four coplanar points. By simultaneously reconstructing points and views we can exploit the numerical stabilizing effect of having widely spread cameras with large mutual baselines. This is demonstrated by reconstructing the outside and inside (courtyard) of a building on the basis of 35 views in one single Singular Value Decomposition.
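The null-space step at the heart of such linear algorithms can be sketched with NumPy: the solution vector is the right singular vector of the measurement matrix with the smallest singular value. The example below builds a synthetic rank-deficient matrix with a known null vector and recovers it; it illustrates only the SVD machinery, not the paper's construction of the matrix from image data.

```python
import numpy as np

# Illustrative null-space recovery via SVD (synthetic measurement matrix).
rng = np.random.default_rng(1)
v_true = rng.normal(size=6)
v_true /= np.linalg.norm(v_true)

# Build a matrix A that annihilates v_true: project random rows onto
# the orthogonal complement of v_true, so A @ v_true = 0 by construction.
P = np.eye(6) - np.outer(v_true, v_true)
A = rng.normal(size=(12, 6)) @ P

# The right singular vector for the smallest singular value spans the
# null-space; it equals v_true up to sign.
U, s, Vt = np.linalg.svd(A)
v = Vt[-1]
residual = np.linalg.norm(A @ v)
```

With noisy measurements the smallest singular value is no longer exactly zero, and the same vector becomes the least-squares solution in the total-least-squares sense.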

  • 35.
    Sharif Razavian, Ali
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Aghazadeh, Omid
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Estimating Attention in Exhibitions Using Wearable Cameras2014Inngår i: Pattern Recognition (ICPR), 2014 22nd International Conference on, Stockholm, Sweden: IEEE conference proceedings, 2014, s. 2691-2696Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper demonstrates a system for automatic detection of visual attention and identification of salient items at exhibitions (e.g. a museum or an auction). The method works offline on video captured by a head-mounted camera. Towards the estimation of attention, we define the notions of "saliency" and "interestingness" for exhibition items. Our method combines multiple state-of-the-art techniques from different vision tasks such as tracking, image matching and retrieval. Many experiments are conducted to evaluate multiple aspects of the method, which has proven robust to image blur, occlusion, truncation and dimness. The experiments show strong performance on the tasks of matching items, estimating focus frames and detecting salient and interesting items. This can be useful to commercial vendors and museum curators, helping them understand which items appeal most to visitors.

  • 36.
    Sharif Razavian, Ali
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Azizpour, Hossein
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Maki, Atsuto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Ek, Carl Henrik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Persistent Evidence of Local Image Properties in Generic ConvNets2015Inngår i: Image Analysis: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015. Proceedings / [ed] Paulsen, Rasmus R., Pedersen, Kim S., Springer Publishing Company, 2015, s. 249-262Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to hold for the very final layers in the network, if we look at earlier layers we find that it is not the case. Surprisingly, strong spatial information is implicitly present. This paper addresses this, in particular exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for this finding in the context of four different tasks: 2d landmark detection, 2d object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework with ridge regression used commonly across these tasks, and show results which all support our insight. Such spatial information can be used for computing landmark correspondences to good accuracy, and should potentially be useful for improving the training of convolutional nets for classification purposes.

  • 37.
    Sharif Razavian, Ali
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Azizpour, Hossein
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    CNN features off-the-shelf: An Astounding Baseline for Recognition2014Inngår i: Proceedings of CVPR 2014, 2014Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Recent results indicate that the generic descriptors extracted from convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network, which was trained to perform object classification on ILSVRC13. We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine-grained recognition, attribute detection and image retrieval, applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the OverFeat network was trained to solve. Astonishingly, we report consistently superior results compared to the highly tuned state-of-the-art systems in all the visual classification tasks on various datasets. For instance retrieval, it consistently outperforms methods with a low memory footprint, except on the sculptures dataset. The results are achieved using a linear SVM classifier (or L2 distance in the case of retrieval) applied to a feature representation of size 4096 extracted from a layer in the net. The representations are further modified using simple augmentation techniques, e.g. jittering. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.
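The recipe the abstract describes, a fixed 4096-d layer activation fed to a linear SVM, can be sketched as follows. Random vectors stand in for real OverFeat features here, and the L2 normalization is a common preprocessing choice for such descriptors rather than a detail confirmed by the abstract; the sketch shows the pipeline shape, nothing more.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Sketch: generic 4096-d descriptors + linear SVM (random stand-in
# features; NOT actual OverFeat activations).
rng = np.random.default_rng(0)
n, d = 200, 4096
features = rng.normal(size=(n, d))
features[100:] += 0.5                 # class-dependent shift, two classes
labels = np.repeat([0, 1], 100)

# L2-normalize each descriptor (common preprocessing for CNN features).
features /= np.linalg.norm(features, axis=1, keepdims=True)

clf = LinearSVC(C=1.0).fit(features, labels)
acc = clf.score(features, labels)
```

The appeal of the recipe is exactly this simplicity: the expensive network runs once per image as a fixed feature extractor, and only the cheap linear classifier is trained per task.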

  • 38.
    Sharif Razavian, Ali
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Maki, Atsuto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    A Baseline for Visual Instance Retrieval with Deep Convolutional Networks2015Konferansepaper (Fagfellevurdert)
  • 39.
    Sullivan, Josephine
    et al.
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Carlsson, Stefan
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Recognizing and Tracking Human Action2002Inngår i: COMPUTER VISON - ECCV 2002, PT 1 / [ed] Anders Heyden, Gunnar Sparr, Mads Nielsen and Peter Johansen, Berlin: Springer, 2002, s. 629-644Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Human activity can be described as a sequence of 3D body postures. The traditional approach to recognition and 3D reconstruction of human activity has been to track motion in 3D, mainly using advanced geometric and dynamic models. In this paper we reverse this process. View based activity recognition serves as an input to a human body location tracker with the ultimate goal of 3D reanimation in mind. We demonstrate that specific human actions can be detected from single frame postures in a video sequence. By recognizing the image of a person’s posture as corresponding to a particular key frame from a set of stored key frames, it is possible to map body locations from the key frames to actual frames. This is achieved using a shape matching algorithm based on qualitative similarity that computes point to point correspondence between shapes, together with information about appearance. As the mapping is from fixed key frames, our tracking does not suffer from the problem of having to reinitialise when it gets lost. It is effectively a closed loop. We present experimental results both for recognition and tracking for a sequence of a tennis player.

  • 40.
    Sullivan, Josephine
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Tracking and labelling of interacting multiple targets2006Inngår i: COMPUTER VISION - ECCV 2006, PT 3, PROCEEDINGS / [ed] Leonardis, A; Pinz, A, 2006, Vol. 3953, s. 619-632Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Successful multi-target tracking requires solving two problems - localize the targets and label their identity. An isolated target's identity can be unambiguously preserved from one frame to the next. However, for long sequences of many moving targets, like a football game, grouping scenarios will occur in which identity labellings cannot be maintained reliably by using continuity of motion or appearance. This paper describes how to match targets' identities despite these interactions. Trajectories covering the periods when a target is isolated are found. These trajectories end when targets interact and their labellings cannot be maintained. The interactions (merges and splits) of these trajectories form a graph structure. Appropriate feature vectors summarizing particular qualities of each trajectory are extracted. A clustering procedure based on these feature vectors allows the identities of temporally separated trajectories to be matched. Results are shown from a football match captured by a wide screen system giving a full stationary view of the pitch.

  • 41.
    Sullivan, Josephine
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Danielsson, Oscar
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Exploiting Part-Based Models and Edge Boundaries for Object Detection2008Inngår i: Digital Image Computing: Techniques and Applications, DICTA 2008, 2008, s. 199-206Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper explores how to exploit shape information to perform object class recognition. We use a sparse part-based model to describe object categories defined by shape. The sparseness allows the relative spatial relationship between parts to be described simply. It is possible, with this model, to highlight potential locations of the object and its parts in novel images. Subsequently these areas are examined by a more flexible shape model that measures whether the image data provides evidence of the existence of boundary/connecting curves between connected hypothesized parts. From these measurements it is possible to construct a very simple cost function which indicates the presence or absence of the object class. The part-based model is designed to decouple variations due to affine warps and other forms of shape deformations. The latter are modeled probabilistically using conditional probability distributions which describe the linear dependencies between the location of a part and a subset of the other parts. These conditional distributions can then be exploited to search efficiently for instances of the part model in novel images. Results are reported on experiments performed on the ETHZ shape classes database, which features heavily cluttered images and large variations in scale.

  • 42. Svedberg, D.
    et al.
    Carlsson, Stefan
    KTH, Tidigare Institutioner                               , Numerisk analys och datalogi, NADA.
    Calibration, pose and novel views from single images of constrained scenes2000Inngår i: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 21, nr 13-14, s. 1125-1133Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    We exploit the common constraint of having a right-angle corner of two rectangular planes in the scene in order to calibrate a perspective projection camera and compute its pose relative to the coordinate system defined by the corner. No metric information about the corner is assumed. The camera is constrained to have its image x- and y-axes to be orthogonal with the same scale factor, which is valid for most real-world cameras. We then reproject the image of the corner to an arbitrary viewpoint. We can also compute the metric properties of the scene to scale. We report experimental results with subjectively acceptable quality. The approach shows the power of exploiting constraints that are abundant in typical architectural scenes.
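
    The calibration constraint described above can be connected to a standard result (this is a textbook relation for a zero-skew, unit-aspect-ratio camera, not necessarily the derivation used in the paper): two vanishing points of orthogonal scene directions, imaged at $(u_1, v_1)$ and $(u_2, v_2)$, constrain the focal length $f$ given a principal point $(u_0, v_0)$:

```latex
(u_1 - u_0)(u_2 - u_0) + (v_1 - v_0)(v_2 - v_0) + f^2 = 0
\quad\Longrightarrow\quad
f = \sqrt{-\big[(u_1 - u_0)(u_2 - u_0) + (v_1 - v_0)(v_2 - v_0)\big]}
```

    A right-angle corner of two rectangular planes supplies such orthogonal vanishing points directly from the edges of the rectangles, which is why no metric information about the corner is needed.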

  • 43.
    Thureson, Johan
    et al.
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Appearance based qualitative image description for object class recognition2004Inngår i: COMPUTER VISION - ECCV 2004, PT 2 / [ed] Pajdla, T; Matas, J, BERLIN: SPRINGER , 2004, Vol. 3022, s. 518-529Konferansepaper (Fagfellevurdert)
    Abstract [en]

    The problem of recognizing classes of objects, as opposed to specific instances, requires methods of comparing images that capture the variation within the class while discriminating against objects outside the class. We present a simple method for image description based on histograms of qualitative shape indexes computed from the combination of triplets of sampled locations and gradient directions in the image. We demonstrate that this method is indeed able to capture variation within classes of objects and we apply it to the problem of recognizing four different categories from a large database. Using our descriptor on the whole image, containing varying degrees of background clutter, we obtain results for two of the objects that are superior to the best results published so far for this database. By cropping images manually we demonstrate that our method also has the potential to handle the other objects when supplied with an algorithm for searching the image. We argue that our method, based on qualitative image properties, captures the large range of variation that is typically encountered within an object class. This means that our method can be used on substantially larger patches of images than existing methods based on simpler criteria for evaluating image similarity.
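
    The idea of a histogram over qualitative indexes of point triplets can be illustrated with a toy sketch. The paper's descriptor combines sampled locations with gradient directions; this simplified, hypothetical version uses only the positional order type (clockwise / collinear / counter-clockwise) of each triplet:

```python
import numpy as np
from itertools import combinations

def triplet_orientation_histogram(points):
    """Toy qualitative shape descriptor: a normalised histogram over the
    orientation (CW / collinear / CCW) of every triplet of sampled points.

    points: iterable of (x, y) pairs.
    Returns a length-3 array summing to 1: [CW, collinear, CCW].
    """
    hist = np.zeros(3)
    for a, b, c in combinations(points, 3):
        # Signed area of the triangle (a, b, c); its sign is the order type.
        cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        hist[int(np.sign(cross)) + 1] += 1
    return hist / hist.sum()
```

    Because only the sign of each triplet's area is kept, the histogram is unchanged by translation, rotation, and scaling, which is the kind of qualitative invariance the abstract appeals to.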
