  • 1.
    Aghazadeh, Omid
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Azizpour, Hossein
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Mixture component identification and learning for visual recognition (2012). In: Computer Vision – ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI, Springer, 2012, pp. 115-128. Conference paper (Refereed)
    Abstract [en]

    The non-linear decision boundary between object and background classes - due to large intra-class variations - needs to be modelled by any classifier wishing to achieve good results. While a mixture of linear classifiers is capable of modelling this non-linearity, learning this mixture from weakly annotated data is non-trivial and is the paper's focus. Our approach is to identify the modes in the distribution of our positive examples by clustering, and to utilize this clustering in a latent SVM formulation to learn the mixture model. The clustering relies on a robust measure of visual similarity which suppresses uninformative clutter by using a novel representation based on the exemplar SVM. This subtle clustering of the data leads to learning better mixture models, as is demonstrated via extensive evaluations on Pascal VOC 2007. The final classifier, using a HOG representation of the global image patch, achieves performance comparable to the state-of-the-art while being more efficient at detection time.
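
    The clustering-plus-latent-assignment recipe above can be illustrated compactly. Below is a minimal sketch, assuming rows of L2-normalised descriptors (e.g. HOG) and substituting plain cosine similarity for the paper's exemplar-SVM based similarity; the latent SVM refinement is reduced to a max-over-components detection score.

```python
# Minimal sketch: cluster the positives via a similarity matrix, train one
# linear classifier per mixture component, and score detections with the
# best ("latent") component. Cosine similarity is a simplified stand-in for
# the exemplar-SVM similarity used in the paper.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.svm import LinearSVC

def learn_mixture(X_pos, X_neg, n_components=3):
    S = np.clip(X_pos @ X_pos.T, 0, None)        # similarity between positives
    labels = SpectralClustering(n_clusters=n_components,
                                affinity="precomputed").fit_predict(S)
    components = []
    for k in range(n_components):
        Xk = np.vstack([X_pos[labels == k], X_neg])
        yk = np.hstack([np.ones((labels == k).sum()), -np.ones(len(X_neg))])
        components.append(LinearSVC(C=1.0).fit(Xk, yk))
    return components

def detection_score(components, x):
    # The mixture's score is the maximum over its components.
    return max(c.decision_function(x[None])[0] for c in components)
```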

  • 2.
    Aghazadeh, Omid
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Multi view registration for novelty/background separation (2012). In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE Computer Society, 2012, pp. 757-764. Conference paper (Refereed)
    Abstract [en]

    We propose a system for the automatic segmentation of novelties from the background in scenarios where multiple images of the same environment are available, e.g. obtained by wearable visual cameras. Our method finds the pixels in a query image corresponding to the underlying background environment by comparing it to reference images of the same scene. This is achieved despite the fact that all the images may have different viewpoints, significantly different illumination conditions and contain different objects (cars, people, bicycles, etc.) occluding the background. We estimate the probability of each pixel in the query image belonging to the background by computing its appearance inconsistency to the multiple reference images. We then produce multiple segmentations of the query image using an iterated graph cuts algorithm, initializing from these estimated probabilities, and consecutively combine these segmentations to come up with a final segmentation of the background. Detection of the background in turn highlights the novel pixels. We demonstrate the effectiveness of our approach on a challenging outdoors data set.
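
    A minimal numeric sketch of the per-pixel background probability described above, assuming the query and reference images are already registered (the paper copes with viewpoint change; alignment and the iterated graph cuts refinement are omitted here):

```python
# Per-pixel background probability: a pixel consistent with at least one
# reference image of the same scene is likely background; inconsistent
# pixels are candidate novelties.
import numpy as np

def background_probability(query, references, sigma=0.1):
    """query: HxWx3 float array in [0, 1]; references: aligned HxWx3 arrays."""
    dists = np.stack([np.linalg.norm(query - ref, axis=2) for ref in references])
    min_dist = dists.min(axis=0)   # distance to the closest reference appearance
    # Map appearance inconsistency to a probability with a Gaussian kernel.
    return np.exp(-(min_dist ** 2) / (2 * sigma ** 2))
```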

  • 3.
    Aghazadeh, Omid
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Novelty Detection from an Ego-Centric perspective (2011). In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011, pp. 3297-3304. Conference paper (Refereed)
    Abstract [en]

    This paper demonstrates a system for the automatic extraction of novelty in images captured from a small video camera attached to a subject's chest, replicating his visual perspective, while performing activities which are repeated daily. Novelty is detected when a (sub)sequence cannot be registered to previously stored sequences captured while performing the same daily activity. Sequence registration is performed by measuring appearance and geometric similarity of individual frames and exploiting the invariant temporal order of the activity. Experimental results demonstrate that this is a robust way to detect novelties induced by variations in the wearer's ego-motion such as stopping and talking to a person. This is an essentially new and generic way of automatically extracting information of interest to the camera wearer and can be used as input to a system for life logging or memory support.
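
    The registration step, which exploits the invariant temporal order of the activity, can be sketched with dynamic time warping. The code below is a simplified illustration assuming per-frame descriptors are already computed; the paper's actual appearance and geometric similarity measures are not reproduced.

```python
# Register a query sequence against stored sequences of the same activity
# with DTW; a sequence whose best alignment cost stays high is flagged novel.
import numpy as np

def dtw_alignment_cost(query_desc, stored_desc):
    """Both inputs: (n_frames, dim) arrays of per-frame descriptors."""
    n, m = len(query_desc), len(stored_desc)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(query_desc[i - 1] - stored_desc[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)        # length-normalised alignment cost

def is_novel(query_desc, stored_sequences, threshold=1.0):
    # Novelty: the sequence cannot be registered to any stored sequence.
    return all(dtw_alignment_cost(query_desc, s) > threshold
               for s in stored_sequences)
```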

  • 4.
    Azizpour, Hossein
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Razavian, Ali Sharif
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Maki, Atsuto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    From Generic to Specific Deep Representations for Visual Recognition (2015). In: Proceedings of CVPR 2015, IEEE conference proceedings, 2015. Conference paper (Refereed)
    Abstract [en]

    Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset and the feed-forward unit activations at a certain layer of the network are used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors, including parameters for training the network, such as its architecture, and parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show interesting results indicating a clear correlation between the performance of tasks and their distance from the source task, conditioned on the proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performance on 16 visual recognition tasks.
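
    The transfer recipe the paper studies is straightforward to sketch: freeze a network trained on a large source dataset, read off activations at a chosen layer, and train a simple classifier on the target task. The snippet below is an illustrative sketch using torchvision's ResNet-50 as a stand-in; the layer choice, preprocessing and linear SVM are assumptions, not the paper's exact setup.

```python
# Use activations of a fixed, ImageNet-trained ConvNet as a generic image
# representation, then fit a linear classifier for the target task.
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import LinearSVC

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()
# Drop the final classification layer; keep everything up to global pooling.
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

def extract(images):                     # images: list of PIL images
    batch = torch.stack([preprocess(im) for im in images])
    with torch.no_grad():
        feats = feature_extractor(batch).flatten(1)   # (N, 2048)
    return torch.nn.functional.normalize(feats, dim=1).numpy()

# clf = LinearSVC(C=1.0).fit(extract(train_images), train_labels)
```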

  • 5.
    Azizpour, Hossein
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sharif Razavian, Ali
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Maki, Atsuto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Factors of Transferability for a Generic ConvNet Representation (2016). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 38, no. 9, pp. 1790-1802, article id 7328311. Article in journal (Refereed)
    Abstract [en]

    Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the feed-forward unit activations of the trained network, at a certain layer, are used as a generic representation of an input image for a task with a relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. These include parameters of source ConvNet training, such as its architecture and the distribution of the training data, as well as parameters of feature extraction, such as the layer of the trained ConvNet and dimensionality reduction. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task, such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed.

  • 6.
    Burenius, Magnus
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    3D pictorial structures for multiple view articulated pose estimation (2013). In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2013, pp. 3618-3625. Conference paper (Refereed)
    Abstract [en]

    We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.

  • 7.
    Burenius, Magnus
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Motion Capture from Dynamic Orthographic Cameras (2011). In: 4DMOD - 1st IEEE Workshop on Dynamic Shape Capture and Analysis, 2011. Conference paper (Refereed)
    Abstract [en]

    We present an extension to the scaled orthographic camera model. It deals with dynamic cameras looking at faraway objects. The camera is allowed to change focal length and translate and rotate in 3D. The model we derive says that this motion can be treated as scaling, translation and rotation in a 2D image plane. It is valid if the camera and its target move around in two separate regions that are small compared to the distance between them. We show two applications of this model to motion capture applications at large distances, i.e. outside a studio, using the affine factorization algorithm. The model is used to motivate theoretically why the factorization can be carried out in a single batch step, when having both dynamic cameras and a dynamic object. Furthermore, the model is used to motivate how the position of the object can be reconstructed by measuring the virtual 2D motion of the cameras. For testing we use videos from a real football game and reconstruct the 3D motion of a footballer as he scores a goal.
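
    The batch factorization that the model justifies can be sketched with a rank-3 SVD in the spirit of Tomasi-Kanade affine factorization; the metric upgrade and the paper's handling of dynamic cameras are omitted from this sketch.

```python
# Affine factorization: stack 2D tracks, remove per-frame centroids (the 2D
# translations), and recover motion and shape from a rank-3 decomposition.
import numpy as np

def affine_factorization(W):
    """W: (2F, P) measurement matrix of P points over F frames."""
    W_centered = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
    # Under the orthographic/affine model the centered matrix has rank 3.
    M = U[:, :3] * np.sqrt(s[:3])            # (2F, 3) motion matrix
    S = np.sqrt(s[:3])[:, None] * Vt[:3]     # (3, P) shape, up to affine ambiguity
    return M, S
```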

  • 8.
    Burenius, Magnus
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Halvorsen, Kjartan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Human 3D Motion Computation from a varying Number of Cameras (2011). In: Image Analysis, Springer Berlin / Heidelberg, 2011, pp. 24-35. Conference paper (Refereed)
    Abstract [en]

    This paper focuses on how the accuracy of marker-less human motion capture is affected by the number of camera views used. Specifically, we compare the 3D reconstructions calculated from single and multiple cameras. We perform our experiments on data consisting of video from multiple cameras synchronized with ground truth 3D motion, obtained from a motion capture session with a professional footballer. The error is compared for the 3D reconstructions, of diverse motions, estimated using the manually located image joint positions from one, two or three cameras. We also present a new bundle adjustment procedure using regression splines to impose weak prior assumptions about human motion, temporal smoothness and joint angle limits, on the 3D reconstruction. The results show that even under close to ideal circumstances the monocular 3D reconstructions contain visual artifacts not present in the multiple view case, indicating accurate and efficient marker-less human motion capture requires multiple cameras.

  • 9.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Hayman, Eric
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Method and device for generating wide image sequences (2004). Patent (Other (popular science, debate, etc.))
    Abstract [en]

    The invention relates to a video recording apparatus comprising: a microprocessor (130), a memory means (120) for storing program for generating a set of calibration parameters related to a device having at least two video cameras which are arranged in a predetermined relationship to each other, said parameters being unique for the at least two cameras and their current location as related to the object being recorded; said memory means (120) also storing program for recording of wide image video sequences; read and write memory means (140) for storing data relating to recorded video sequences from at least two video cameras; input means (300) for input of manual input of parameters, input of recorded video sequences, output means (300) for output of a wide image video sequence. The invention also relates to a method for generating a wide image video sequence, said method comprising the steps of generating a set of calibration parameters related to a device having at least two video cameras which are arranged in a predetermined relationship to each other, said parameters being unique for the at least two cameras and their current location as related to the object being recorded; recording synchronously video sequences using each of said at least two video cameras, and generating a wide image video sequence from each of said synchronously recorded video sequences.

  • 10.
    Danielsson, Oscar
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Automatic Learning and Extraction of Multi-Local Features (2009). In: Proceedings of the IEEE International Conference on Computer Vision, 2009, pp. 917-924. Conference paper (Refereed)
    Abstract [en]

    In this paper we introduce a new kind of feature - the multi-local feature, so named as each one is a collection of local features, such as oriented edgels, in a very specific spatial arrangement. A multi-local feature has the ability to capture underlying constant shape properties of exemplars from an object class. Thus it is particularly suited to representing and detecting visual classes that lack distinctive local structures and are mainly defined by their global shape. We present algorithms to automatically learn an ensemble of these features to represent an object class from weakly labelled training images of that class, as well as procedures to detect these features efficiently in novel images. The power of multi-local features is demonstrated by using the ensemble in a simple voting scheme to perform object category detection on a standard database. Despite its simplicity, this scheme yields detection rates matching state-of-the-art object detection systems.

  • 11.
    Danielsson, Oscar
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Object Detection using Multi-Local Feature Manifolds (2008). In: Proceedings - Digital Image Computing: Techniques and Applications, DICTA 2008, 2008, pp. 612-618. Conference paper (Refereed)
    Abstract [en]

    Many object categories are better characterized by the shape of their contour than by local appearance properties like texture or color. Multi-local features are designed in order to capture the global discriminative structure of an object while at the same time avoiding the drawbacks with traditional global descriptors such as sensitivity to irrelevant image properties. The specific structure of multi-local features allows us to generate new feature exemplars by linear combinations which effectively increases the set of stored training exemplars. We demonstrate that a multi-local feature is a good "weak detector" of shape-based object categories and that it can accurately estimate the bounding box of objects in an image. Using just a single multi-local feature descriptor we obtain detection results comparable to those of more complex and elaborate systems. It is our opinion that multi-local features have a great potential as generic object descriptors with very interesting possibilities of feature sharing within and between classes.

  • 12.
    Kazemi, Vahid
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Burenius, Magnus
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Azizpour, Hossein
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Multi-view body part recognition with random forests (2013). In: BMVC 2013 - Electronic Proceedings of the British Machine Vision Conference 2013, Bristol, England: British Machine Vision Association, 2013. Conference paper (Refereed)
    Abstract [en]

    This paper addresses the problem of human pose estimation, given images taken from multiple dynamic but calibrated cameras. We consider solving this task using a part-based model and focus on the part appearance component of such a model. We use a random forest classifier to capture the variation in appearance of body parts in 2D images. The results of these 2D part detectors are then aggregated across views to produce consistent 3D hypotheses for parts. We solve correspondences across views for mirror symmetric parts by introducing a latent variable. We evaluate our part detectors qualitatively and quantitatively on a dataset gathered from a professional football game.

  • 13.
    Kazemi, Vahid
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    One Millisecond Face Alignment with an Ensemble of Regression Trees (2014). In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2014, pp. 1867-1874. Conference paper (Refereed)
    Abstract [en]

    This paper addresses the problem of Face Alignment for a single image. We show how an ensemble of regression trees can be used to estimate the face's landmark positions directly from a sparse subset of pixel intensities, achieving super-realtime performance with high quality predictions. We present a general framework based on gradient boosting for learning an ensemble of regression trees that optimizes the sum of square error loss and naturally handles missing or partially labelled data. We show how using appropriate priors exploiting the structure of image data helps with efficient feature selection. Different regularization strategies and their importance in combating overfitting are also investigated. In addition, we analyse the effect of the quantity of training data on the accuracy of the predictions and explore the effect of data augmentation using synthesized data.
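
    One ingredient, gradient boosting of regression trees on shape residuals, can be sketched in a few lines. Generic feature vectors stand in for the paper's sparse pixel-intensity differences, and the cascade with shape-indexed feature re-extraction is omitted.

```python
# Each tree fits the residual between the current shape estimate and the
# ground truth; the ensemble is the sum of shrunken tree predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_shape_regressor(features, shapes_true, shape_init, n_trees=10, lr=0.1):
    """features: (N, D); shapes_true: (N, 2L) flattened landmark coordinates."""
    estimate = np.tile(shape_init, (len(features), 1))
    trees = []
    for _ in range(n_trees):
        residual = shapes_true - estimate          # what remains to be corrected
        tree = DecisionTreeRegressor(max_depth=4).fit(features, residual)
        estimate = estimate + lr * tree.predict(features)
        trees.append(tree)
    return trees

def predict_shape(trees, features, shape_init, lr=0.1):
    estimate = np.tile(shape_init, (len(features), 1))
    for tree in trees:
        estimate = estimate + lr * tree.predict(features)
    return estimate
```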

  • 14.
    Kazemi, Vahid
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Face Alignment with Part-Based Modeling (2011). In: BMVC 2011 - Proceedings of the British Machine Vision Conference 2011 / [ed] Hoey, Jesse and McKenna, Stephen and Trucco, Emanuele, UK: British Machine Vision Association, BMVA, 2011, pp. 27.1-27.10. Conference paper (Refereed)
    Abstract [en]

    We propose a new method for face alignment with part-based modeling. This method is competitive in terms of precision with existing methods such as Active Appearance Models, but is more robust and has a superior generalization ability due to its part-based nature. A variation of the Histogram of Oriented Gradients descriptor is used to model the appearance of each part and the shape information is represented with a set of landmark points around the major facial features. Multiple linear regression models are learnt to estimate the position of the landmarks from the appearance of each part. We verify our algorithm with a set of experiments on human faces and these show the competitive performance of our method compared to existing methods.
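
    A minimal sketch of the per-part pipeline, with scikit-image's standard HOG as a stand-in for the paper's HOG variant and one linear regressor per part:

```python
# Describe a facial part with a HOG descriptor, then regress landmark
# coordinates linearly from that appearance.
from skimage.feature import hog
from sklearn.linear_model import LinearRegression

def part_descriptor(patch):
    """patch: 2D grayscale array covering one facial part."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# X: (N, D) descriptors of a given part; Y: (N, 2K) landmark coordinates
# around that part. One such regressor is learned per part:
#   regressor = LinearRegression().fit(X, Y)
#   landmarks = regressor.predict(part_descriptor(new_patch)[None])
```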

  • 15.
    Kazemi, Vahid
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Using Richer Models for Articulated Pose Estimation of Footballers (2012). In: Proceedings British Machine Vision Conference 2012, 2012, pp. 6.1-6.10. Conference paper (Refereed)
    Abstract [en]

    We present a fully automatic procedure for reconstructing the pose of a person in 3D from images taken from multiple views. We demonstrate a novel approach for learning more complex models using SVM-Rank, to reorder a set of high scoring configurations. The new model in many cases can resolve the problem of double counting of limbs which happens often in pictorial structure based models. We address the problem of flipping ambiguity to find the correct correspondences of 2D predictions across all views. We obtain improvements for 2D prediction over the state-of-the-art methods on our dataset. We show that the results in many cases are good enough for a fully automatic 3D reconstruction with uncalibrated cameras.

  • 16.
    Kobetski, Miroslav
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Apprenticeship learning: Transfer of knowledge via dataset augmentation (2013). In: Image Analysis: 18th Scandinavian Conference, SCIA 2013, Espoo, Finland, June 17-20, 2013. Proceedings, Springer, 2013, pp. 432-443. Conference paper (Refereed)
    Abstract [en]

    In visual category recognition there is often a trade-off between fast and powerful classifiers. Complex models often have superior performance to simple ones but are computationally too expensive for many applications. At the same time the performance of simple classifiers is not necessarily limited only by their flexibility but also by the amount of labelled data available for training. We propose a semi-supervised wrapper algorithm named apprenticeship learning, which leverages the strength of slow but powerful classification methods to improve the performance of simpler methods. The powerful classifier parses a large pool of unlabelled data, labelling positive examples to extend the dataset of the simple classifier. We demonstrate apprenticeship learning and its effectiveness by performing experiments on the VOC2007 dataset - one experiment improving detection performance on VOC2007, and one domain adaptation experiment, where the VOC2007 classifier is adapted to a new dataset, collected using a GoPro camera.
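
    The wrapper itself is simple to sketch. Below, an RBF-kernel SVM stands in for the slow, powerful teacher and a linear SVM for the fast student; both classifier choices and the confidence threshold are illustrative assumptions.

```python
# Apprenticeship learning: the powerful classifier labels a pool of
# unlabelled data, and its confident positives augment the training set of
# the simple classifier.
import numpy as np
from sklearn.svm import SVC, LinearSVC

def apprenticeship_learning(X_train, y_train, X_unlabelled, confidence=1.0):
    teacher = SVC(kernel="rbf").fit(X_train, y_train)    # powerful but slow
    scores = teacher.decision_function(X_unlabelled)
    mined = X_unlabelled[scores > confidence]            # confident positives
    X_aug = np.vstack([X_train, mined])
    y_aug = np.hstack([y_train, np.ones(len(mined))])
    return LinearSVC().fit(X_aug, y_aug)                 # simple and fast
```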

  • 17.
    Kobetski, Miroslav
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Discriminative tree-based feature mapping (2013). In: BMVC 2013 - Electronic Proceedings of the British Machine Vision Conference 2013, British Machine Vision Association, BMVA, 2013. Conference paper (Refereed)
    Abstract [en]

    For object classification and detection, the algorithm pipeline often involves classifying feature vectors extracted from image patches. Existing features, such as HOG, fail to map the image patches into a space where a linear hyperplane is suitable for separating the classes, while many non-linear classification methods are too expensive for many tasks. We propose a sparse tree-based mapping method that learns a mapping of the feature vector to a space where a linear hyperplane can better separate negative and positive examples. The learned mapping function Φ(x) results in significant improvement for image patch classification with HOG and LBP-features over other feature mapping methods on the VOC2007 and INRIAPerson datasets.
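
    A minimal sketch of the tree-based mapping idea, with scikit-learn's RandomTreesEmbedding standing in for the paper's discriminatively learned mapping Φ(x):

```python
# Map each feature vector to a sparse binary code of tree-leaf memberships,
# then separate the classes with a linear hyperplane in the mapped space.
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

phi = RandomTreesEmbedding(n_estimators=50, max_depth=5)   # the mapping
model = make_pipeline(phi, LinearSVC())
# model.fit(X_train, y_train)    # X_train: e.g. HOG/LBP patch descriptors
# model.predict(X_test)
```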

  • 18.
    Kobetski, Miroslav
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Improved boosting performance by exclusion of ambiguous positive examples (2013). In: ICPRAM 2013 - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods, 2013, pp. 11-21. Conference paper (Refereed)
    Abstract [en]

    In visual object class recognition it is difficult to densely sample the set of positive examples. Therefore, frequently there will be areas of the feature space that are sparsely populated, in which uncommon examples are hard to disambiguate from surrounding negatives without overfitting. Boosting in particular struggles to learn optimal decision boundaries in the presence of such hard and ambiguous examples. We propose a two-pass dataset pruning method for identifying ambiguous examples and subjecting them to an exclusion function, in order to obtain more optimal decision boundaries for existing boosting algorithms. We also provide an experimental comparison of different boosting algorithms on the VOC2007 dataset, training them with and without our proposed extension. Using our exclusion extension improves the performance of all the tested boosting algorithms except TangentBoost, without adding any additional test-time cost. In our experiments LogitBoost performs best overall and is also significantly improved by our extension. Our results also suggest that outlier exclusion is complementary to positive jittering and hard negative mining.
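
    The two-pass scheme is easy to sketch: train once, treat the lowest-scoring positives as ambiguous, exclude them, and retrain. AdaBoost stands in for the boosting variants compared in the paper, and a score quantile replaces the paper's exclusion function.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_with_exclusion(X, y, quantile=0.05):
    first_pass = AdaBoostClassifier(n_estimators=100).fit(X, y)
    pos = (y == 1)
    scores = first_pass.decision_function(X)
    # Ambiguous positives: those the first-pass model scores lowest.
    cutoff = np.quantile(scores[pos], quantile)
    keep = ~pos | (scores >= cutoff)
    return AdaBoostClassifier(n_estimators=100).fit(X[keep], y[keep])
```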

  • 19.
    Kobetski, Miroslav
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Improved boosting performance by explicit handling of ambiguous positive examples (2015). In: Pattern Recognition: Applications and Methods, Springer Berlin/Heidelberg, 2015, pp. 17-37. Conference paper (Refereed)
    Abstract [en]

    Visual classes naturally have ambiguous examples, that are different depending on feature and classifier and are hard to disambiguate from surrounding negatives without overfitting. Boosting in particular tends to overfit to such hard and ambiguous examples, due to its flexibility and typically aggressive loss functions. We propose a two-pass learning method for identifying ambiguous examples and relearning, either subjecting them to an exclusion function or using them in a later stage of an inverted cascade. We provide an experimental comparison of different boosting algorithms on the VOC2007 dataset, training them with and without our proposed extension. Using our exclusion extension improves the performance of almost all of the tested boosting algorithms, without adding any additional test-time cost. Our proposed inverted cascade adds some test-time cost but gives additional improvements in performance. Our results also suggest that outlier exclusion is complementary to positive jittering and hard negative mining.

  • 20.
    Launila, Andreas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Contextual Features for Head Pose Estimation in Football Games (2010). In: 20th International Conference on Pattern Recognition, ICPR 2010, IEEE conference proceedings, 2010, pp. 340-343. Conference paper (Refereed)
    Abstract [en]

    We explore the benefits of using contextual features for head pose estimation in football games. Contextual features are derived from knowledge of the position of all players and combined with image based features derived from low-resolution footage. Using feature selection and combination techniques, we show that contextual features can aid head pose estimation in football games and potentially be an important complement to the image based features traditionally used.

  • 21.
    Loy, Gareth
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Eriksson, Martin
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Sullivan, Josephine
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Monocular 3D reconstruction of human motion in long action sequences (2004). In: COMPUTER VISION: ECCV 2004, PT 4, BERLIN: SPRINGER, 2004, Vol. 2034, pp. 442-455. Conference paper (Refereed)
    Abstract [en]

    A novel algorithm is presented for the 3D reconstruction of human action in long (> 30 second) monocular image sequences. A sequence is represented by a small set of automatically found representative keyframes. The skeletal joint positions are manually located in each keyframe and mapped to all other frames in the sequence. For each keyframe a 3D key pose is created, and interpolation between these 3D body poses, together with the incorporation of limb length and symmetry constraints, provides a smooth initial approximation of the 3D motion. This is then fitted to the image data to generate a realistic 3D reconstruction. The degree of manual input required is controlled by the diversity of the sequence's content. Sports footage is ideally suited to this approach as it frequently contains a limited number of repeated actions. Our method is demonstrated on a long (36 second) sequence of a woman playing tennis filmed with a non-stationary camera. This sequence required manual initialisation on < 1.5% of the frames, and demonstrates that the system can deal with very rapid motion, severe self-occlusions, motion blur and clutter occurring over several concurrent frames. The monocular 3D reconstruction is verified by synthesising a view from the perspective of a 'ground truth' reference camera, and the result is seen to provide a qualitatively accurate 3D reconstruction of the motion.

  • 22.
    Maboudi Afkham, Heydar
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Improving feature level likelihoods using cloud features (2012). In: ICPRAM - Proc. Int. Conf. Pattern Recogn. Appl. Methods, 2012, pp. 431-437. Conference paper (Refereed)
    Abstract [en]

    The performance of many computer vision methods depends on the quality of the local features extracted from the images. For most methods the local features are extracted independently of the task and they remain constant through the whole process. To make features more dynamic and give models a choice in the features they can use, this work introduces a set of intermediate features referred to as cloud features. These features take advantage of part-based models at the feature level by combining each extracted local feature with its nearby local features, creating a cloud of different representations for each local feature. These representations capture the local variations around the local feature. At classification time, the best possible representation is pulled out of the cloud and used in the calculations. This selection is done based on several latent variables encoded within the cloud features. The goal of this paper is to test how the cloud features can improve the feature level likelihoods. The focus of the experiments of this paper is on feature level inference and showing how replacing single features with equivalent cloud features improves the likelihoods obtained from them. The experiments of this paper are conducted on several classes of the MSRCv1 dataset.

  • 23.
    Nillius, Peter
    KTH, Skolan för teknikvetenskap (SCI), Fysik, Medicinsk avbildning.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Argyros, Antonis
    Shading models for illumination and reflectance invariant shape detectors (2008). In: 2008 IEEE Conference On Computer Vision And Pattern Recognition: Vols 1-12, 2008, pp. 3353-3360. Conference paper (Refereed)
    Abstract [en]

    Many objects have smooth surfaces of a fairly uniform color, thereby exhibiting shading patterns that reveal information about their shape, an important clue to the nature of the object. This paper explores extracting this information from images by creating shape detectors based on shading. Recent work has derived low-dimensional models of shading that can handle realistic unknown lighting conditions and surface reflectance properties. We extend this theory by also incorporating variations in the surface shape. In doing so it enables the creation of very general models for the 2D appearance of objects, not only coping with variations in illumination and BRDF but also with shape alterations such as small scale and pose changes. Using this framework we propose a scheme to build shading models that can be used for shape detection in a bottom-up fashion without any a priori knowledge about the scene. From the developed theory we construct detectors for two basic shape primitives, spheres and cylinders. Their performance is evaluated by extensive synthetic experiments as well as experiments on real images.

  • 24.
    Nillius, Peter
    Institute of Computer Science - FORTH.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Multi-Target Tracking -- Linking Identities using Bayesian Network Inference (2006). In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Los Alamitos, CA, USA: IEEE Computer Society, 2006, pp. 2187-2194. Conference paper (Refereed)
    Abstract [en]

    Multi-target tracking requires locating the targets and labeling their identities. The latter is a challenge when many targets, with indistinct appearances, frequently occlude one another, as in football and surveillance tracking. We present an approach to solving this labeling problem.

    When isolated, a target can be tracked and its identity maintained. However, if targets interact, this is not always the case. This paper assumes a track graph exists, denoting when targets are isolated and describing how they interact. Measures of similarity between isolated tracks are defined. The goal is to associate the identities of the isolated tracks, by exploiting the graph constraints and similarity measures.

    We formulate this as a Bayesian network inference problem, allowing us to use standard message propagation to find the most probable set of paths in an efficient way. The high complexity inevitable in large problems is gracefully reduced by removing dependency links between tracks. We apply the method to a 10 min sequence of an international football game and compare results to ground truth.

  • 25.
    Razavian, Ali Sharif
    KTH, Skolan för datavetenskap och kommunikation (CSC), Robotik, perception och lärande, RPL.
    Sullivan, Josephine
    KTH.
    Carlsson, Stefan
    KTH.
    Maki, Atsuto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Robotik, perception och lärande, RPL.
    Visual instance retrieval with deep convolutional networks (2016). In: ITE Transactions on Media Technology and Applications, ISSN 2186-7364, Vol. 4, no. 3, pp. 251-258. Article in journal (Refereed)
    Abstract [en]

    This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient pipeline exploiting multi-scale schemes to extract local features, in particular, by taking geometric invariance into explicit account, i.e. positions, scales and spatial consistency. In our experiments using five standard image retrieval datasets, we demonstrate that generic ConvNet image representations can outperform other state-of-the-art methods if they are extracted appropriately.
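
    A minimal sketch of such a retrieval pipeline, assuming some fixed ConvNet feature extractor is available (extract_convnet_features below is a placeholder) and using max-pooling across scales as one simple multi-scale scheme:

```python
# Extract descriptors at several scales, pool and L2-normalise, then rank
# database images by cosine similarity to the query.
import numpy as np
from skimage.transform import rescale

def multiscale_descriptor(image, extract_convnet_features,
                          scales=(1.0, 0.75, 0.5)):
    feats = [extract_convnet_features(rescale(image, s, channel_axis=-1))
             for s in scales]
    d = np.max(np.stack(feats), axis=0)     # max-pool across scales
    return d / np.linalg.norm(d)            # L2-normalise

def rank_database(query_desc, db_descs):
    sims = db_descs @ query_desc            # cosine similarity (unit vectors)
    return np.argsort(-sims)                # best matches first
```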

  • 26.
    Sharif Razavian, Ali
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Aghazadeh, Omid
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Estimating Attention in Exhibitions Using Wearable Cameras (2014). In: Pattern Recognition (ICPR), 2014 22nd International Conference on, Stockholm, Sweden: IEEE conference proceedings, 2014, pp. 2691-2696. Conference paper (Refereed)
    Abstract [en]

    This paper demonstrates a system for automatic detection of visual attention and identification of salient items at exhibitions (e.g. a museum or an auction). The method is offline and operates on video captured by a head-mounted camera. Towards the estimation of attention, we define the notions of "saliency" and "interestingness" for exhibition items. Our method is a combination of multiple state-of-the-art techniques from different vision tasks such as tracking, image matching and retrieval. Many experiments are conducted to evaluate multiple aspects of our method. The method has proven to be robust to image blur, occlusion, truncation, and dimness. The experiments show strong performance for the tasks of matching items, estimating focus frames and detecting salient and interesting items. This can be useful to commercial vendors and museum curators, helping them understand which items appeal more to visitors.

  • 27.
    Sharif Razavian, Ali
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Azizpour, Hossein
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Maki, Atsuto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Ek, Carl Henrik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Persistent Evidence of Local Image Properties in Generic ConvNets (2015). In: Image Analysis: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015. Proceedings / [ed] Paulsen, Rasmus R., Pedersen, Kim S., Springer Publishing Company, 2015, pp. 249-262. Conference paper (Refereed)
    Abstract [en]

    Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper addresses this, in particular exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for this finding in the context of four different tasks: 2d landmark detection, 2d object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework with ridge regression common across these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to a good accuracy, and should potentially be useful for improving the training of convolutional nets for classification purposes.

  • 28.
    Sharif Razavian, Ali
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Azizpour, Hossein
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    CNN features off-the-shelf: An Astounding Baseline for Recognition (2014). In: Proceedings of CVPR 2014, 2014. Conference paper (Refereed)
    Abstract [en]

    Recent results indicate that the generic descriptors extracted from convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13. We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine grained recognition, attribute detection and image retrieval applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the OverFeat network was trained to solve. Astonishingly, we report consistent superior results compared to the highly tuned state-of-the-art systems in all the visual classification tasks on various datasets. For instance retrieval it consistently outperforms low memory footprint methods except for the sculptures dataset. The results are achieved using a linear SVM classifier (or L2 distance in case of retrieval) applied to a feature representation of size 4096 extracted from a layer in the net. The representations are further modified using simple augmentation techniques, e.g. jittering. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.

  • 29.
    Sharif Razavian, Ali
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Maki, Atsuto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    A Baseline for Visual Instance Retrieval with Deep Convolutional Networks (2015). Conference paper (Refereed)
  • 30.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Blake, A
    Isard, M
    MacCormick, J
    Object Localisation by Bayesian Correlation (1999). Conference paper (Refereed)
    Abstract [en]

    Maximisation of cross-correlation is a commonly used principle for intensity-based object localization that gives a single estimate of location. However, to facilitate sequential inference (e.g. over time or scale) and to allow the representation of ambiguity, it is desirable to represent an entire probability distribution for object location. Although the cross-correlation itself (or some function of it) has sometimes been treated as a probability distribution, this is not generally justifiable.

    Bayesian correlation achieves a consistent probabilistic treatment by combining several developments. The first is the interpretation of correlation matching functions in probabilistic terms, as observation likelihoods. Second, probability distributions of filter-bank responses are learned from training examples. Inescapably, response-learning also demands statistical modelling of background intensities, and there are links here with image coding and Independent Component Analysis. Lastly, multi-scale processing is achieved, in a Bayesian context, by means of a new algorithm, "layered sampling", for which asymptotic properties are derived.
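
    The central move, keeping a normalised distribution over candidate locations instead of a single correlation maximum, can be sketched as follows; the Gaussian SSD likelihood and the grid search are illustrative simplifications of the learned filter-bank likelihoods in the paper.

```python
# Interpret a matching cost as an observation log-likelihood and normalise
# over a grid of candidate template locations.
import numpy as np

def location_posterior(image, template, stride=4, sigma=0.1):
    th, tw = template.shape
    H, W = image.shape
    positions, loglik = [], []
    for y in range(0, H - th, stride):
        for x in range(0, W - tw, stride):
            ssd = np.mean((image[y:y+th, x:x+tw] - template) ** 2)
            positions.append((y, x))
            loglik.append(-ssd / (2 * sigma ** 2))
    loglik = np.array(loglik)
    p = np.exp(loglik - loglik.max())       # stable exponentiation
    return positions, p / p.sum()           # proper distribution over locations
```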

  • 31.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Blake, A.
    Isard, M.
    MacCormik, J.
    Bayesian Correlation (2001). In: International Journal of Computer Vision, ISSN 0920-5691, E-ISSN 1573-1405, Vol. 44, no. 2, pp. 111-135. Article in journal (Refereed)
    Abstract [en]

    Maximisation of cross-correlation is a commonly used principle for intensity-based object localization that gives a single estimate of location. However, to facilitate sequential inference (e.g. over time or scale) and to allow the representation of ambiguity, it is desirable to represent an entire probability distribution for object location. Although the cross-correlation itself (or some function of it) has sometimes been treated as a probability distribution, this is not generally justifiable.

    Bayesian correlation achieves a consistent probabilistic treatment by combining several developments. The first is the interpretation of correlation matching functions in probabilistic terms, as observation likelihoods. Second, probability distributions of filter-bank responses are learned from training examples. Inescapably, response-learning also demands statistical modelling of background intensities, and there are links here with image coding and Independent Component Analysis. Lastly, multi-scale processing is achieved, in a Bayesian context, by means of a new algorithm, "layered sampling", for which asymptotic properties are derived.

  • 32.
    Sullivan, Josephine
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Carlsson, Stefan
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Recognizing and Tracking Human Action (2002). In: COMPUTER VISION - ECCV 2002, PT 1 / [ed] Anders Heyden, Gunnar Sparr, Mads Nielsen and Peter Johansen, Berlin: Springer, 2002, pp. 629-644. Conference paper (Refereed)
    Abstract [en]

    Human activity can be described as a sequence of 3D body postures. The traditional approach to recognition and 3D reconstruction of human activity has been to track motion in 3D, mainly using advanced geometric and dynamic models. In this paper we reverse this process. View based activity recognition serves as an input to a human body location tracker with the ultimate goal of 3D reanimation in mind. We demonstrate that specific human actions can be detected from single frame postures in a video sequence. By recognizing the image of a person’s posture as corresponding to a particular key frame from a set of stored key frames, it is possible to map body locations from the key frames to actual frames. This is achieved using a shape matching algorithm based on qualitative similarity that computes point to point correspondence between shapes, together with information about appearance. As the mapping is from fixed key frames, our tracking does not suffer from the problem of having to reinitialise when it gets lost. It is effectively a closed loop. We present experimental results both for recognition and tracking for a sequence of a tennis player.

  • 33.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Tracking and labelling of interacting multiple targets (2006). In: COMPUTER VISION - ECCV 2006, PT 3, PROCEEDINGS / [ed] Leonardis, A; Pinz, A, 2006, Vol. 3953, pp. 619-632. Conference paper (Refereed)
    Abstract [en]

    Successful multi-target tracking requires solving two problems - localize the targets and label their identity. An isolated target's identity can be unambiguously preserved from one frame to the next. However, for long sequences of many moving targets, like a football game, grouping scenarios will occur in which identity labellings cannot be maintained reliably by using continuity of motion or appearance. This paper describes how to match targets' identities despite these interactions. Trajectories of when a target is isolated are found. These trajectories end when targets interact and their labellings cannot be maintained. The interactions (merges and splits) of these trajectories form a graph structure. Appropriate feature vectors summarizing particular qualities of each trajectory are extracted. A clustering procedure based on these feature vectors allows the identities of temporally separated trajectories to be matched. Results are shown from a football match captured by a wide screen system giving a full stationary view of the pitch.

  • 34.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Danielsson, Oscar
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Carlsson, Stefan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Exploiting Part-Based Models and Edge Boundaries for Object Detection (2008). In: Digital Image Computing: Techniques and Applications, DICTA 2008, 2008, pp. 199-206. Conference paper (Refereed)
    Abstract [en]

    This paper explores how to exploit shape information to perform object class recognition. We use a sparse part-based model to describe object categories defined by shape. The sparseness allows the relative spatial relationship between parts to be described simply. It is possible, with this model, to highlight potential locations of the object and its parts in novel images. Subsequently these areas are examined by a more flexible shape model that measures if the image data provides evidence of the existence of boundary/connecting curves between connected hypothesized parts. From these measurements it is possible to construct a very simple cost function which indicates the presence or absence of the object class. The part-based model is designed to decouple variations due to affine warps and other forms of shape deformations. The latter are modeled probabilistically using conditional probability distributions which describe the linear dependencies between the location of a part and a subset of the other parts. These conditional distributions can then be exploited to search efficiently for instances of the part model in novel images. Results are reported on experiments performed on the ETHZ shape classes database that features heavily cluttered images and large variations in scale.

  • 35.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Rittscher, J
    Guiding Random Samples by Deterministic Search (2001). Conference paper (Refereed)
    Abstract [en]

    Among the algorithms developed towards the goal of robust and efficient tracking, two approaches which stand out due to their success are those based on particle filtering and variational approaches. The Bayesian approach led to the development of the particle filter, which performs a random search guided by a stochastic motion model. On the other hand, localising an object can be based on minimising a cost function. This minimum can be found using variational methods. The search paradigms differ in these two methods. One is stochastic and model-driven while the other is deterministic and data-driven. This paper presents a new algorithm to incorporate the strengths of both approaches into one consistent framework. To allow this fusion a smooth, wide likelihood function is constructed, based on a sum-of-squares distance measure, and an appropriate sampling scheme is introduced. Based on low-level information this scheme automatically mixes the two methods of search and adapts the computational demands of the algorithm to the difficulty of the problem at hand. The ability to effectively track complex motions without the need for finely tuned motion models is demonstrated.

  • 36.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Rittscher, J
    Statistical Foreground Modelling for Object Localisation (2000). Conference paper (Refereed)
    Abstract [en]

    A Bayesian approach to object localisation is feasible given suitable likelihood models for image observations. Such a likelihood involves statistical modelling - and learning - both of the object foreground and of the scene background. Statistical background models are already quite well understood. Here we propose a “conditioned likelihood” model for the foreground, conditioned on variations both in object appearance and illumination. Its effectiveness in localising a variety of objects is demonstrated.

  • 37.
    Zhong, Yang
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Li, Haibo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID.
    Face Attribute Prediction Using Off-The-Shelf CNN Features (2016). In: 2016 International Conference on Biometrics, ICB 2016, Institute of Electrical and Electronics Engineers (IEEE), 2016, article id 7550092. Conference paper (Refereed)
    Abstract [en]

    Predicting attributes from face images in the wild is a challenging computer vision problem. To automatically describe face attributes from face-containing images, traditionally one needs to cascade three technical blocks — face localization, facial descriptor construction, and attribute classification — in a pipeline. As a typical classification problem, face attribute prediction has been addressed using deep learning. Current state-of-the-art performance was achieved by using two cascaded Convolutional Neural Networks (CNNs), which were specifically trained to learn face localization and attribute description. In this paper, we experiment with an alternative way of employing the power of deep representations from CNNs. Combining with conventional face localization techniques, we use off-the-shelf architectures trained for face recognition to build facial descriptors. Recognizing that the describable face attributes are diverse, our face descriptors are constructed from different levels of the CNNs for different attributes to best facilitate face attribute prediction. Experiments on two large datasets, LFWA and CelebA, show that our approach is entirely comparable to the state-of-the-art. Our findings not only demonstrate an efficient face attribute prediction approach, but also raise an important question: how to leverage the power of off-the-shelf CNN representations for novel tasks.

  • 38.
    Zhong, Yang
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Robotik, perception och lärande, RPL.
    Li, Haibo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID.
    Leveraging Mid-level Deep Representations for Predicting Face Attributes in the Wild (2016). In: 2016 IEEE International Conference on Image Processing (ICIP), Institute of Electrical and Electronics Engineers (IEEE), 2016. Conference paper (Refereed)
  • 39.
    Zhong, Yang
    KTH, Skolan för datavetenskap och kommunikation (CSC).
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Li, Haibo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID.
    Transferring from Face Recognition to Face Attribute Prediction through Adaptive Selection of Off-the-shelf CNN Representations. Manuscript (preprint) (Other academic)
  • 40.
    Zhong, Yang
    KTH, Skolan för datavetenskap och kommunikation (CSC), Robotik, perception och lärande, RPL.
    Sullivan, Josephine
    KTH, Skolan för datavetenskap och kommunikation (CSC), Robotik, perception och lärande, RPL.
    Li, Haibo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID.
    Transferring from face recognition to face attribute prediction through adaptive selection of off-the-shelf CNN representations (2016). In: 2016 23rd International Conference on Pattern Recognition, ICPR 2016, Institute of Electrical and Electronics Engineers (IEEE), 2016, pp. 2264-2269, article id 7899973. Conference paper (Refereed)
    Abstract [en]

    This paper addresses the problem of transferring CNNs pre-trained for face recognition to a face attribute prediction task. To transfer an off-the-shelf CNN to a novel task, a typical solution is to fine-tune the network towards the novel task. As demonstrated in the state-of-the-art face attribute prediction approach, fine-tuning the high-level CNN hidden layer by using labeled attribute data leads to significant performance improvements. In this paper, however, we tackle the same problem but through a different approach. Rather than using an end-to-end network, we select face descriptors from off-the-shelf hierarchical CNN representations for recognizing different attributes. Through such an adaptive representation selection, even without any fine-tuning, our results still outperform the state-of-the-art face attribute prediction approach on the latest large-scale dataset, with an error rate reduction of more than 20%. Moreover, by using intensive empirical probes, we have identified several key factors that are significant for achieving promising face attribute prediction performance. These results attempt to gain and update our understanding of the nature of CNN features and how they can be better applied to transferred novel tasks.
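
    A minimal sketch of the adaptive representation selection described above, assuming descriptors from each candidate CNN layer have been precomputed; logistic regression is an illustrative choice of per-attribute classifier.

```python
# For each attribute, keep the layer whose frozen features classify the
# attribute best on validation data -- no fine-tuning involved.
from sklearn.linear_model import LogisticRegression

def select_layer_for_attribute(layer_feats_train, y_train,
                               layer_feats_val, y_val):
    """layer_feats_*: dict layer_name -> (N, D) descriptor arrays."""
    best = (None, -1.0, None)               # (layer, accuracy, classifier)
    for name, X in layer_feats_train.items():
        clf = LogisticRegression(max_iter=1000).fit(X, y_train)
        acc = clf.score(layer_feats_val[name], y_val)
        if acc > best[1]:
            best = (name, acc, clf)
    return best
```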
