Correspondence Estimation in Human Face and Posture Images
KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. (Computer Vision) ORCID iD: 0000-0003-4181-2753
2014 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Many computer vision tasks such as object detection, pose estimation, and alignment are directly related to the estimation of correspondences over instances of an object class. Other tasks such as image classification and verification, if not completely solved, can largely benefit from correspondence estimation. This thesis presents practical approaches for tackling the correspondence estimation problem with an emphasis on deformable objects.

The methods presented in this thesis vary greatly in their details, but they all use a combination of generative and discriminative modeling to estimate the correspondences from input images in an efficient manner. While the methods described in this work are generic and can be applied to any object, two classes of objects of high importance, namely the human body and faces, are the subjects of our experiments.

When dealing with the human body, we are mostly interested in estimating a sparse set of landmarks; specifically, we are interested in locating the body joints. We use pictorial structures to model the articulation of the body parts generatively and learn efficient discriminative models to localize the parts in the image. This is a common approach explored by many previous works. We further extend this hybrid approach by introducing higher order terms to deal with the double-counting problem and provide an algorithm for solving the resulting non-convex problem efficiently. In another work we explore the area of multi-view pose estimation, where we have multiple calibrated cameras and are interested in determining the pose of a person in 3D by aggregating 2D information. This is done efficiently by discretizing the 3D search space and using the 3D pictorial structures model to perform the inference.

In contrast to the human body, faces have a much more rigid structure, and it is relatively easy to detect the major parts of the face such as eyes, nose and mouth, but performing dense correspondence estimation on faces under various poses and lighting conditions is still challenging. In a first work we deal with this variation by partitioning the face into multiple parts and learning separate regressors for each part. In another work we take a fully discriminative approach and learn a global regressor from image to landmarks, but to deal with the insufficiency of training data we augment it with a large number of synthetic images. While we have shown great performance on the standard face datasets for performing correspondence estimation, in many scenarios the RGB signal gets distorted as a result of poor lighting conditions and becomes almost unusable. This problem is addressed in another work where we explore the use of the depth signal for dense correspondence estimation. Here again a hybrid generative/discriminative approach is used to perform accurate correspondence estimation in real-time.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2014, p. vii, 32
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2014:14
National subject category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-150115 ISBN: 978-91-7595-261-1 (print) OAI: oai:DiVA.org:kth-150115 DiVA, id: diva2:745607
Public defence
2014-10-10, Kollegiesalen, Brinellvägen 8, KTH, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

QC 20140919

Available from: 2014-09-19 Created: 2014-08-29 Last updated: 2022-06-23 Bibliographically approved
List of papers
1. Face Alignment with Part-Based Modeling
2011 (English) In: BMVC 2011 - Proceedings of the British Machine Vision Conference 2011 / [ed] Hoey, Jesse and McKenna, Stephen and Trucco, Emanuele, UK: British Machine Vision Association, BMVA, 2011, p. 27.1-27.10 Conference paper, Published paper (Refereed)
Abstract [en]

We propose a new method for face alignment with part-based modeling. This method is competitive in terms of precision with existing methods such as Active Appearance Models, but is more robust and has a superior generalization ability due to its part-based nature. A variation of the Histogram of Oriented Gradients descriptor is used to model the appearance of each part and the shape information is represented with a set of landmark points around the major facial features. Multiple linear regression models are learnt to estimate the position of the landmarks from the appearance of each part. We verify our algorithm with a set of experiments on human faces and these show the competitive performance of our method compared to existing methods.
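As an illustration of the kind of appearance descriptor this abstract mentions, the following is a minimal sketch of a plain gradient-orientation histogram for a single image patch. It is not the specific HOG variation used in the paper, which the abstract does not detail; the function name `orientation_histogram`, the 9-bin default and the L2 normalisation are assumptions made for this example.

```python
import math
from typing import List


def orientation_histogram(patch: List[List[float]], bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Hypothetical helper for illustration only: one cell, no block
    normalisation, no interpolation between neighbouring bins.
    """
    hist = [0.0] * bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Fold the direction into [0, pi) so opposite gradients share a bin.
            angle = math.atan2(gy, gx) % math.pi
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0.0 else hist
```

In a part-based setting, one such descriptor per facial part could serve as the input to that part's regression model; how the paper combines them is not specified here.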

Place, publisher, year, edition, pages
UK: British Machine Vision Association, BMVA, 2011
Keywords
Linear regression, Active appearance models, Competitive performance, Facial feature, Generalization ability, Histogram of oriented gradients, Multiple linear regression models, Part-based models, Shape information
National subject category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-45452 (URN) 10.5244/C.25.27 (DOI) 000346360200030 (ISI) 2-s2.0-84898413537 (Scopus ID) 1-901725-43-X (ISBN)
Conference
2011 22nd British Machine Vision Conference, BMVC 2011, Dundee, United Kingdom, 29 August 2011 through 2 September 2011
Note

QC 20111116

Available from: 2011-10-28 Created: 2011-10-28 Last updated: 2024-03-15 Bibliographically approved
2. Using Richer Models for Articulated Pose Estimation of Footballers
2012 (English) In: Proceedings British Machine Vision Conference 2012, 2012, p. 6.1-6.10 Conference paper, Published paper (Refereed)
Abstract [en]

We present a fully automatic procedure for reconstructing the pose of a person in 3D from images taken from multiple views. We demonstrate a novel approach for learning more complex models using SVM-Rank to reorder a set of high scoring configurations. The new model can in many cases resolve the problem of double counting of limbs, which happens often in pictorial structure based models. We address the problem of flipping ambiguity to find the correct correspondences of 2D predictions across all views. We obtain improvements for 2D prediction over state-of-the-art methods on our dataset. We show that the results in many cases are good enough for a fully automatic 3D reconstruction with uncalibrated cameras.
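The idea of reordering high scoring configurations with a learned ranking function can be sketched as follows. This is a simplified pairwise hinge-loss learner without the regularisation term of a ranking SVM, not the SVM-Rank solver used in the paper; the feature vectors, the function names `train_pairwise_ranker` and `rerank`, and the learning rate are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_pairwise_ranker(
    pairs: List[Tuple[Vector, Vector]], dim: int, epochs: int = 100, lr: float = 0.1
) -> List[float]:
    """Learn weights w so that w.better exceeds w.worse by a margin of 1.

    Each pair is (features of the better configuration, features of the
    worse one). Updates are made only for pairs that violate the margin.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rerank(candidates: List[Vector], w: Vector) -> List[Vector]:
    """Sort candidate configurations from highest to lowest learned score."""
    return sorted(candidates, key=lambda c: dot(w, c), reverse=True)
```

In this sketch each candidate would be described by features of a full pose configuration (for example, a hypothetical limb-overlap measure), so that configurations suffering from double counting can be pushed down the list.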

National subject category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-104558 (URN) 10.5244/C.26.6 (DOI) 000346356200003 (ISI) 2-s2.0-84898491407 (Scopus ID) 1-901725-46-4 (ISBN)
Conference
British Machine Vision Conference, Surrey, 3-7 September 2012
Note

QC 20121114

Available from: 2012-11-14 Created: 2012-11-05 Last updated: 2024-03-15 Bibliographically approved
3. Multi-view body part recognition with random forests
2013 (English) In: BMVC 2013 - Electronic Proceedings of the British Machine Vision Conference 2013, Bristol, England: British Machine Vision Association, 2013 Conference paper, Published paper (Refereed)
Abstract [en]

This paper addresses the problem of human pose estimation, given images taken from multiple dynamic but calibrated cameras. We consider solving this task using a part-based model and focus on the part appearance component of such a model. We use a random forest classifier to capture the variation in appearance of body parts in 2D images. The results of these 2D part detectors are then aggregated across views to produce consistent 3D hypotheses for parts. We solve correspondences across views for mirror symmetric parts by introducing a latent variable. We evaluate our part detectors qualitatively and quantitatively on a dataset gathered from a professional football game.
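Aggregating 2D part scores across calibrated views can be sketched by projecting each candidate 3D location into every camera and summing the per-view scores. This is only a minimal illustration under stated assumptions: cameras are given as 3x4 pinhole projection matrices, each view's detector is a plain Python callable returning a score at an image position, and the names `project` and `best_3d_hypothesis` are invented for this example rather than taken from the paper.

```python
from typing import Callable, Iterable, Sequence, Tuple

Matrix = Sequence[Sequence[float]]         # 3x4 camera projection matrix
Point3 = Tuple[float, float, float]
ScoreFn = Callable[[float, float], float]  # 2D part score at image position (u, v)


def project(P: Matrix, X: Point3) -> Tuple[float, float]:
    """Project a 3D point into the image with a 3x4 projection matrix."""
    xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * xh[c] for c in range(4)) for r in range(3))
    # Assumes the point is in front of the camera, so w is non-zero.
    return u / w, v / w


def best_3d_hypothesis(
    grid: Iterable[Point3], cameras: Sequence[Matrix], scores: Sequence[ScoreFn]
) -> Point3:
    """Return the grid point whose summed score over all views is highest."""

    def total(X: Point3) -> float:
        return sum(score(*project(P, X)) for P, score in zip(cameras, scores))

    return max(grid, key=total)
```

Searching over a discretised 3D grid keeps the cost linear in the number of grid points and views; handling of mirror symmetric parts, which the paper addresses with a latent variable, is left out of this sketch.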

Place, publisher, year, edition, pages
Bristol, England: British Machine Vision Association, 2013
Keywords
Data processing, Decision trees, Motion estimation, Body part recognition, Calibrated cameras, Football game, Human pose estimations, Latent variable, Part-based models, Random forest classifier, Random forests
National subject category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-134190 (URN) 10.5244/C.27.48 (DOI) 000346352700045 (ISI) 2-s2.0-84898413079 (Scopus ID)
Conference
2013 24th British Machine Vision Conference, BMVC 2013; Bristol; United Kingdom; 9 September 2013 through 13 September 2013
Funder
EU, FP7, Seventh Framework Programme
Note

QC 20131217

Available from: 2013-11-19 Created: 2013-11-19 Last updated: 2024-03-15 Bibliographically approved
4. One Millisecond Face Alignment with an Ensemble of Regression Trees
2014 (English) In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2014, p. 1867-1874 Conference paper, Published paper (Refereed)
Abstract [en]

This paper addresses the problem of Face Alignment for a single image. We show how an ensemble of regression trees can be used to estimate the face's landmark positions directly from a sparse subset of pixel intensities, achieving super-realtime performance with high quality predictions. We present a general framework based on gradient boosting for learning an ensemble of regression trees that optimizes the sum of square error loss and naturally handles missing or partially labelled data. We show how using appropriate priors exploiting the structure of image data helps with efficient feature selection. Different regularization strategies and their importance in combating overfitting are also investigated. In addition, we analyse the effect of the quantity of training data on the accuracy of the predictions and explore the effect of data augmentation using synthesized data.
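Gradient boosting of regression trees under a squared error loss can be sketched generically as below. This is not the paper's landmark regression cascade: it fits a scalar target with depth-1 trees (stumps) on plain feature vectors, and the names `fit_stump`, `boost` and `predict`, the number of rounds and the shrinkage value are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# (feature index, threshold, value if feature <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]
Model = Tuple[float, float, List[Stump]]  # (initial value, shrinkage, stumps)


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Fit a depth-1 regression tree to residuals r by minimising squared error."""
    n = len(X)
    mean_r = sum(r) / n
    best: Stump = (0, float("inf"), mean_r, mean_r)  # fallback: no split
    best_err = sum((v - mean_r) ** 2 for v in r)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if err < best_err:
                best_err, best = err, (j, t, ml, mr)
    return best


def stump_value(s: Stump, x: Sequence[float]) -> float:
    j, t, left_value, right_value = s
    return left_value if x[j] <= t else right_value


def boost(X, y, rounds: int = 50, shrinkage: float = 0.1) -> Model:
    """Each round fits a stump to the current residuals, which are the
    negative gradient of the squared error loss."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, residuals)
        stumps.append(s)
        pred = [p + shrinkage * stump_value(s, x) for p, x in zip(pred, X)]
    return f0, shrinkage, stumps


def predict(model: Model, x: Sequence[float]) -> float:
    f0, shrinkage, stumps = model
    return f0 + shrinkage * sum(stump_value(s, x) for s in stumps)
```

The shrinkage factor plays the role of one simple regularization strategy: smaller values need more rounds but make each tree's contribution more conservative.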

Place, publisher, year, edition, pages
IEEE Computer Society, 2014
Series
IEEE Conference on Computer Vision and Pattern Recognition. Proceedings, ISSN 1063-6919
Keywords
Decision Trees, Face Alignment, Gradient Boosting, Real-Time
National subject category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-144334 (URN) 10.1109/CVPR.2014.241 (DOI) 000361555601116 (ISI) 2-s2.0-84911391543 (Scopus ID) 978-147995117-8 (ISBN)
Conference
27th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, United States, 23 June 2014 through 28 June 2014
Funder
Swedish Foundation for Strategic Research (SSF), 6246
Note

QC 20140423

Available from: 2014-04-20 Created: 2014-04-20 Last updated: 2024-03-15 Bibliographically approved
5. Real-time Face Reconstruction from a Single Depth Image
2014 (English) In: Proceedings - 2014 International Conference on 3D Vision, 3DV 2014, IEEE conference proceedings, 2014, p. 369-376 Conference paper, Published paper (Refereed)
Abstract [en]

This paper contributes a real-time method for recovering facial shape and expression from a single depth image. The method also estimates an accurate and dense correspondence field between the input depth image and a generic face model. Both outputs are a result of minimizing the error in reconstructing the depth image, achieved by applying a set of identity and expression blend shapes to the model. Traditionally, such a generative approach has been shown to be computationally expensive and non-robust because of the non-linear nature of the reconstruction error. To overcome this problem, we use a discriminatively trained prediction pipeline that employs random forests to generate an initial dense but noisy correspondence field. Our method then exploits a fast ICP-like approximation to update these correspondences, allowing us to quickly obtain a robust initial fit of our model. The model parameters are then fine-tuned to minimize the true reconstruction error using a stochastic optimization technique. The correspondence field resulting from our hybrid generative-discriminative pipeline is accurate and useful for a variety of applications such as mesh deformation and retexturing. Our method works in real-time on a single depth image, i.e. without temporal tracking, is free from per-user calibration, and works in low-light conditions.
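The alternation between updating correspondences and refitting model parameters that characterises ICP-style methods can be sketched in a heavily reduced form. The example below estimates only a 3D translation between two small point sets, whereas the paper fits identity and expression blend shapes; the function names `nearest` and `icp_translation` and the fixed iteration count are assumptions for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def nearest(p: Point, cloud: List[Point]) -> Point:
    """Closest point in cloud to p by squared Euclidean distance (brute force)."""
    return min(cloud, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))


def icp_translation(source: List[Point], target: List[Point], iterations: int = 10) -> Point:
    """Estimate a translation aligning source to target.

    Each iteration (1) re-estimates correspondences by nearest neighbour and
    (2) updates the translation by the mean offset of the matched pairs,
    which is the least-squares optimum for a pure translation.
    """
    t: Point = (0.0, 0.0, 0.0)
    for _ in range(iterations):
        moved = [(p[0] + t[0], p[1] + t[1], p[2] + t[2]) for p in source]
        matches = [nearest(p, target) for p in moved]
        n = len(source)
        dx = sum(m[0] - p[0] for m, p in zip(matches, moved)) / n
        dy = sum(m[1] - p[1] for m, p in zip(matches, moved)) / n
        dz = sum(m[2] - p[2] for m, p in zip(matches, moved)) / n
        t = (t[0] + dx, t[1] + dy, t[2] + dz)
    return t
```

Like any nearest-neighbour scheme, this sketch depends on a reasonable starting point, which is the role the abstract assigns to the discriminatively predicted initial correspondence field.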

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014
National subject category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-150827 (URN) 10.1109/3DV.2014.93 (DOI) 2-s2.0-84925299755 (Scopus ID) 9781479970018 (ISBN)
Conference
2014 2nd International Conference on 3D Vision, 3DV 2014; The University of Tokyo, Tokyo; Japan; 8 December 2014 through 11 December 2014
Note

QC 20140911

Available from: 2014-09-10 Created: 2014-09-10 Last updated: 2022-06-23 Bibliographically approved

Open Access in DiVA

Thesis (27867 kB), 1240 downloads
File information
File name: FULLTEXT02.pdf; File size: 27867 kB; Checksum: SHA-512
a1c2ba9c2d00f07c66e9911508bb86a63b373dcf2e6d392691d10d228c8067e836a667bf035fa189cb2392e52732537c96e94daee66ffa8efbf26b331e26a1a7
Type: fulltext; Mimetype: application/pdf
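A downloaded copy of the full text can be checked against the SHA-512 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module; the function name `sha512_of_file` and the assumption that the file was saved locally as FULLTEXT02.pdf are illustrative only.

```python
import hashlib


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `sha512_of_file("FULLTEXT02.pdf")` with the checksum string in this record would indicate whether the local copy matches the archived file.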

Person

Kazemi, Vahid
