3D Gesture Recognition and Tracking for Next Generation of Smart Devices: Theories, Concepts, and Implementations
KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
2014 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The rapid development of mobile devices during the recent decade has been greatly driven by interaction and visualization technologies. Although touchscreens have significantly enhanced interaction technology, it is predictable that future mobile devices, e.g., augmented-reality glasses and smart watches, will demand more intuitive inputs such as free-hand interaction in 3D space. Specifically, for manipulation of digital content in augmented environments, 3D hand/body gestures will be essential. Therefore, 3D gesture recognition and tracking are highly desired features for interaction design in future smart environments. Due to the complexity of hand/body motions and the limited computational resources of mobile devices, 3D gesture analysis remains an extremely difficult problem to solve.

This thesis aims to introduce new concepts, theories and technologies for natural and intuitive interaction in future augmented environments. The contributions of this thesis support the concept of bare-hand 3D gestural interaction and interactive visualization on future smart devices. The introduced technical solutions enable effective interaction in the 3D space around the smart device. Highly accurate and robust 3D motion analysis of hand/body gestures is performed to facilitate 3D interaction in various application scenarios. The proposed technologies enable users to control, manipulate, and organize digital content in 3D space.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2014. xii, 101 p.
Series
TRITA-CSC-A, ISSN 1653-5723 ; 14:02
Keyword [en]
3D gestural interaction, gesture recognition, gesture tracking, 3D visualization, 3D motion analysis, augmented environments
National Category
Engineering and Technology
Research subject
Media Technology
Identifiers
URN: urn:nbn:se:kth:diva-141938
ISBN: 978-91-7595-031-0 (print)
OAI: oai:DiVA.org:kth-141938
DiVA: diva2:699011
Public defence
2014-03-17, F3, Lindstedtsvägen 26, KTH, Stockholm, 13:15 (English)
Opponent
Supervisors
Note

QC 20140226

Available from: 2014-02-26 Created: 2014-02-26 Last updated: 2014-02-26 Bibliographically approved
List of papers
1. Experiencing real 3D gestural interaction with mobile devices
2013 (English) In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 34, no. 8, 912-921 p. Article in journal (Refereed) Published
Abstract [en]

The number of mobile devices such as smartphones and tablet PCs has increased dramatically over recent years. New mobile devices are equipped with integrated cameras and large displays, which make interaction with the device more efficient. Although most previous work on interaction between humans and mobile devices is based on 2D touchscreen displays, camera-based interaction opens a new way to manipulate in the 3D space behind the device, in the camera's field of view. In this paper, our gestural interaction relies heavily on particular patterns of local image orientation called rotational symmetries. The approach is based on finding, from a large set of rotational symmetries of different orders, the pattern that yields the most reliable hand-gesture detector. Consequently, gesture detection and tracking can serve as an efficient tool for 3D manipulation in various computer vision and augmented reality applications. The final output is rendered into color anaglyphs for 3D visualization. Depending on the coding technology, different low-cost 3D glasses can be used by viewers.
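The core idea of rotational-symmetry detection from local orientation can be sketched in a few lines of NumPy: the gradient field is converted into a double-angle orientation image and correlated with an n-th order symmetry basis. This is a simplified, global version for illustration only; the function name, normalization, and the lack of local windowing are assumptions, not the paper's implementation.

```python
import numpy as np

def symmetry_score(gray, order):
    """Response of an order-n rotational-symmetry detector, evaluated
    globally around the image centre (illustrative sketch)."""
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)
    z = (gx + 1j * gy) ** 2                    # double-angle orientation image
    h, w = gray.shape
    y, x = np.mgrid[0:h, 0:w]
    ang = np.arctan2(y - h / 2, x - w / 2)     # polar angle around the centre
    basis = np.exp(1j * order * ang)           # order-n symmetry pattern
    return np.abs(np.vdot(basis, z)) / z.size  # correlation magnitude

# A circularly symmetric pattern (concentric rings) responds strongly to
# the order-2 basis; a featureless image gives zero response.
yy, xx = np.mgrid[0:64, 0:64]
rings = np.cos(0.5 * np.hypot(xx - 32, yy - 32))
flat = np.zeros((64, 64))
```

In the full detector the correlation would be computed in a sliding local window, and the symmetry order with the strongest, most repeatable response would be chosen as the gesture pattern.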

Keyword
3D mobile interaction, Rotational symmetries, Gesture detection, SIFT, Gesture tracking, stereoscopic visualization
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-141805 (URN)
10.1016/j.patrec.2013.02.004 (DOI)
000318129200010 ()
2-s2.0-84893684743 (Scopus ID)
Note

QC 20150624

Available from: 2014-02-25 Created: 2014-02-25 Last updated: 2017-12-05 Bibliographically approved
2. 3D Photo Browsing for Future Mobile Devices
2012 (English) In: Proceedings of the 20th ACM International Conference on Multimedia, 2012, 1401-1404 p. Conference paper, Published paper (Refereed)
Abstract [en]

By introducing an interactive 3D photo/video browsing and exploration system, we propose novel approaches for handling the limitations of current 2D mobile technology from two aspects: interaction design and visualization. Our contributions feature an effective interaction that happens in the 3D space behind the mobile device's camera. 3D motion analysis of the user's gestures captured by the device's camera is performed to facilitate the interaction between users and multimedia collections in various applications. This approach addresses a wide range of problems with current input facilities such as miniature keyboards, tiny joysticks and 2D touchscreens. The suggested interactive technology enables users to control, manipulate, organize, and re-arrange their photo/video collections in 3D space using bare-hand, marker-less gestures. Moreover, with the proposed techniques we aim to visualize 2D photo collections, in 3D, on normal 2D displays. This is done automatically by retrieving the 3D structure from single images, finding stereo/multiple views of a scene, or using geo-tagged metadata from huge photo collections. Through the design and implementation of the contributions of this work, we aim to achieve the following goals: solving the limitations of current 2D interaction facilities through 3D gestural interaction; increasing the usability of multimedia applications on mobile devices; and enhancing the quality of user experience with digital collections.

Series
MM ’12
Keyword
3D gestural interaction, 3D visualization, motion analysis, photo browsing, quality of experience
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-141806 (URN)
10.1145/2393347.2396503 (DOI)
2-s2.0-84871368430 (Scopus ID)
Conference
ACM International Conference on Multimedia
Note

QC 20140226

Available from: 2014-02-25 Created: 2014-02-25 Last updated: 2014-02-26 Bibliographically approved
3. Bare-hand Gesture Recognition and Tracking through the Large-scale Image Retrieval
2014 (English) In: VISAPP 2014 - Proceedings of the 9th International Conference on Computer Vision Theory and Applications, 2014. Conference paper, Published paper (Refereed)
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-141832 (URN)
Conference
9th International Conference on Computer Vision Theory and Applications - VISAPP 2014 on Jan 5, 2014 in Lisbon, Portugal
Note

QC 20160610

Available from: 2014-02-25 Created: 2014-02-25 Last updated: 2016-06-10 Bibliographically approved
4. Interactive 3D Visualization on a 4K Wall-Sized Display
2014 (English) Conference paper, Published paper (Refereed)
Abstract [en]

This paper introduces a novel vision-based approach for realistic interaction between the user and the display's content. A highly accurate motion capture system is proposed to measure and track the user's head motion in 3D space. Video frames captured by a low-cost head-mounted camera are processed to retrieve the 3D motion parameters. The retrieved information facilitates real-time 3D interaction. This technology turns any 2D screen into an interactive 3D display, enabling users to control and manipulate the content as through a digital window. The proposed system is tested and verified on a large wall-sized 4K screen.
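The "digital window" effect described above is commonly realized by feeding the tracked head position into an off-axis (asymmetric) perspective frustum, so the rendered view shifts as the viewer moves. The sketch below shows that standard construction; the function and parameter names are illustrative and not taken from the paper.

```python
def off_axis_frustum(head, half_w, half_h, near=0.1):
    """Frustum bounds at the near plane for a head at (ex, ey, ez)
    relative to the screen centre; half_w/half_h are the screen
    half-extents in the same units (e.g. metres)."""
    ex, ey, ez = head
    scale = near / ez                   # project screen edges onto the near plane
    left   = (-half_w - ex) * scale
    right  = ( half_w - ex) * scale
    bottom = (-half_h - ey) * scale
    top    = ( half_h - ey) * scale
    return left, right, bottom, top

# Head centred in front of the screen -> symmetric frustum;
# head moved right -> frustum shifts left, as if peering through a window.
centered = off_axis_frustum((0.0, 0.0, 0.6), 0.5, 0.3)
shifted  = off_axis_frustum((0.2, 0.0, 0.6), 0.5, 0.3)
```

The returned bounds can be passed directly to a projection such as OpenGL's `glFrustum`, which is why the effect works on any ordinary 2D screen once head motion is tracked.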

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014
Keyword
3-D displays, 3D interactions, 3D-motion parameters, Accurate motion, Head mounted Camera, Interactive 3d visualizations, Vision-based approaches, Wall-sized displays
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-141838 (URN)
10.1109/APSIPA.2014.7041653 (DOI)
2-s2.0-84949926246 (Scopus ID)
978-616361823-8 (ISBN)
Conference
International Conference on Image Processing (ICIP 2014)
Note

QC 20151208. QC 20160212

Available from: 2014-02-25 Created: 2014-02-25 Last updated: 2016-02-12 Bibliographically approved
5. 3D Visualization of Single Images using Patch Level Depth
2011 (English) In: SIGMAP 2011, 2011, 61-66 p. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we consider the task of 3D photo visualization using a single monocular image. The main idea is to take single photos captured by devices such as ordinary cameras, mobile phones, and tablet PCs, and visualize them in 3D on normal displays. A supervised learning approach is used to retrieve depth information from single images. The algorithm is based on a hierarchical multi-scale Markov Random Field (MRF), which models depth from multi-scale global and local features and the relations between them in a monocular image. The estimated depth image is then used to assign depth parameters to each pixel in the 3D map. Accordingly, multi-level depth adjustment and coding into color anaglyphs are performed. Our system receives a single 2D image as input and provides an anaglyph-coded 3D image as output. Depending on the coding technology, suitable low-cost anaglyph glasses can be used by viewers.
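The final steps, going from a per-pixel depth estimate to a red-cyan anaglyph, can be sketched as a depth-proportional horizontal pixel shift (to fake a second eye view) followed by channel mixing. This is a simplified stand-in for the paper's multi-level depth adjustment; the disparity model and function names are illustrative assumptions.

```python
import numpy as np

def synthesize_right_view(img, depth, max_disp=8):
    """Shift pixels horizontally in proportion to estimated depth
    (nearer pixels shift more) to approximate a right-eye view.
    Illustrative model, not the paper's method."""
    h, w = depth.shape
    right = np.zeros_like(img)
    d = depth.astype(float)
    disp = (max_disp * (1.0 - d / d.max())).astype(int)
    for y in range(h):
        for x in range(w):
            xr = x - disp[y, x]
            if 0 <= xr < w:
                right[y, xr] = img[y, x]
    return right

def red_cyan_anaglyph(left, right):
    """Red channel from the left view, green/blue from the right."""
    out = right.copy()
    out[..., 0] = left[..., 0]
    return out

rng = np.random.default_rng(0)
left = rng.integers(0, 255, size=(32, 32, 3), dtype=np.uint8)
depth = np.tile(np.linspace(1.0, 10.0, 32), (32, 1))   # depth grows left to right
right = synthesize_right_view(left, depth)
anaglyph = red_cyan_anaglyph(left, right)
```

Viewed through red-cyan glasses, each eye then receives a slightly different horizontal perspective, which the visual system fuses into depth.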

National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-141824 (URN)
Conference
SIGMAP 2011, International Conference on Signal Processing and Multimedia Applications (SIGMAP), Jul 18-21, 2011, Seville, Spain
Note

QC 20140226

Available from: 2014-02-25 Created: 2014-02-25 Last updated: 2016-06-10 Bibliographically approved
6. Stereoscopic visualization of monocular images in photo collections
2011 (English) In: Wireless Communications and Signal Processing (WCSP), 2011 International Conference on, IEEE, 2011, 1-5 p. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we propose a novel approach for 3D video/photo visualization using an ordinary digital camera. The idea is to turn any 2D camera into a 3D camera based on data derived from a collection of captured photos or a recorded video. For a given monocular input, the information retrieved from overlapping photos provides what is required to produce 3D output. Robust feature detection and matching between images is used to find the transformation between overlapping frames. The transformation matrix maps the images to the same horizontal baseline. Afterwards, the projected images are adjusted to the stereoscopic model. Finally, the stereo views are coded into 3D channels for visualization. This approach enables us to create 3D output from randomly taken photos of a scene or from a recorded video. Our system receives 2D monocular input and provides double-layer-coded 3D output. Depending on the coding technology, different low-cost 3D glasses can be used by viewers.
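The frame-alignment step, finding the transformation between overlapping frames, can be illustrated with phase correlation, a frequency-domain registration technique. Note this is a simplified stand-in for the robust feature detection and matching described above: it recovers pure integer translation only, whereas the paper's pipeline estimates a full transformation matrix.

```python
import numpy as np

def estimate_shift(a, b):
    """Estimate the integer (dy, dx) translation taking image a to
    image b via phase correlation (translation only)."""
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    R = np.conj(A) * B
    R /= np.abs(R) + 1e-12                     # normalised cross-power spectrum
    corr = np.abs(np.fft.ifft2(R))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap shifts larger than half the image size to negative values
    h, w = a.shape
    if dy > h // 2: dy -= h
    if dx > w // 2: dx -= w
    return int(dy), int(dx)

rng = np.random.default_rng(1)
frame = rng.random((64, 64))
moved = np.roll(frame, shift=(3, -5), axis=(0, 1))   # simulated overlap offset
```

In the stereoscopic pipeline, the recovered transformation is what allows two overlapping frames to be rectified to a common horizontal baseline before coding into left/right channels.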

Place, publisher, year, edition, pages
IEEE, 2011
Keyword
cameras, feature extraction, image matching, matrix algebra, stereo image processing, video coding, video retrieval, 3D channel, 3D glasses, 3D video-photo visualization, coding technology, digital camera, feature detection, information retrieval, monocular images, overlapping frames, overlapping photos, photo collections, stereoscopic visualization, transformation matrix, Image color analysis, Robustness, Three dimensional displays, Visualization
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-141822 (URN)
10.1109/WCSP.2011.6096688 (DOI)
2-s2.0-84555194972 (Scopus ID)
978-1-4577-1008-7 (ISBN)
Conference
WCSP 2011
Note

QC 20140226

Available from: 2014-02-25 Created: 2014-02-25 Last updated: 2016-04-26 Bibliographically approved
7. Robust correction of 3D geo-metadata in photo collections by forming a photo grid
2011 (English) In: Wireless Communications and Signal Processing (WCSP), 2011 International Conference on, IEEE, 2011, 1-5 p. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we present a technique for efficient and robust estimation of the exact location and orientation of a photo capture device in a large data set. The data set includes a set of photos and the associated information from GPS and an orientation sensor. This attached metadata is noisy and lacks precision. Our strategy for correcting this uncertain data is based on fusion between a measurement model, derived from the sensor data, and a signal model given by computer vision algorithms. Based on the information retrieved from multiple views of a scene, we form a grid of images. Robust feature detection and matching between images yields a reliable transformation; consequently, the relative locations and orientations across the data set construct the signal model. On the other hand, information extracted from the single images, combined with the measurement data, forms the measurement model. Finally, a Kalman filter is used to fuse these two models iteratively and improve the estimate of the ground-truth (GT) location and orientation. In practice, this approach helps us design a photo browsing system for a huge collection of photos, enabling 3D navigation and exploration of the data set.
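The fusion step can be illustrated with a single scalar Kalman measurement update: a vision-derived position estimate (signal model) is corrected by a noisy GPS reading (measurement model), weighted by their variances. This is a minimal 1-D sketch; all numbers and names are illustrative, and the paper's filter operates on full position and orientation states.

```python
def kalman_update(x, P, z, R):
    """One scalar Kalman measurement update: fuse prior estimate x
    (variance P) with measurement z (variance R)."""
    K = P / (P + R)              # Kalman gain: trust measurement vs. prior
    x_new = x + K * (z - x)      # corrected estimate
    P_new = (1 - K) * P          # uncertainty shrinks after fusion
    return x_new, P_new

# Prior from the vision-based signal model, measurement from noisy GPS
# (all values illustrative).
x, P = 100.0, 4.0                # vision-derived position and variance
z, R = 106.0, 16.0               # GPS reading and variance
x_fused, P_fused = kalman_update(x, P, z, R)
```

Because the vision prior here has the smaller variance, the fused estimate lands closer to it than to the GPS reading, and iterating this update over the photo grid progressively tightens the location and orientation estimates.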

Place, publisher, year, edition, pages
IEEE, 2011
Keyword
Kalman filters, computer vision, feature extraction, image fusion, image matching, image retrieval, information retrieval, iterative methods, meta data, 3D geometadata, 3D navigation, GPS, GT orientation, Kalman filter, computer vision algorithms, data fusion, data set, ground truth location, image grid, information extraction, iterative fusion, measurement model, orientation sensor, photo browsing system, photo capture device, photo collections, photo grid, robust correction, robust estimation, robust feature detection, sensor data, signal model, Cameras, Computational modeling, Data models, Estimation, Noise, Noise measurement, Uncertainty
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-141823 (URN)
10.1109/WCSP.2011.6096689 (DOI)
2-s2.0-84555186863 (Scopus ID)
Conference
WCSP 2011
Note

QC 20140226

Available from: 2014-02-25 Created: 2014-02-25 Last updated: 2016-04-26 Bibliographically approved

Open Access in DiVA

PhD_thesis_Introduction (3467 kB)
File name: FULLTEXT01.pdf
File size: 3467 kB
Type: fulltext
Mimetype: application/pdf

By author/editor
Yousefi, Shahrouz