  • 1. Abedan Kondori, Farid
    et al.
    Yousefi, Shahrouz
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Direct Head Pose Estimation Using Kinect-type Sensors. 2014. In: Electronics Letters, ISSN 0013-5194, E-ISSN 1350-911X. Article in journal (Refereed)
  • 2. Abedan Kondori, Farid
    et al.
    Yousefi, Shahrouz
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Liu, Li
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. Nanjing University of Posts and Telecommunications, Nanjing, China.
    Head Operated Electric Wheelchair. 2014. In: Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation, 2014, 53-56 p. Conference paper (Refereed)
    Abstract [en]

    Currently, the most common way to control an electric wheelchair is to use a joystick. However, some individuals, such as quadriplegia patients, are unable to operate joystick-driven electric wheelchairs due to severe physical disabilities. This paper proposes a novel head pose estimation method to assist such patients. Head motion parameters are employed to control and drive an electric wheelchair. We introduce a direct method for estimating user head motion, based on a sequence of range images captured by Kinect. In this work, we derive a new version of the optical flow constraint equation for range images. We show how the new equation can be used to estimate head motion directly. Experimental results reveal that the proposed system works with high accuracy in real time. We also show simulation results for navigating the electric wheelchair by recovering user head motion.

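    The abstract above describes solving for rigid head motion directly from range-image derivatives. As a rough, minimal sketch of that idea (not the authors' exact derivation), the orthographic depth-rate constraint Zx*Vx + Zy*Vy - Vz + Zt = 0, with V = t + w x P, gives one linear equation per pixel in the six motion parameters, which a least-squares solve recovers; all names below are illustrative assumptions.

      import numpy as np

      def rigid_motion_from_range(Z, Zx, Zy, Zt, X, Y):
          # One linear constraint per pixel in m = (tx, ty, tz, wx, wy, wz):
          #   Zx*Vx + Zy*Vy - Vz + Zt = 0, where V = t + w x P, P = (X, Y, Z).
          zx, zy, zt = Zx.ravel(), Zy.ravel(), Zt.ravel()
          x, y, z = X.ravel(), Y.ravel(), Z.ravel()
          A = np.stack([zx,                   # coefficient of tx
                        zy,                   # ty
                        -np.ones_like(zx),    # tz
                        -zy * z - y,          # wx
                        zx * z + x,           # wy
                        -zx * y + zy * x],    # wz
                       axis=1)
          m, *_ = np.linalg.lstsq(A, -zt, rcond=None)
          return m                            # 6-DOF motion estimate
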
  • 3. Darvish, A. M.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Söderström, U.
    Super-resolution facial images from single input images based on discrete wavelet transform. 2014. In: Proceedings - International Conference on Pattern Recognition, 2014, 843-848 p. Conference paper (Refereed)
    Abstract [en]

    In this work, we present a technique that allows for accurate estimation of frequencies in higher dimensions than the original image content. The technique uses asymmetrical Principal Component Analysis together with the Discrete Wavelet Transform (aPCA-DWT). For example, high-quality content can be generated from low-quality cameras, since the necessary frequencies can be estimated through reliable methods. Within our research, we build models for interpreting facial images, where super-resolution versions of human faces can be created. We have worked on several different experiments, extracting the frequency content in order to create models with aPCA-DWT. The results are presented along with experiments on deblurring and zooming beyond the original image resolution. For example, when an image is enlarged 16 times in decoding, the proposed technique outperforms interpolation by more than 7 dB on average.

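    On the wavelet side of aPCA-DWT, one way to picture the reconstruction step is to treat the low-resolution face as the approximation subband of a 2D DWT and invert the transform with model-estimated detail subbands. A minimal sketch with PyWavelets; predict_details is a hypothetical stand-in for the trained aPCA model, not the paper's code.

      import pywt

      def wavelet_superresolve(low_res, predict_details):
          # Treat `low_res` as the approximation band of a one-level 2D DWT
          # and let the (hypothetical) aPCA model supply the missing
          # horizontal/vertical/diagonal detail bands.
          cH, cV, cD = predict_details(low_res)
          return pywt.idwt2((low_res, (cH, cV, cD)), 'haar')  # ~2x upscaling
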
  • 4. Ge, Q.
    et al.
    Shen, F.
    Jing, X. -Y
    Wu, F.
    Xie, S. -P
    Yue, D.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Active contour evolved by joint probability classification on Riemannian manifold. 2016. In: Signal, Image and Video Processing, ISSN 1863-1703, E-ISSN 1863-1711, Vol. 10, no 7, 1257-1264 p. Article in journal (Refereed)
    Abstract [en]

    In this paper, we present an active contour model for image segmentation based on a nonparametric distribution metric, without any a priori intensity information about the image. A novel nonparametric distance metric, called joint probability classification, is established to drive the active contour, avoiding the instability induced by multimodal intensity distributions. Considering an image as a Riemannian manifold with spatial and intensity information, the contour evolution is performed on the image manifold by embedding geometric image features into the active contour model. Experimental results on medical and texture images demonstrate the advantages of the proposed method.

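    For orientation, a generic region-based level-set evolution of the kind this abstract builds on can be written as follows (LaTeX notation). This is a schematic form only, not the paper's exact functional: in the paper the data term r(x) comes from joint probability classification and the geometry lives on the image manifold.

      \frac{\partial \phi}{\partial t} =
        \delta_{\varepsilon}(\phi)\left[ \mu \,\operatorname{div}\!\left(
        \frac{\nabla \phi}{|\nabla \phi|} \right) - r(x) \right],
      \qquad
      r(x) = \log \frac{p(I(x) \mid \text{outside})}{p(I(x) \mid \text{inside})}
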
  • 5. Ge, Qi
    et al.
    Jing, Xiao-Yuan
    Wu, Fei
    Yan, Jingjie
    Li, Hai-Bo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Unsupervised Joint Image Denoising and Active Contour Segmentation in Multidimensional Feature Space. 2016. In: Mathematical problems in engineering (Print), ISSN 1024-123X, E-ISSN 1563-5147, 3909645. Article in journal (Refereed)
  • 6. Halawani, A.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH.
    FingerInk: Turn your glass into a digital board. 2013. In: Proceedings of the 25th Australian Computer-Human Interaction Conference: Augmentation, Application, Innovation, Collaboration, OzCHI 2013, Association for Computing Machinery (ACM), 2013, 393-396 p. Conference paper (Refereed)
    Abstract [en]

    We present a robust vision-based technology for hand and finger detection and tracking that can be used in many CHI scenarios. The method can be used in real-life setups and does not assume any predefined conditions. Moreover, it does not require any additional expensive hardware. It fits well into the user's environment without major changes and hence can be used in the ambient intelligence paradigm. Another contribution is interaction using glass, which is a natural, yet challenging, environment to interact with. We introduce the concept of an "invisible information layer" embedded into normal window glass that is thereafter used as an interaction medium.

  • 7. Halawani, A.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. Nanjing University of Posts and Telecommunications.
    Template-based search: A tool for scene analysis. 2016. In: Proceeding - 2016 IEEE 12th International Colloquium on Signal Processing and its Applications, CSPA 2016, IEEE conference proceedings, 2016, 1-6 p. Conference paper (Refereed)
    Abstract [en]

    This paper proposes a simple and yet effective technique for shape-based scene analysis, in which detection and/or tracking of specific objects or structures in the image is desirable. The idea is based on using predefined binary templates of the structures to be located in the image. The template is matched to contours in a given edge image to locate the designated entity. These templates are allowed to deform in order to deal with variations in the structure's shape and size. Deformation is achieved by dividing the template into segments. The dynamic programming search algorithm is used to accomplish the matching process, achieving very robust results in cluttered and noisy scenes in the applications presented.

  • 8. Halawani, Alaa
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    100 lines of code for shape-based object localization. 2016. In: Pattern Recognition, ISSN 0031-3203, E-ISSN 1873-5142, Vol. 60, 458-472 p. Article in journal (Refereed)
    Abstract [en]

    We introduce a simple and effective concept for localizing objects in densely cluttered edge images based on shape information. The shape information is characterized by a binary template of the object's contour, provided to search for object instances in the image. We adopt a segment-based search strategy, in which the template is divided into a set of segments. In this work, we propose our own segment representation that we call one-pixel segment (OPS), in which each pixel in the template is treated as a separate segment. This is done to achieve high flexibility that is required to account for intra-class variations. OPS representation can also handle scale changes effectively. A dynamic programming algorithm uses the OPS representation to realize the search process, enabling a detailed localization of the object boundaries in the image. The concept's simplicity is reflected in the ease of implementation, as the paper's title suggests. The algorithm works directly with very noisy edge images extracted using the Canny edge detector, without the need for any preprocessing or learning steps. We present our experiments and show that our results outperform those of very powerful, state-of-the-art algorithms.

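    The one-pixel-segment search described above lends itself to a compact dynamic program. The toy sketch below matches each template contour pixel to one edge pixel while keeping successive displacements coherent; it illustrates the general segment-DP idea under simplifying assumptions (smoothness cost only, no per-pixel data cost), and is not the paper's implementation.

      import numpy as np

      def ops_match(template, candidates, alpha=1.0):
          # template: (N, 2) contour pixels; candidates: (M, 2) edge pixels.
          # Match every template pixel to an edge pixel so that successive
          # displacements stay similar (a smoothness-only Viterbi pass; a
          # real system would add a per-pixel data cost, e.g. orientation).
          T = np.asarray(template, float)
          C = np.asarray(candidates, float)
          N = len(T)
          disp = C[None, :, :] - T[:, None, :]            # (N, M, 2)
          cost = np.zeros((N, len(C)))
          back = np.zeros((N, len(C)), dtype=int)
          for i in range(1, N):
              pair = np.linalg.norm(disp[i][None, :, :] -
                                    disp[i - 1][:, None, :], axis=2)
              total = cost[i - 1][:, None] + alpha * pair  # (M_prev, M_cur)
              back[i] = np.argmin(total, axis=0)
              cost[i] = np.min(total, axis=0)
          path = [int(np.argmin(cost[-1]))]                # backtrack
          for i in range(N - 1, 0, -1):
              path.append(int(back[i, path[-1]]))
          return C[np.array(path[::-1])]                   # matched edge pixels
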
  • 9. Halawani, Alaa
    et al.
    Rehman, Shafiq Ur
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Active vision for tremor disease monitoring. 2015. In: 6th International Conference on Applied Human Factors and Ergonomics (AHFE 2015) and the Affiliated Conferences, AHFE 2015, Elsevier, 2015, 2042-2048 p. Conference paper (Refereed)
    Abstract [en]

    The aim of this work is to introduce a prototype for monitoring tremor diseases using computer vision techniques. While vision has been previously used for this purpose, the system we are introducing differs intrinsically from other traditional systems. The essential difference is characterized by the placement of the camera on the user's body rather than in front of it, and thus reversing the whole process of motion estimation. This is called active motion tracking. Active vision is simpler in setup and achieves more accurate results compared to traditional arrangements, which we refer to as "passive" here. One main advantage of active tracking is its ability to detect even tiny motions using its simple setup, and that makes it very suitable for monitoring tremor disorders.

  • 10. Khan, M. S. L.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Rehman, Shafiq Ur
    Telepresence Mechatronic Robot (TEBoT): Towards the design and control of socially interactive bio-inspired system. 2016. In: Journal of Intelligent & Fuzzy Systems, ISSN 1064-1246, E-ISSN 1875-8967, Vol. 31, no 5, 2597-2610 p. Article in journal (Refereed)
    Abstract [en]

    Socially interactive systems are embodied agents that engage in social interactions with humans. From a design perspective, these systems are built by considering a biologically inspired (bio-inspired) design that can mimic and simulate human-like communication cues and gestures. The design of a bio-inspired system usually consists of (i) studying biological characteristics, (ii) designing a similar biological robot, and (iii) motion planning that can mimic the biological counterpart. In this article, we present the design, development, control strategy and verification of our socially interactive bio-inspired robot, namely the Telepresence Mechatronic Robot (TEBoT). The key contribution of our work is an embodiment of real human neck movements by i) designing a mechatronic platform based on the dynamics of a real human neck and ii) capturing the real head movements through our novel single-camera-based vision algorithm. Our socially interactive bio-inspired system is based on an intuitive integration-design strategy that combines a computer vision based geometric head pose estimation algorithm, a model based design (MBD) approach and real-time motion planning techniques. We have conducted extensive testing to demonstrate the effectiveness and robustness of our proposed system.

  • 11. Khan, M. S. L.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Réhman, S. U.
    Expressive multimedia: Bringing action to physical world by dancing-tablet. 2015. In: HCMC 2015 - Proceedings of the 2nd Workshop on Computational Models of Social Interactions: Human-Computer-Media Communication, co-located with ACM MM 2015, ACM Digital Library, 2015, 9-14 p. Conference paper (Refereed)
    Abstract [en]

    The design practice based on the embodied interaction concept focuses on developing new user interfaces for computer devices that merge digital content with the physical world. In this work we propose a novel embodied interaction based design in which the 'action' information of the digital content is presented in the physical world. More specifically, we map the 'action' information of video content from the digital world into the physical world. The motivating example presented in this paper is our novel dancing-tablet, in which a tablet PC dances to the rhythm of a song; hence the 'action' information is not just confined to a 2D flat display but also expressed by it. This paper presents i) the hardware design of our mechatronic dancing-tablet platform, ii) a software algorithm for musical feature extraction and iii) an embodied computational model for mapping the 'action' information of the musical expression to the mechatronic platform. Our user study shows that the overall perception of audio-video music is enhanced by our dancing-tablet setup.

  • 12. Khan, M. S. L.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Ur Réhman, S.
    Embodied tele-presence system (ETS): Designing tele-presence for video teleconferencing. 2014. In: 3rd International Conference on Design, User Experience, and Usability: User Experience Design for Diverse Interaction Platforms and Environments, DUXU 2014, Held as Part of 16th International Conference on Human-Computer Interaction, HCI Int. 2014, 2014, no PART 2, 574-585 p. Conference paper (Refereed)
    Abstract [en]

    In spite of the progress made in teleconferencing over the last decades, it is still far from a resolved issue. In this work, we present an intuitive video teleconferencing system, namely the Embodied Tele-Presence System (ETS), which is based on the embodied interaction concept. This work presents the results of a user study considering the hypothesis: "An embodied interaction based video conferencing system performs better than the standard video conferencing system in representing nonverbal behaviors, thus creating a 'feeling of presence' of a remote person among his/her local collaborators." Our ETS integrates standard audio-video conferencing with a mechanical embodiment of the head gestures of a remote person (as nonverbal behavior) to enhance the level of interaction. To highlight the technical challenges and design principles behind such tele-presence systems, we have also performed a system evaluation which shows the accuracy and efficiency of our ETS design. The paper further provides an overview of our case study and an analysis of our user evaluation. The user study shows that the proposed embodied interaction approach in video teleconferencing increases 'in-meeting interaction' and enhances the 'feeling of presence' of a remote participant among his/her collaborators.

  • 13. Khan, M. S. L.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Ur Réhman, S.
    Tele-immersion: Virtual reality based collaboration. 2016. In: 18th International Conference on Human-Computer Interaction, HCI International 2016, Springer, 2016, 352-357 p. Conference paper (Refereed)
    Abstract [en]

    The ‘perception of being present in another space’ during video teleconferencing is a challenging task. This work makes an effort to improve a user's perception of being ‘present’ in another space by employing a virtual reality (VR) headset and an embodied telepresence system (ETS). In our application scenario, a remote participant uses a VR headset to collaborate with local collaborators. At the local site, an ETS is used as a physical representation of the remote participant among his/her local collaborators. The head movements of the remote person are mapped and presented by the ETS along with audio-video communication. Key considerations of the complete design are discussed, where solutions to challenges related to head tracking, audio-video communication and data communication are presented. The proposed approach is validated by a user study in which quantitative analysis is performed on immersion and presence parameters.

  • 14. Khan, M. S. L.
    et al.
    Réhman, S. U.
    Söderström, U.
    Halawani, A.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Face-off: A face reconstruction technique for virtual reality (VR) scenarios. 2016. In: 14th European Conference on Computer Vision, ECCV 2016, Springer, 2016, 490-503 p. Conference paper (Refereed)
    Abstract [en]

    Virtual Reality (VR) headsets occlude a significant portion of the human face. The real human face is required in many VR applications, for example video teleconferencing. This paper proposes a wearable-camera-based solution to reconstruct the real face of a person wearing a VR headset. At the core of our solution is asymmetrical principal component analysis (aPCA). A user-specific training model is built using aPCA with full-face, lips and eye-region information. During the testing phase, the lower face region and partial eye information are used to reconstruct the wearer's face. The online testing session consists of two phases: (i) a calibration phase and (ii) a reconstruction phase. In the former, a small calibration step is performed to align test information with the training data, while the latter uses half-face information to reconstruct the full face using the aPCA-based trained data. The proposed approach is validated with qualitative and quantitative analysis.

  • 15. Khan, M. S. L.
    et al.
    Ur Rehman, S.
    Hera, P. L.
    Liu, F.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    A pilot user's prospective in mobile robotic telepresence system. 2014. In: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, IEEE conference proceedings, 2014. Conference paper (Refereed)
    Abstract [en]

    In this work we present an interactive video conferencing system specifically designed to enhance the experience of video teleconferencing for a pilot user. We use an Embodied Telepresence System (ETS), which was previously designed to enhance the experience of video teleconferencing for the collaborators. In this work we deploy the ETS in a novel scenario to improve the experience of the pilot user during distance communication. The ETS is used to adjust the view of the pilot user at the distant location (e.g. a distantly located conference/meeting). A velocity profile control for the ETS is developed, which is implicitly controlled by the head of the pilot user. An experiment was conducted to test whether the view adjustment capability of the ETS increases the collaboration experience of video conferencing for the pilot user. In the user study, participants (pilot users) performed interaction using the ETS and a traditional computer based video conferencing tool. Overall, the user study suggests the effectiveness of our approach in enhancing the experience of video conferencing for the pilot user.

  • 16. Khan, Muhammad Sikandar Lal
    et al.
    Réhman, Shafiq
    Lu, Zhihan
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Tele-embodied agent (TEA) for video teleconferencing. 2013. In: Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia, MUM 2013, Association for Computing Machinery (ACM), 2013. Conference paper (Refereed)
    Abstract [en]

    We propose a design of a teleconference system which expresses nonverbal behavior (in our case head gestures) along with audio-video communication. Previous audio-video conferencing systems fall short in presenting the nonverbal behaviors which we, as humans, usually use in face-to-face interaction. Recently, research in teleconferencing systems has expanded to include the nonverbal cues of a remote person in distance communication. The accurate representation of non-verbal gestures for such systems is still challenging because they are dependent on hand-operated devices (like a mouse or keyboard). Furthermore, they still fall short in presenting accurate human gestures. We believe that incorporating embodied interaction in video teleconferencing, i.e., using the physical world as a medium for interacting with digital technology, can result in nonverbal behavior representation. The experimental platform named Tele-Embodied Agent (TEA) is introduced, which incorporates a remote person's head gestures to study a new paradigm of embodied interaction in video teleconferencing. Our preliminary test shows the accuracy (with respect to pose angles) and efficiency (with respect to time) of our proposed design. TEA can be used in the medical field, factories, offices, the gaming industry, the music industry and for training.

  • 17. Khan, Muhammad Sikandar Lal
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Rehman, Shafiq Ur
    Gaze perception and awareness in smart devices. 2016. In: International journal of human-computer studies, ISSN 1071-5819, E-ISSN 1095-9300, Vol. 92-93, 55-65 p. Article in journal (Refereed)
    Abstract [en]

    Eye contact and gaze awareness play a significant role in conveying emotions and intentions during face-to-face conversation. Humans can perceive each other's gaze quite naturally and accurately. However, gaze awareness/perception is ambiguous during video teleconferencing performed on computer-based devices (such as laptops, tablets, and smart-phones). The reasons for this ambiguity are (i) the camera position relative to the screen and (ii) the 2D rendition of the 3D human face, i.e., the 2D screen is unable to deliver an accurate gaze during video teleconferencing. To solve this problem, researchers have proposed different hardware setups with complex software algorithms. The most recent solutions for accurate gaze perception employ 3D interfaces, such as 3D screens and 3D face-masks. However, the video teleconferencing devices in common use today are smart devices with 2D screens. Therefore, there is a need to improve gaze awareness/perception in these smart devices. In this work, we have revisited the question: how can a remote user's gaze awareness among his/her collaborators be improved? Our hypothesis is that accurate gaze perception can be achieved by the '3D embodiment' of a remote user's head gestures during video teleconferencing. We have prototyped an embodied telepresence system (ETS) for the 3D embodiment of a remote user's head. Our ETS is based on a 3-DOF neck robot with a mounted smart device (tablet PC). The electromechanical platform in combination with a smart device is a novel setup for studying gaze awareness/perception on 2D screen-based smart devices during video teleconferencing. Two important gaze-related issues are considered in this work: (i) the 'Mona Lisa gaze effect' - the gaze appears directed at the observer regardless of his/her position in the room, and (ii) 'gaze awareness/faithfulness' - the ability to perceive an accurate spatial relationship between the observing person and the object observed by an actor. Our results confirm that the 3D embodiment of a remote user's head not only mitigates the Mona Lisa gaze effect but also supports three levels of gaze faithfulness, hence accurately projecting the human gaze in distant space.

  • 18. Kondori, F. A.
    et al.
    Liu, L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Telelife: An immersive media experience for rehabilitation. 2014. In: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, IEEE conference proceedings, 2014. Conference paper (Refereed)
    Abstract [en]

    In recent years, the emergence of telerehabilitation systems for home-based therapy has altered healthcare systems. Telerehabilitation enables therapists to observe patients' status via the Internet, so a patient does not have to visit rehabilitation facilities for every rehabilitation session. Although telerehabilitation provides great opportunities, there are two major issues that affect its effectiveness: the confinement of the patient at home, and the loss of direct supervision by the therapist. Since patients have no actual interaction with other persons during the rehabilitation period, they become isolated and gradually lose their social skills. Moreover, without the direct supervision of therapists, rehabilitation exercises can be performed with bad compensation strategies that lead to a poor-quality recovery. To resolve these issues, we propose telelife, a new concept for future rehabilitation systems. The idea is to use media technology to create a totally new immersive media experience for rehabilitation. In telerehabilitation, patients locally execute exercises while therapists remotely monitor their status. In telelife, by contrast, patients remotely perform exercises and therapists locally monitor. Thus telelife not only enables rehabilitation at a distance, but also improves patients' social competences and provides direct supervision by therapists. In this paper we introduce telelife to enhance telerehabilitation, and investigate the technical challenges and possible methods to achieve it.

  • 19. Kondori, F. A.
    et al.
    Yousefi, Shahrouz
    Umeå University, Sweden.
    Li, Haibo
    Umeå University, Sweden.
    Real 3D interaction behind mobile phones for augmented environments. 2011. Conference paper (Refereed)
    Abstract [en]

    The number of mobile devices such as mobile phones and PDAs has increased dramatically over recent years. New mobile devices are equipped with integrated cameras and large displays, which make interaction with the device easier and more efficient. Although most of the previous work on interaction between humans and mobile devices is based on 2D touch-screen displays, camera-based interaction opens a new way to manipulate in the 3D space behind the device, in the camera's field of view. This paper suggests the use of particular patterns of local orientation of the image, called Rotational Symmetries, to detect and localize human gestures. The relative rotation and translation of a human gesture between consecutive frames are estimated by means of extracting stable features. Consequently, this information can be used to facilitate the 3D manipulation of virtual objects in various applications on mobile devices.

  • 20. Kondori, F. A.
    et al.
    Yousefi, Shahrouz
    Umeå University, Sweden.
    Li, Haibo
    Umeå University, Sweden.
    Sonning, S.
    3D head pose estimation using the Kinect. 2011. Conference paper (Refereed)
    Abstract [en]

    Head pose estimation plays an essential role in bridging the information gap between humans and computers. Conventional head pose estimation methods mostly operate on images captured by ordinary cameras, but accurate and robust pose estimation is often problematic. In this paper we present an algorithm for recovering the six degrees of freedom (DOF) of motion of a head from a sequence of range images taken by the Microsoft Kinect for Xbox 360. The proposed algorithm utilizes a least-squares minimization of the difference between the measured rate of change of depth at a point and the rate predicted by the depth rate constraint equation. We segment the human head from its surroundings and background, and then estimate the head motion. Our system has the capability to recover the six DOF of the head motion of multiple people in one image. The proposed system is evaluated in our lab and presents superior results.

  • 21. Kondori, Farid Abedan
    et al.
    Yousefi, Shahrouz
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Kouma, Jean-Paul
    Liu, Li
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Direct hand pose estimation for immersive gestural interaction. 2015. In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 66, 91-99 p. Article in journal (Refereed)
    Abstract [en]

    This paper presents a novel approach for performing intuitive gesture-based interaction using depth data acquired by Kinect. The main challenge in enabling immersive gestural interaction is dynamic gesture recognition. This problem can be formulated as a combination of two tasks: gesture recognition and gesture pose estimation. Incorporating a fast and robust pose estimation method would lessen the burden to a great extent. In this paper we propose a direct method for real-time hand pose estimation. Based on the range images, a new version of the optical flow constraint equation is derived, which can be utilized to directly estimate 3D hand motion without any need to impose other constraints. Extensive experiments illustrate that the proposed approach performs properly in real time with high accuracy. As a proof of concept, we demonstrate the system performance in 3D object manipulation on two different setups: desktop computing and a mobile platform. This reveals the system's capability to accommodate different interaction procedures. In addition, a user study is conducted to evaluate learnability, user experience and interaction quality in 3D gestural interaction in comparison to 2D touchscreen interaction.

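    The abstract does not spell out its "new version of the optical flow constraint equation". For intuition, the classical range-image analogue follows from differentiating the depth map Z(x, y, t) along the motion, shown schematically in LaTeX below (the paper's actual equation may differ):

      \frac{dZ}{dt} = Z_x u + Z_y v + Z_t = V_z,
      \qquad (u, v)\ \text{the image-plane motion},\quad V_z\ \text{the depth velocity},

    so each pixel of a range image constrains the 3D motion linearly, and the hand motion can be solved for directly by least squares, as in the sketch after entry 2 above.
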
  • 22. Li, B.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC).
    Söderström, U.
    Distinctive curves features. 2016. In: Electronics Letters, ISSN 0013-5194, E-ISSN 1350-911X, Vol. 52, no 3, 197-U83 p. Article in journal (Refereed)
    Abstract [en]

    Curves and lines are geometrical, abstract features of an image. Whereas interest points are more limited, curves and lines provide much more information about the image structure. However, the research done in curve and line detection is very fragmented, and the concept of scale space is not yet fused well into curve and line detection. The keypoint (e.g. SIFT, SURF, ORB) is a successful concept which represents features (e.g. blobs, corners, etc.) in scale space. Stimulated by the keypoint concept, a method is proposed which extracts distinctive curves (DICU) in scale space, including lines as a special form of curve features. A curve feature can be represented by three keypoints (two end points and one middle point). A good way to test the quality of detected curves is to analyse their repeatability under various image transformations. DICU is evaluated using the standard Oxford benchmark. The overlap error is calculated by averaging the overlap errors of the three keypoints on the curve. Experimental results show that DICU achieves good repeatability compared with other state-of-the-art methods. To match curve features, a relatively uncomplicated way is to combine the local descriptors of the three keypoints on each curve.

  • 23. Li, B.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Söderström, U.
    Scale-invariant corner keypoints. 2014. Conference paper (Refereed)
    Abstract [en]

    Effective and efficient generation of keypoints from images is the first step of many computer vision applications, such as object matching. The last decade presented us with an arms race toward faster and more robust keypoint detection, feature description and matching, resulting in several new algorithms, for example Scale Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Oriented FAST and Rotated BRIEF (ORB) and Binary Robust Invariant Scalable Keypoints (BRISK). Keypoint detection has been improved using various techniques in most of these algorithms. However, in the search for faster computing, the accuracy of the algorithms is decreasing. In this paper, we present SICK (Scale-Invariant Corner Keypoints), a novel method for fast keypoint detection. Our experimental results show that SICK is faster to compute and more robust than recent state-of-the-art methods.

  • 24.
    Li, Haibo
    et al.
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Hedman, Anders
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Harnessing crowds to avert or mitigate acts of terrorism: A collective intelligence call for action. 2017. In: Proceedings - 2016 European Intelligence and Security Informatics Conference, EISIC 2016, IEEE, 2017. Conference paper (Refereed)
    Abstract [en]

    This paper considers averting acts of terrorism through non-traditional means of surveillance and control: the use of crowdsourcing (collective intelligence) and the development of a new class of anti-terror mobile apps. The proposed class of anti-terrorist apps is based on two dimensions: the individual and the central. By individual, we mean the individual app user, and by central we mean a central organizational locus of coordination and control in the fight against terrorism. Such a central locus could be a governmental agency or a national/international security organization active in the fight against terrorism.

  • 25.
    Li, Haibo
    et al.
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Hedman, Anders
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Harnessing Crowds to Avert or Mitigate Acts of Terrorism: A Collective Intelligence Call for Action. 2016. In: 2016 European Intelligence and Security Informatics Conference (EISIC) / [ed] Brynielsson, J., Johansson, F., IEEE, 2016, 203-203 p. Conference paper (Refereed)
    Abstract [en]

    This paper considers averting acts of terrorism through non-traditional means of surveillance and control: the use of crowdsourcing (collective intelligence) and the development of a new class of anti-terror mobile apps. The proposed class of anti-terrorist apps is based on two dimensions: the individual and the central. By individual, we mean the individual app user, and by central we mean a central organizational locus of coordination and control in the fight against terrorism. Such a central locus could be a governmental agency or a national/international security organization active in the fight against terrorism.

  • 26. Lu, G.
    et al.
    He, J.
    Yan, J.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. Nanjing University of Posts and Telecommunications.
    Convolutional neural network for facial expression recognition. 2016. In: Journal of Nanjing University of Posts and Telecommunications, ISSN 1673-5439, Vol. 36, no 1, 16-22 p. Article in journal (Refereed)
    Abstract [en]

    To avoid the complex explicit feature extraction process of traditional expression recognition, a convolutional neural network (CNN) for facial expression recognition is proposed. First, the facial expression image is normalized and implicit features are extracted using trainable convolution kernels. Then, max pooling is used to reduce the dimensionality of the extracted implicit features. Finally, a Softmax classifier is used to classify the facial expressions of the test samples. The experiments are carried out on the CK+ facial expression database using a graphics processing unit (GPU). Experimental results show the performance and generalization ability of the CNN for facial expression recognition.

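    The pipeline in the abstract (trainable convolutions, max pooling, softmax classification) maps onto a few lines of modern framework code. A minimal sketch in PyTorch; the layer sizes, 48x48 input and 7 expression classes are illustrative assumptions, not the paper's architecture.

      import torch
      import torch.nn as nn

      model = nn.Sequential(
          nn.Conv2d(1, 32, kernel_size=5),  # implicit feature extraction
          nn.ReLU(),
          nn.MaxPool2d(2),                  # max pooling reduces dimensions
          nn.Conv2d(32, 64, kernel_size=5),
          nn.ReLU(),
          nn.MaxPool2d(2),
          nn.Flatten(),
          nn.LazyLinear(7),                 # logits for 7 expression classes
      )
      criterion = nn.CrossEntropyLoss()     # softmax classification loss
      x = torch.randn(8, 1, 48, 48)         # a batch of normalized face crops
      print(model(x).shape)                 # -> torch.Size([8, 7])
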
  • 27. Lu, Z.
    et al.
    Réhman, S.
    Khan, M. S. L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH.
    Anaglyph 3D Stereoscopic Visualization of 2D Video based on Fundamental Matrix. 2013. In: Proceedings - 2013 International Conference on Virtual Reality and Visualization, ICVRV 2013, IEEE, 2013, 305-308 p. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a simple anaglyph 3D stereo generation algorithm for 2D video sequences captured with a monocular camera. In our novel approach we employ a camera pose estimation method to directly generate stereoscopic 3D from 2D video without building a depth map explicitly. Our cost-effective method is suitable for arbitrary real-world video sequences and produces smooth results. We use image stitching based on plane correspondence using the fundamental matrix. To this end we also demonstrate that correspondence-plane image stitching based on the homography matrix alone cannot generate better results. Furthermore, we utilize the structure-from-motion (with fundamental matrix) based reconstructed camera pose model to accomplish the visual anaglyph 3D illusion. The proposed approach demonstrates very good performance for most of the video sequences.

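    The final coding step of such a pipeline, composing a red/cyan anaglyph once a stereo pair exists, is simple. A sketch follows; the stereo pair itself would come from the paper's fundamental-matrix-based view synthesis, which is not shown here.

      import numpy as np

      def compose_anaglyph(left_rgb, right_rgb):
          # Red channel from the left view, green/blue from the right:
          # viewed through red/cyan glasses, each eye sees its own view.
          out = right_rgb.copy()
          out[..., 0] = left_rgb[..., 0]
          return out
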
  • 28. Lv, Z.
    et al.
    Feng, L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Feng, S.
    Hand-free motion interaction on Google Glass. 2014. In: SIGGRAPH Asia 2014 Mobile Graphics and Interactive Applications, SA 2014, 2014. Conference paper (Refereed)
    Abstract [en]

    There is an increasing interest in creating wearable device interaction technologies. Novel emerging user interface technologies (e.g. eye-ball tracking, speech recognition, gesture recognition, ECG, EEG and fusions of them) have the potential to significantly affect market share in PCs, smartphones, tablets and the latest wearable devices such as Google Glass. As a result, deploying these technologies in devices such as smartphones and wearable devices is challenging. Google Glass has many impressive characteristics (i.e. voice actions, head wake up, wink detection), which are human-glass interface (HGI) technologies. Google Glass does not suffer from 'the occlusion problem' and 'the fat finger problem', which are the problems of direct-touch finger input on touch screens. However, Google Glass only provides a touchpad with simple 'tapping and sliding your finger' gestures, which is in fact a one-dimensional interaction, instead of the traditional two-dimensional interaction based on the complete touch screen of a smartphone. The one-dimensional 'swipe the touchpad' interaction with a row of 'Cards', which replaces the traditional two-dimensional icon menu, limits the intuitiveness and flexibility of HGI. Therefore, there is a growing interest in implementing 3D gesture recognition vision systems in which optical sensors capture real-time video of the user and ubiquitous algorithms are then used to determine the user's gestures, without the user having to hold any device. We demonstrate a hand-free motion interaction application based on computer vision technology on Google Glass. The presented application allows the user to perform touch-less interaction by hand or foot gestures in front of the camera of Google Glass. Based on the same core ubiquitous gesture recognition algorithm as used in this demonstration, a hybrid wearable smartphone system based on mixed hardware and software has been presented in our previous work [Lv 2013][Lu et al. 2013][Lv et al. 2013], which can support either hand or foot interaction with today's smartphones.

  • 29. Lv, Z.
    et al.
    Feng, S.
    Lal Khan, M. S.
    Ur Réhman, S.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Foot motion sensing: Augmented game interface based on foot interaction for smartphone. 2014. In: Conference on Human Factors in Computing Systems - Proceedings, 2014, 293-296 p. Conference paper (Refereed)
    Abstract [en]

    We designed and developed two games, a real-time augmented football game and an augmented foot piano game, to demonstrate an innovative interface based on a foot motion sensing approach for smart phones. In the proposed novel interface, a computer vision based hybrid detection and tracking method provides the core support for the foot interaction interface by accurately tracking the shoes. Based on the proposed interaction interface, two demonstrations are developed; the applications employ augmented reality technology to render the game graphics and game status information on the smart phone's screen. The players interact with the game using foot interaction toward the rear camera, which triggers the interaction events. The interface supports basic foot motion sensing (i.e. direction of movement, velocity, rhythm).

  • 30. Lv, Z.
    et al.
    Halawani, A.
    Khan, M. S. L.
    Réhman, S. U.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Finger in air: Touch-less interaction on smartphone. 2013. In: Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia, MUM 2013, Association for Computing Machinery (ACM), 2013. Conference paper (Refereed)
    Abstract [en]

    In this paper we present a vision-based intuitive interaction method for smart mobile devices. It is based on markerless finger gesture detection, which attempts to provide a 'natural user interface'. No additional hardware is necessary for real-time finger gesture estimation. To evaluate the strengths and effectiveness of the proposed method, we designed two smart phone applications: a circle menu application, which provides the user with graphics and the smart phone's status information, and a bouncing ball game, a finger gesture based bouncing ball application. The users interact with these applications using finger gestures through the smart phone's camera view, which triggers the interaction events and generates activity sequences for interactive buffers. Our preliminary user study evaluation demonstrates the effectiveness and social acceptability of the proposed interaction approach.

  • 31. Lv, Zhihan
    et al.
    Feng, Liangbing
    Feng, Shengzhong
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Extending Touch-less Interaction on Vision Based Wearable Device. 2015. In: 2015 IEEE Virtual Reality Conference (VR), IEEE conference proceedings, 2015, 231-232 p. Conference paper (Refereed)
    Abstract [en]

    A touch-less interaction technology on a vision-based wearable device is designed and evaluated. Users interact with the application with dynamic hand/foot gestures in front of the camera. Several proof-of-concept prototypes with eleven dynamic gestures are developed based on the touch-less interaction. Finally, a comparative user study evaluation is presented to demonstrate the usability of the touch-less approach, as well as its impact on the user's emotions, running on a wearable framework or Google Glass.

  • 32. Lv, Zhihan
    et al.
    Halawani, Alaa
    Feng, Shengzhong
    ur Rehman, Shafiq
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Touch-less interactive augmented reality game on vision-based wearable device. 2015. In: Personal and Ubiquitous Computing, ISSN 1617-4909, E-ISSN 1617-4917, Vol. 19, no 3-4, 551-567 p. Article in journal (Refereed)
    Abstract [en]

    There is an increasing interest in creating pervasive games based on emerging interaction technologies. In order to develop touch-less, interactive and augmented reality games on a vision-based wearable device, a touch-less motion interaction technology is designed and evaluated in this work. Users interact with the augmented reality games with dynamic hand/foot gestures in front of the camera, which triggers interaction events to interact with the virtual objects in the scene. Three primitive augmented reality games with eleven dynamic gestures are developed based on the proposed touch-less interaction technology as proofs of concept. Finally, a comparative evaluation is presented to demonstrate the social acceptability and usability of the touch-less approach, running on a hybrid wearable framework or with Google Glass, as well as a workload assessment and an evaluation of users' emotions and satisfaction.

  • 33. Lv, Zhihan
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Imagining In-Air Interaction for Hemiplegia Sufferer. 2015. In: 2015 International Conference on Virtual Rehabilitation Proceedings (ICVR), 2015, 149-150 p. Conference paper (Refereed)
    Abstract [en]

    In this paper, we describe imagined scenarios of a touch-less interaction technology for hemiplegia sufferers, which can support either hand or foot interaction with a smartphone or head mounted device (HMD). The computer vision interaction technology was implemented in our previous work, and provides the core support for gesture interaction by accurately detecting and tracking hand or foot gestures. The patients interact with the application using hand/foot gesture motion in the camera view.

  • 34. Lv, Zhihan
    et al.
    Halawani, Alaa
    Feng, Shengzhong
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Réhman, S. U.
    Multimodal Hand and Foot Gesture Interaction for Handheld Devices. 2014. In: ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), ISSN 1551-6857, E-ISSN 1551-6865, Vol. 11, 10. Article in journal (Refereed)
    Abstract [en]

    We present a hand-and-foot-based multimodal interaction approach for handheld devices. Our method combines input modalities (i.e., hand and foot) and provides a coordinated output to both modalities along with audio and video. Human foot gestures are detected and tracked using contour-based template detection (CTD) and the Tracking-Learning-Detection (TLD) algorithm. The 3D foot pose is estimated from the passive homography matrix of the camera. 3D stereoscopic rendering and vibrotactile feedback are used to enhance the immersive feeling. We developed a multimodal football game based on the multimodal approach as a proof of concept. We confirm our system's user satisfaction through a user study.

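    The abstract's "3D foot pose ... estimated from the passive homography matrix" suggests a standard planar-pose step. A hedged OpenCV sketch, with synthetic stand-ins for the CTD/TLD-tracked shoe template points and for the camera intrinsics (both are assumptions, not the paper's data):

      import cv2
      import numpy as np

      K = np.array([[800., 0., 320.],       # assumed camera intrinsics
                    [0., 800., 240.],
                    [0., 0., 1.]])
      pts_model = np.float32([[0, 0], [10, 0], [10, 20], [0, 20], [5, 10]])
      pts_img = pts_model * 1.5 + np.float32([100, 80])  # fake tracked points
      H, _ = cv2.findHomography(pts_model, pts_img, cv2.RANSAC, 3.0)
      n, Rs, Ts, Ns = cv2.decomposeHomographyMat(H, K)
      # Each (Rs[i], Ts[i], Ns[i]) is a candidate pose of the shoe plane;
      # the physically consistent candidate is picked via visibility checks.
      print(n, Rs[0])
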
  • 35. Réhman Ur, S.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Using vibrotactile language for multimodal human animals communication and interaction. 2014. In: ACM International Conference Proceeding Series, ACM Digital Library, 2014. Conference paper (Refereed)
    Abstract [en]

    In this work we aim to facilitate computer-mediated multimodal communication and interaction between humans and animals based on vibrotactile stimuli. To study and influence the behavior of animals, researchers usually use 2D/3D visual stimuli. We instead use a vibrotactile pattern based language, which provides the opportunity to communicate and interact with animals. We have performed experiments with a vibrotactile based human-animal multimodal communication system to study the effectiveness of vibratory stimuli applied to the animal's skin along with audio and visual stimuli. The preliminary results are encouraging and indicate that low-resolution tactual displays are effective in transmitting information.

  • 36. Shao, W. -Z
    et al.
    Li, H. -B
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Elad, M.
    Bi-l0-l2-norm regularization for blind motion deblurring. 2015. In: Journal of Visual Communication and Image Representation, ISSN 1047-3203, E-ISSN 1095-9076, Vol. 33, 42-59 p. Article in journal (Refereed)
    Abstract [en]

    In blind motion deblurring, leading methods today tend towards highly non-convex approximations of the l0-norm, especially in the image regularization term. In this paper, we propose a simple, effective and fast approach for the estimation of the motion blur-kernel, through a bi-l0-l2-norm regularization imposed on both the intermediate sharp image and the blur-kernel. Compared with existing methods, the proposed regularization is shown to be more effective and robust, leading to a more accurate motion blur-kernel and a better final restored image. A fast numerical scheme is deployed for alternatingly computing the sharp image and the blur-kernel, by coupling the operator splitting and augmented Lagrangian methods. Experimental results on both a benchmark image dataset and real-world motion blurred images show that the proposed approach is highly competitive with state-of-the-art methods in both deblurring effectiveness and computational efficiency.

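    Schematically, the bi-regularized objective described in the abstract has the form below (LaTeX notation). Phi and Psi stand for the combined l0-l2 penalties on the image gradients and on the kernel; the exact terms and weighting in the paper may differ.

      \min_{u,\,k}\; \| u \otimes k - y \|_2^2
        + \lambda\, \Phi_{\ell_0\text{-}\ell_2}(\nabla u)
        + \gamma\, \Psi_{\ell_0\text{-}\ell_2}(k)

    The objective is minimized by alternating updates of the sharp image u and the blur kernel k (operator splitting plus augmented Lagrangian, per the abstract).
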
  • 37. Shao, Wen-Ze
    et al.
    Ge, Qi
    Deng, Hai-Song
    Wei, Zhi-Hui
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Motion Deblurring Using Non-stationary Image Modeling. 2015. In: Journal of Mathematical Imaging and Vision, ISSN 0924-9907, E-ISSN 1573-7683, Vol. 52, no 2, 234-248 p. Article in journal (Refereed)
    Abstract [en]

    It is well known that shaken cameras or mobile phones during exposure usually lead to motion-blurry photographs. Therefore, camera shake deblurring, or motion deblurring, is required and requested in many practical scenarios. The contribution of this paper is the proposal of a simple yet effective approach for motion blur kernel estimation, i.e., blind motion deblurring. Though several methods for motion blur kernel estimation have been proposed in the literature, we impose a type of non-stationary Gaussian prior on the gradient fields of sharp images, in order to automatically detect and pursue the salient edges of images as the important clues to blur kernel estimation. On one hand, the prior is able to promote sparsity inherited in the non-stationarity of the precision parameters (inverses of variances). On the other hand, since the prior is in a Gaussian form, there exists a great possibility of deducing a conceptually simple and computationally tractable inference scheme. Specifically, the well-known expectation-maximization algorithm is used to alternatingly estimate the motion blur kernels, the salient edges of images and the precision parameters in the image prior. In contrast to many existing methods, no hyperpriors are imposed on any parameters in this paper; nor are there any pre-processing steps involved, such as explicit suppression of random noise or prediction of salient edge structures. With the estimated motion blur kernels, the deblurred images are finally generated using an off-the-shelf non-blind deconvolution method proposed by Krishnan and Fergus (Adv Neural Inf Process Syst 22:1033-1041, 2009). The rationality and effectiveness of our proposed method are well demonstrated by the experimental results on both synthetic and realistic motion-blurry images, showing state-of-the-art blind motion deblurring performance of the proposed approach in terms of quantitative metrics as well as visual perception.

  • 38. Shao, Wen-Ze
    et al.
    Ge, Qi
    Gan, Zong-Liang
    Deng, Hai-Song
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    A Generalized Robust Minimization Framework for Low-Rank Matrix Recovery. 2014. In: Mathematical problems in engineering (Print), ISSN 1024-123X, E-ISSN 1563-5147, 656074. Article in journal (Refereed)
    Abstract [en]

    This paper considers the problem of recovering low-rank matrices which are heavily corrupted by outliers or large errors. To improve the robustness of existing recovery methods, the problem is solved by formulating it as a generalized nonsmooth, nonconvex minimization functional exploiting the Schatten p-norm (0 < p <= 1) and the Lq (0 < q <= 1) seminorm. Two numerical algorithms are provided, based on the augmented Lagrange multiplier (ALM) and accelerated proximal gradient (APG) methods, as well as efficient root-finder strategies. Experimental results demonstrate that the proposed generalized approach is more inclusive and effective compared with state-of-the-art methods, either convex or nonconvex.

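    In symbols, the recovery functional sketched in the abstract reads roughly as follows (LaTeX; the constraint form is assumed for illustration):

      \min_{X,\,E}\; \|X\|_{S_p}^p + \lambda\, \|E\|_q^q
      \quad \text{s.t.} \quad Y = X + E,
      \qquad \|X\|_{S_p}^p = \sum_i \sigma_i(X)^p,\;\; 0 < p,\, q \le 1,

    which reduces to the convex nuclear-norm-plus-L1 model at p = q = 1 and is handled by the ALM and APG schemes mentioned above.
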
  • 39. Ur Rehman, S.
    et al.
    Khan, M. S. L.
    Li, L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Vibrotactile TV for immersive experience. 2014. In: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, IEEE conference proceedings, 2014. Conference paper (Refereed)
    Abstract [en]

    Audio and video are two powerful media forms that shorten the distance between the audience/viewer and the actors or players in TV and films. Recent research shows that people today are consuming more and more multimedia content on mobile devices, such as tablets and smartphones. Therefore, an important question emerges: how can we render high-quality, personal immersive experiences to consumers on these systems? To give the audience an immersive engagement that differs from 'watching a play', we designed a study to render complete immersive media which includes 'emotional information' based on augmented vibrotactile coding on the back of the user along with the audio-video signal. The reported emotional responses to videos viewed with and without haptic enhancement show that participants exhibited an increased emotional response to media with haptic enhancement. Overall, these studies suggest the effectiveness of our approach and show that using a multisensory approach increases immersion and user satisfaction.

  • 40. Wu, Jinsong
    et al.
    Bisio, Igor
    Gniady, Chris
    Hossain, Ekram
    Valla, Massimo
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Context-aware networking and communications: Part 1. 2014. In: IEEE Communications Magazine, ISSN 0163-6804, E-ISSN 1558-1896, Vol. 52, no 6, 14-15 p. Article in journal (Refereed)
  • 41. Wu, Jinsong
    et al.
    Bisio, Igor
    Gniady, Chris
    Hossain, Ekram
    Valla, Massimo
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Context-aware networking and communications: Part 2. 2014. In: IEEE Communications Magazine, ISSN 0163-6804, E-ISSN 1558-1896, Vol. 52, no 8, 64-65 p. Article in journal (Refereed)
  • 42. Yan, Jingjie
    et al.
    Zheng, Wenming
    Xu, Qinyu
    Lu, Guanming
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. Nanjing Univ Posts & Telecommun, Peoples R China.
    Wang, Bei
    Sparse Kernel Reduced-Rank Regression for Bimodal Emotion Recognition From Facial Expression and Speech. 2016. In: IEEE Transactions on Multimedia, ISSN 1520-9210, E-ISSN 1941-0077, Vol. 18, no 7, 1319-1329 p. Article in journal (Refereed)
    Abstract [en]

    A novel bimodal emotion recognition approach from facial expression and speech, based on the sparse kernel reduced-rank regression (SKRRR) fusion method, is proposed in this paper. In this method, we use the openSMILE feature extractor and the scale invariant feature transform feature descriptor to extract effective features from the speech modality and the facial expression modality, respectively, and then propose the SKRRR fusion approach to fuse the emotion features of the two modalities. The proposed SKRRR method is a nonlinear extension of traditional reduced-rank regression (RRR), where both predictor and response feature vectors in RRR are kernelized by being mapped onto two high-dimensional feature spaces via two nonlinear mappings, respectively. To solve the SKRRR problem, we propose a sparse representation (SR) based approach to find the optimal solution of the coefficient matrices of SKRRR, where the introduction of the SR technique aims to fully consider the different contributions of training data samples to the derivation of the optimal solution of SKRRR. Finally, we utilize the eNTERFACE '05 and AFEW4.0 bimodal emotion databases to conduct experiments on monomodal and bimodal emotion recognition, and the results indicate that our presented approach achieves the highest or a comparable bimodal emotion recognition rate among several state-of-the-art approaches.

  • 43.
    Yousefi, Shahrouz
    et al.
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Abedan Kondori, Farid
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH.
    Bare-hand Gesture Recognition and Tracking through the Large-scale Image Retrieval, 2014. In: VISAPP 2014 - Proceedings of the 9th International Conference on Computer Vision Theory and Applications, 2014. Conference paper (Refereed)
  • 44.
    Yousefi, Shahrouz
    et al.
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Abedan Kondori, Farid
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC).
    Interactive 3D Visualization on a 4K Wall-Sized Display, 2014. Conference paper (Refereed)
    Abstract [en]

    This paper introduces a novel vision-based approach for realistic interaction between the user and the display's content. A highly accurate motion-capture system is proposed to measure and track the user's head motion in 3D space. Video frames captured by a low-cost head-mounted camera are processed to retrieve the 3D motion parameters, and the retrieved information enables real-time 3D interaction. This technology turns any 2D screen into an interactive 3D display, letting users control and manipulate the content as through a digital window. The proposed system is tested and verified on a wall-sized 4K screen.
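
    A minimal sketch of the "digital window" rendering step, assuming a fixed screen centred at the origin and an OpenGL-style asymmetric frustum; the screen dimensions and conventions here are assumptions, not the paper's implementation.

    ```python
    # Off-axis frustum from a tracked head position (fish-tank-VR style).
    SCREEN_W, SCREEN_H = 1.6, 0.9   # assumed physical screen size in metres

    def off_axis_frustum(head, near=0.1, far=100.0):
        """head = (x, y, z) in metres, screen centre at the origin,
        z = distance from the screen plane (z > 0)."""
        x, y, z = head
        scale = near / z
        left   = (-SCREEN_W / 2 - x) * scale
        right  = ( SCREEN_W / 2 - x) * scale
        bottom = (-SCREEN_H / 2 - y) * scale
        top    = ( SCREEN_H / 2 - y) * scale
        return left, right, bottom, top, near, far

    # As the head moves right, the frustum skews so on-screen parallax
    # matches what a real window would show.
    print(off_axis_frustum((0.0, 0.0, 0.6)))
    print(off_axis_frustum((0.3, 0.0, 0.6)))
    ```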

  • 45.
    Yousefi, Shahrouz
    et al.
    Umeå University, Sweden.
    Abedan Kondori, Farid
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Tracking Fingers in 3D Space for Mobile Interaction, 2010. Conference paper (Refereed)
  • 46.
    Yousefi, Shahrouz
    et al.
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    Kondori, Farid Abedan
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    3D Visualization of Single Images using Patch Level Depth, 2011. In: SIGMAP 2011, 2011, 61-66 p. Conference paper (Refereed)
    Abstract [en]

    In this paper we consider the task of 3D photo visualization from a single monocular image. The main idea is to take single photos captured by devices such as ordinary cameras, mobile phones, or tablet PCs and visualize them in 3D on normal displays. A supervised learning approach is employed to retrieve depth information from single images. The algorithm is based on a hierarchical multi-scale Markov random field (MRF), which models depth from multi-scale global and local features, and the relations between them, in a monocular image. The estimated depth image is then used to assign depth parameters to each pixel in the 3D map, after which multi-level depth adjustment and coding into color anaglyphs is performed. Our system receives a single 2D image as input and produces an anaglyph-coded 3D image as output; depending on the coding technology, low-cost anaglyph glasses are used for viewing.
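
    The final anaglyph-coding step can be sketched as follows, assuming the depth map has already been estimated; the disparity scaling and red-cyan packing are illustrative choices, not the paper's exact parameters.

    ```python
    # Image + depth -> red-cyan anaglyph via simple disparity shifting.
    import numpy as np

    def anaglyph(rgb: np.ndarray, depth: np.ndarray, max_disp: int = 8) -> np.ndarray:
        """rgb: (H, W, 3) uint8, depth: (H, W) in [0, 1], 1 = nearest."""
        h, w, _ = rgb.shape
        disp = (depth * max_disp).astype(int)          # nearer -> larger shift
        cols = np.arange(w)
        left = np.zeros_like(rgb)
        right = np.zeros_like(rgb)
        for y in range(h):
            left[y, np.clip(cols + disp[y], 0, w - 1)] = rgb[y, cols]
            right[y, np.clip(cols - disp[y], 0, w - 1)] = rgb[y, cols]
        out = np.empty_like(rgb)
        out[..., 0] = left[..., 0]                     # red from the left view
        out[..., 1:] = right[..., 1:]                  # green/blue from the right
        return out

    img = (np.random.rand(120, 160, 3) * 255).astype(np.uint8)
    d = np.tile(np.linspace(1.0, 0.0, 120)[:, None], (1, 160))  # toy depth ramp
    print(anaglyph(img, d).shape)                      # -> (120, 160, 3)
    ```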

  • 47.
    Yousefi, Shahrouz
    et al.
    Umeå University, Sweden.
    Kondori, Farid Abedan
    Li, Haibo
    Umeå University, Sweden.
    Camera-based gesture tracking for 3D interaction behind mobile devices, 2012. In: International Journal of Pattern Recognition and Artificial Intelligence, ISSN 0218-0014, Vol. 26, no 8, 1260008. Article in journal (Refereed)
    Abstract [en]

    The number of mobile devices such as smartphones and tablet PCs has increased dramatically in recent years. New mobile devices are equipped with integrated cameras and large displays that make interaction with the device easier and more efficient. Although most previous work on interaction between humans and mobile devices is based on 2D touch-screen displays, camera-based interaction opens a new way to manipulate in the 3D space behind the device, in the camera's field of view. In this paper, our gestural interaction relies on particular patterns of local image orientation called rotational symmetries. The approach is based on finding the most suitable pattern from a large set of rotational symmetries of different orders, which yields a reliable detector for fingertips and user gestures. Gesture detection and tracking can then be used as an efficient tool for 3D manipulation in various virtual- and augmented-reality applications.
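
    A rough sketch of rotational-symmetry detection in the spirit of the abstract: build the double-angle orientation image from gradients and correlate it with a complex basis filter. The filter radius, the order convention, and the lack of normalization are assumptions for illustration, not the paper's detector.

    ```python
    # Rotational-symmetry responses from the double-angle orientation image.
    import numpy as np
    from scipy.ndimage import sobel
    from scipy.signal import fftconvolve

    def symmetry_response(gray: np.ndarray, order: int = 2, radius: int = 9) -> np.ndarray:
        gx = sobel(gray, axis=1).astype(float)
        gy = sobel(gray, axis=0).astype(float)
        z = (gx + 1j * gy) ** 2                  # double-angle orientation image
        # Complex basis filter exp(i * order * phi) on a disc; in the
        # double-angle domain, order 0 responds to straight edges and
        # order 2 to circular/point-symmetric patterns.
        yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        phi = np.arctan2(yy, xx)
        disc = (xx ** 2 + yy ** 2) <= radius ** 2
        basis = np.exp(1j * order * phi) * disc
        resp = fftconvolve(z, np.conj(basis), mode="same")
        return np.abs(resp)                      # peaks mark candidate patterns

    gray = np.random.rand(100, 100)
    print(symmetry_response(gray).shape)         # -> (100, 100)
    ```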

  • 48.
    Yousefi, Shahrouz
    et al.
    Digital Media Lab., Department of Applied Physics and Electronics, Umeå University.
    Kondori, Farid Abedan
    Digital Media Lab., Department of Applied Physics and Electronics, Umeå University.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC). Digital Media Lab., Department of Applied Physics and Electronics, Umeå University.
    Experiencing real 3D gestural interaction with mobile devices, 2013. In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 34, no 8, 912-921 p. Article in journal (Refereed)
    Abstract [en]

    The number of mobile devices such as smartphones and tablet PCs has increased dramatically in recent years. New mobile devices are equipped with integrated cameras and large displays that make interaction with the device more efficient. Although most previous work on interaction between humans and mobile devices is based on 2D touch-screen displays, camera-based interaction opens a new way to manipulate in the 3D space behind the device, in the camera's field of view. In this paper, our gestural interaction relies heavily on particular patterns of local image orientation called rotational symmetries. The approach is based on finding the most suitable pattern from a large set of rotational symmetries of different orders, which yields a reliable detector for hand gestures. Gesture detection and tracking can then be employed as an efficient tool for 3D manipulation in various computer vision and augmented reality applications. The final output is rendered into color anaglyphs for 3D visualization; depending on the coding technology, different low-cost 3D glasses can be used by viewers.

  • 49.
    Yousefi, Shahrouz
    et al.
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    Kondori, Farid Abedan
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    Li, Haibo
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    Robust correction of 3D geo-metadata in photo collections by forming a photo grid, 2011. In: Wireless Communications and Signal Processing (WCSP), 2011 International Conference on, IEEE, 2011, 1-5 p. Conference paper (Refereed)
    Abstract [en]

    In this work, we present a technique for efficient and robust estimation of the exact location and orientation of a photo-capture device within a large data set. The data set comprises a set of photos and the associated GPS and orientation-sensor readings; this attached metadata is noisy and lacks precision. Our strategy for correcting the uncertain data is to fuse a measurement model, derived from the sensor data, with a signal model given by computer vision algorithms. Using information retrieved from multiple views of a scene, we form a grid of images. Robust feature detection and matching between images yields a reliable transformation, so the relative locations and orientations across the data set constitute the signal model; information extracted from the single images, combined with the sensor readings, constitutes the measurement model. Finally, a Kalman filter fuses these two models iteratively to improve the estimate of the ground-truth (GT) location and orientation. In practice, this approach can support a photo-browsing system over a huge collection of photos, enabling 3D navigation and exploration of the data set.
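
    A minimal sketch of the fusion idea, reduced to a 1-D position-only Kalman filter that propagates with vision-derived relative motion and corrects with noisy GPS; the state model and noise levels are simplifying assumptions, not the paper's formulation.

    ```python
    # Fuse noisy GPS fixes with vision-derived relative displacements.
    import numpy as np

    def kalman_fuse(gps_positions, vision_deltas, r_gps=25.0, q_vision=1.0):
        """gps_positions: noisy absolute positions (one per photo);
        vision_deltas: relative displacements between consecutive photos
        estimated from feature matching. Returns corrected positions."""
        x = gps_positions[0]          # state: position estimate
        p = r_gps                     # state variance
        out = [x]
        for gps, delta in zip(gps_positions[1:], vision_deltas):
            # Predict: propagate with the vision-derived relative motion.
            x = x + delta
            p = p + q_vision
            # Update: correct with the GPS measurement.
            k = p / (p + r_gps)       # Kalman gain
            x = x + k * (gps - x)
            p = (1 - k) * p
            out.append(x)
        return np.array(out)

    true = np.cumsum(np.ones(10))                      # photos taken 1 m apart
    gps = true + np.random.normal(0, 5.0, 10)          # noisy GPS track
    vis = np.diff(true) + np.random.normal(0, 0.2, 9)  # vision ego-motion
    print(kalman_fuse(gps, vis))
    ```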

  • 50.
    Yousefi, Shahrouz
    et al.
    Umeå University, Sweden.
    Kondori, Farid Abedan
    Umeå University, Sweden.
    Li, Haibo
    Umeå University, Sweden.
    Stereoscopic visualization of monocular images in photo collections, 2011. In: Wireless Communications and Signal Processing (WCSP), 2011 International Conference on, IEEE, 2011, 1-5 p. Conference paper (Refereed)
    Abstract [en]

    In this paper we propose a novel approach to 3D video/photo visualization using an ordinary digital camera. The idea is to turn any 2D camera into a 3D one using data derived from a collection of captured photos or a recorded video. For a given monocular input, the information retrieved from overlapping photos provides what is needed to produce 3D output. Robust feature detection and matching between images is employed to find the transformation between overlapping frames; the transformation matrix maps the images to the same horizontal baseline. The projected images are then adjusted to the stereoscopic model, and finally the stereo views are coded into 3D channels for visualization. This approach enables us to create 3D output from randomly taken photos of a scene or from a recorded video. Our system receives 2D monocular input and produces double-layer-coded 3D output; depending on the coding technology, different low-cost 3D glasses are used by viewers.
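
    The alignment step can be sketched with standard OpenCV calls, assuming ORB features and RANSAC homography estimation stand in for the paper's unspecified detector and matcher:

    ```python
    # Align two overlapping photos onto a shared frame via a homography.
    import cv2
    import numpy as np

    def align_pair(img_a, img_b):
        orb = cv2.ORB_create(2000)
        kp_a, des_a = orb.detectAndCompute(img_a, None)
        kp_b, des_b = orb.detectAndCompute(img_b, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
        src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        h, w = img_b.shape[:2]
        # Map photo A into photo B's frame; the warped pair then shares a
        # baseline and can be packed into stereo channels.
        return cv2.warpPerspective(img_a, H, (w, h)), img_b
    ```

    The aligned pair could then be coded into left/right channels, for example with a red-cyan packing along the lines of the anaglyph sketch under entry 46 above.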
