  • 1. Abedan Kondori, Farid
    et al.
    Yousefi, Shahrouz
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Direct Head Pose Estimation Using Kinect-type Sensors (2014). In: Electronics Letters, ISSN 0013-5194, E-ISSN 1350-911X. Article in journal (Refereed)
  • 2. Abedan Kondori, Farid
    et al.
    Yousefi, Shahrouz
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Liu, Li
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. Nanjing University of Posts and Telecommunications, Nanjing, China.
    Head Operated Electric Wheelchair (2014). In: Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation, 2014, p. 53-56. Conference paper (Refereed)
    Abstract [en]

    Currently, the most common way to control an electric wheelchair is to use a joystick. However, some individuals are unable to operate joystick-driven electric wheelchairs due to severe physical disabilities, such as quadriplegia. This paper proposes a novel head pose estimation method to assist such patients. Head motion parameters are employed to control and drive an electric wheelchair. We introduce a direct method for estimating user head motion, based on a sequence of range images captured by Kinect. In this work, we derive a new version of the optical flow constraint equation for range images. We show how the new equation can be used to estimate head motion directly. Experimental results reveal that the proposed system works with high accuracy in real time. We also show simulation results for navigating the electric wheelchair by recovering user head motion.
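    Note: the abstract cites a "new version of the optical flow constraint equation" for range images without stating it. A standard form from the range-flow literature, given here as a plausible starting point rather than the paper's exact derivation, is

    $$ Z_x u + Z_y v + Z_t = w, $$

    where $Z(x, y, t)$ is the range image, $(u, v)$ is the image-plane velocity, and $w$ is the velocity component along the optical axis. Under a rigid-motion model, $(u, v, w)$ is linear in the six head-motion parameters, so stacking this constraint over all head pixels yields an overdetermined linear system that can be solved by least squares.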

  • 3. Cheng, X.
    et al.
    Yang, B.
    Olofsson, T.
    Liu, G.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    A pilot study of online non-invasive measuring technology based on video magnification to determine skin temperature (2017). In: Building and Environment, ISSN 0360-1323, E-ISSN 1873-684X, Vol. 121, p. 1-10. Article in journal (Refereed)
    Abstract [en]

    Much attention has been paid to human-centered design strategies for the environmental control systems of indoor built environments. The goal is to achieve thermally comfortable, healthy, and safe working or living environments in an energy-efficient manner. Building heating, ventilation, and air-conditioning (HVAC) systems normally have fixed operating settings, which cannot satisfy human thermal comfort requirements under transient and non-uniform indoor thermal environments. Therefore, human thermal physiology signals such as skin temperature, which reflect the body's thermal sensation, have to be measured over time. Several trials have been performed by miniaturizing measuring sensors such as the i-button and by mounting sensors into wearable devices such as glasses. Infrared thermography has also been tried to achieve non-invasive measurement. However, it would be much more convenient and feasible if an ordinary computer camera could record images from which human thermal physiology signals could be obtained. In this study, the skin temperature of the back of the hand, which has a high density of blood vessels and is normally not covered by clothing, was measured by i-button sensors. Images recorded by an ordinary camera were magnified to analyze skin temperature variations that are impossible to see with the naked eye. The agreement between the i-button measurements and the image magnification results demonstrates the feasibility of non-invasive measurement by image magnification. A partly personalized saturation-temperature model (T = 96.5 × S + b_i) can be used to predict skin temperatures for young East Asian females.
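    The quoted model is simple enough to state in code. A minimal sketch, assuming S is the image saturation value and b_i is a per-subject intercept fitted from a few calibration readings (the slope 96.5 comes from the abstract; units and the fitting protocol are not specified there):

        import numpy as np

        def skin_temperature(saturation, b_i):
            # Partly personalized saturation-temperature model: T = 96.5 * S + b_i
            return 96.5 * saturation + b_i

        def fit_intercept(saturations, temperatures):
            # Least-squares fit of the personal intercept b_i from calibration
            # pairs of (saturation, i-button temperature) for one subject.
            s = np.asarray(saturations, dtype=float)
            t = np.asarray(temperatures, dtype=float)
            return float(np.mean(t - 96.5 * s))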

  • 4.
    Cheng, Xiaogang
    et al.
    KTH, School of Computer Science and Communication (CSC). Nanjing University of Posts and Telecommunications, Nanjing, China.
    Yang, B.
    Liu, G.
    Olofsson, T.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC).
    A variational approach to atmospheric visibility estimation in the weather of fog and haze (2018). In: Sustainable cities and society, ISSN 2210-6707, Vol. 39, p. 215-224. Article in journal (Refereed)
    Abstract [en]

    Real-time atmospheric visibility estimation in foggy and hazy weather plays a crucial role in ensuring traffic safety. Overcoming the inherent drawbacks of traditional optical estimation methods, such as limited sampling volume and high cost, vision-based approaches have received much attention in recent research on atmospheric visibility estimation. Based on the classical Koschmieder's formula, atmospheric visibility estimation is carried out by extracting an inherent extinction coefficient. In this paper we present a variational framework to handle the time-varying nature of the extinction coefficient and develop a novel algorithm that extracts the extinction coefficient through piecewise functional fitting of observed luminance curves. The developed algorithm is validated and evaluated on a large database of road traffic video from the Tongqi expressway (in China). The test results are very encouraging and show that the proposed algorithm achieves an estimation error rate of 10%. More significantly, it is the first time that the effectiveness of Koschmieder's formula in atmospheric visibility estimation has been validated with a large dataset, which contains more than two million luminance curves extracted from real-world traffic video surveillance data.
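    For context, Koschmieder's formula and the standard conversion from extinction coefficient to visibility can be sketched as follows; the paper's contribution, the variational fitting of a time-varying extinction coefficient from luminance curves, is not reproduced here, and the 5% contrast threshold is the common convention rather than a value taken from the abstract:

        import numpy as np

        def koschmieder_luminance(L0, Lf, k, d):
            # Apparent luminance of an object with intrinsic luminance L0 seen
            # at distance d through an atmosphere with extinction coefficient k,
            # against horizon sky luminance Lf.
            a = np.exp(-k * d)
            return L0 * a + Lf * (1.0 - a)

        def visibility_from_extinction(k, contrast_threshold=0.05):
            # Meteorological visibility implied by k: V = -ln(eps) / k,
            # roughly 3/k for the common 5% threshold.
            return -np.log(contrast_threshold) / k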

  • 5.
    Cheng, Xiaogang
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS). Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China.
    Yang, Bin
    Hedman, Anders
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Olofsson, Thomas
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Van Gool, Luc
    NIDL: A pilot study of contactless measurement of skin temperature for intelligent building (2019). In: Energy and Buildings, ISSN 0378-7788, E-ISSN 1872-6178, Vol. 198, p. 340-352. Article in journal (Refereed)
    Abstract [en]

    Human thermal comfort measurement plays a critical role in giving feedback signals for building energy efficiency. A contactless measuring method based on subtleness magnification and deep learning (NIDL) was designed to achieve a comfortable, energy-efficient built environment. The method relies on skin feature data, e.g., subtle motion and texture variation, and a 315-layer deep neural network for constructing the relationship between skin features and skin temperature. A physiological experiment was conducted to collect feature data (1.44 million samples) and validate the algorithm. The contactless measurement algorithm based on a partly personalized saturation-temperature model (NIPST) was used for performance comparison. The results show that the mean and median errors of NIDL are 0.476 °C and 0.343 °C, which are equivalent to accuracy improvements of 39.07% and 38.76%, respectively.

  • 6.
    Cheng, Xiaogang
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS). Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China. Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden.
    Yang, Bin
    Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden. Xian Univ Architecture & Technol, Sch Environm & Municipal Engn, Xian 710055, Shaanxi, Peoples R China.
    Liu, Guoqing
    Nanjing Tech Univ, Sch Phys & Math Sci, Nanjing 211800, Jiangsu, Peoples R China.
    Olofsson, Thomas
    Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS). Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China.
    A Total Bounded Variation Approach to Low Visibility Estimation on Expressways (2018). In: Sensors, ISSN 1424-8220, E-ISSN 1424-8220, Vol. 18, no 2, article id 392. Article in journal (Refereed)
    Abstract [en]

    Low visibility on expressways caused by heavy fog and haze is a major cause of traffic accidents. Real-time estimation of atmospheric visibility is an effective way to reduce traffic accident rates. With the development of computer technology, estimating atmospheric visibility via computer vision has become a research focus. However, the estimation accuracy should be enhanced since fog and haze are complex and time-varying. In this paper, a total bounded variation (TBV) approach to estimate low visibility (less than 300 m) is introduced. Surveillance images of fog and haze are processed as blurred images (pseudo-blurred images), while the surveillance images at selected road points on sunny days are handled as clear images, considering fog and haze as noise superimposed on the clear images. By combining the image spectrum and TBV, the features of foggy and hazy images can be extracted. The extraction results are compared with the features of images taken on sunny days. First, low-visibility surveillance images can be filtered out according to the spectral features of foggy and hazy images. For foggy and hazy images with visibility less than 300 m, the high-frequency coefficient ratio of the Fourier (discrete cosine) transform is less than 20%, while the low-frequency coefficient ratio is between 100% and 120%. Second, the relationship between TBV and real visibility is established based on machine learning and piecewise stationary time series analysis. The established piecewise function can be used for visibility estimation. Finally, the proposed visibility estimation approach is validated on real surveillance video data, and the results are compared with those of the image contrast model. The video data were collected from the Tongqi expressway, Jiangsu, China. A total of 1,782,000 frames were used, and the relative errors of the proposed approach are less than 10%.
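    The spectral filtering step can be sketched directly from the quantities the abstract names. Below, the "coefficient ratio" is interpreted as the band energy of a foggy frame relative to a clear-day reference frame at the same camera viewpoint; this interpretation, the radial band split, and the cutoff are assumptions, not the paper's exact definitions:

        import numpy as np
        from scipy.fft import dctn

        def band_masks(shape, cutoff=0.5):
            # Split DCT coefficients into low/high bands by normalized
            # frequency radius (an assumed partition).
            h, w = shape
            yy, xx = np.mgrid[0:h, 0:w]
            r = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)
            return r <= cutoff, r > cutoff

        def band_energy_ratio(foggy, clear, mask):
            # Energy of the foggy frame's DCT coefficients in a band, relative
            # to the clear-day reference (can exceed 1, cf. the 100-120% figure).
            cf = np.abs(dctn(foggy.astype(float), norm='ortho'))
            cc = np.abs(dctn(clear.astype(float), norm='ortho'))
            return float(cf[mask].sum() / cc[mask].sum())

        def total_variation(img):
            # Discrete anisotropic total variation, a stand-in for the TBV
            # feature related to visibility by the learned piecewise function.
            g = img.astype(float)
            return float(np.abs(np.diff(g, axis=0)).sum()
                         + np.abs(np.diff(g, axis=1)).sum())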

  • 7.
    Cheng, Xiaogang
    et al.
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China. Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland.
    Yang, Bin
    Xian Univ Architecture & Technol, Sch Bldg Serv Sci & Engn, Xian 710055, Shaanxi, Peoples R China. Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden.
    Tan, Kaige
    KTH.
    Isaksson, Erik
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Li, Liren
    Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing 211816, Jiangsu, Peoples R China.
    Hedman, Anders
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Olofsson, Thomas
    Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID. Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China.
    A Contactless Measuring Method of Skin Temperature based on the Skin Sensitivity Index and Deep Learning (2019). In: Applied Sciences, E-ISSN 2076-3417, Vol. 9, no 7, article id 1375. Article in journal (Refereed)
    Abstract [en]

    Featured application: The NISDL method proposed in this paper can be used for real-time contactless measurement of human skin temperature, which reflects human body thermal comfort status and can be used to control HVAC devices.

    In human-centered intelligent buildings, real-time measurements of human thermal comfort play a critical role and supply feedback control signals for building heating, ventilation, and air conditioning (HVAC) systems. Due to the challenges of intra- and inter-individual differences and skin subtleness variations, there has been no satisfactory solution for thermal comfort measurement until now. In this paper, a contactless measuring method based on a skin sensitivity index and deep learning (NISDL) is proposed to measure real-time skin temperature. A new evaluation index, named the skin sensitivity index (SSI), is defined to overcome individual differences and skin subtleness variations. To illustrate the effectiveness of the proposed SSI, two multi-layer deep learning frameworks (NISDL methods I and II) were designed, and DenseNet201 was used to extract features from skin images. The partly personalized saturation-temperature (NIPST) algorithm was used for comparison, as was another deep learning algorithm without SSI (DL). In total, 1.44 million images were used for validation. The results show that 55.62% and 52.25% of the error values (NISDL methods I and II) fall within (0 °C, 0.25 °C), whereas the corresponding figure for NIPST is 35.39%.
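    As a rough structural sketch (PyTorch), a DenseNet201-backed skin-temperature regressor could look like the following; the paper's exact 315-layer configuration, the SSI-based input encoding, and the training protocol are not reproduced, so every layer choice here is an assumption:

        import torch.nn as nn
        from torchvision import models

        class SkinTempRegressor(nn.Module):
            def __init__(self):
                super().__init__()
                backbone = models.densenet201(weights=None)  # optionally pretrained
                self.features = backbone.features  # convolutional trunk
                self.head = nn.Sequential(
                    nn.ReLU(inplace=True),
                    nn.AdaptiveAvgPool2d(1),
                    nn.Flatten(),
                    nn.Linear(1920, 1),  # DenseNet201 ends with 1920 channels
                )

            def forward(self, x):
                # x: (N, 3, H, W) batch of skin-image crops
                return self.head(self.features(x))  # (N, 1) temperature estimate

        # Training would regress against i-button ground truth, e.g. nn.MSELoss().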

  • 8. Darvish, A. M.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Söderström, U.
    Super-resolution facial images from single input images based on discrete wavelet transform (2014). In: Proceedings - International Conference on Pattern Recognition, 2014, p. 843-848. Conference paper (Refereed)
    Abstract [en]

    In this work, we present a technique that allows for accurate estimation of frequencies at higher resolutions than the original image content. This technique uses asymmetrical principal component analysis together with the discrete wavelet transform (aPCA-DWT). For example, high-quality content can be generated from low-quality cameras since the necessary frequencies can be estimated through reliable methods. Within our research, we build models for interpreting facial images so that super-resolution versions of human faces can be created. We have worked on several different experiments, extracting the frequency content in order to create models with aPCA-DWT. The results are presented along with experiments on deblurring and zooming beyond the original image resolution. For example, when an image is enlarged 16 times in decoding, the proposed technique outperforms interpolation by more than 7 dB on average.
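    The DWT half of the idea can be sketched with PyWavelets: treat the low-resolution face as the approximation subband and invert the transform, with the detail subbands supplied by the learned model. The aPCA predictor is not reproduced here, so zeroed detail subbands stand in for it (which degenerates to plain smooth upsampling):

        import numpy as np
        import pywt

        def wavelet_upscale(low_res, detail=None, wavelet='haar'):
            # One inverse DWT level doubles resolution. In aPCA-DWT the
            # (LH, HL, HH) detail subbands would come from the aPCA model
            # trained on facial images; zeros are a placeholder here.
            ll = low_res.astype(float)
            if detail is None:
                z = np.zeros_like(ll)
                detail = (z, z, z)
            return pywt.idwt2((ll, detail), wavelet)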

  • 9. Ge, Q.
    et al.
    Shen, F.
    Jing, X. -Y
    Wu, F.
    Xie, S. -P
    Yue, D.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Active contour evolved by joint probability classification on Riemannian manifold (2016). In: Signal, Image and Video Processing, ISSN 1863-1703, E-ISSN 1863-1711, Vol. 10, no 7, p. 1257-1264. Article in journal (Refereed)
    Abstract [en]

    In this paper, we present an active contour model for image segmentation based on a nonparametric distribution metric without any intensity prior on the image. A novel nonparametric distance metric, called joint probability classification, is established to drive the active contour, avoiding the instability induced by multimodal intensity distributions. Considering an image as a Riemannian manifold with spatial and intensity information, the contour evolution is performed on the image manifold by embedding geometric image features into the active contour model. Experimental results on medical and texture images demonstrate the advantages of the proposed method.

  • 10. Ge, Qi
    et al.
    Jing, Xiao-Yuan
    Wu, Fei
    Wei, Zhi-Hui
    Xiao, Liang
    Shao, Wen-Ze
    Yue, Dong
    Li, Hai-Bo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Structure-Based Low-Rank Model With Graph Nuclear Norm Regularization for Noise Removal (2017). In: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 26, no 7, p. 3098-3112. Article in journal (Refereed)
    Abstract [en]

    Nonlocal image representation methods, including group-based sparse coding and block-matching 3-D filtering, have shown their great performance in application to low-level tasks. The nonlocal prior is extracted from each group consisting of patches with similar intensities. Grouping patches based on intensity similarity, however, gives rise to disturbance and inaccuracy in estimation of the true images. To address this problem, we propose a structure-based low-rank model with graph nuclear norm regularization. We exploit the local manifold structure inside a patch and group the patches by the distance metric of manifold structure. With the manifold structure information, a graph nuclear norm regularization is established and incorporated into a low-rank approximation model. We then prove that the graph-based regularization is equivalent to a weighted nuclear norm and the proposed model can be solved by a weighted singular-value thresholding algorithm. Extensive experiments on additive white Gaussian noise removal and mixed noise removal demonstrate that the proposed method achieves a better performance than several state-of-the-art algorithms.
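    The solver the abstract reduces to is compact enough to sketch; the construction of the weights from the graph/manifold structure is the paper's contribution and is omitted, so uniform or hand-chosen weights below are placeholders:

        import numpy as np

        def weighted_svt(M, weights):
            # Weighted singular-value thresholding: shrink each singular value
            # sigma_i toward zero by its weight w_i, then reassemble.
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            s_shrunk = np.maximum(s - np.asarray(weights, dtype=float), 0.0)
            return (U * s_shrunk) @ Vt

    Applied to a matrix whose columns are similar patches, this yields the low-rank estimate of the patch group.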

  • 11. Ge, Qi
    et al.
    Jing, Xiao-Yuan
    Wu, Fei
    Yan, Jingjie
    Li, Hai-Bo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Unsupervised Joint Image Denoising and Active Contour Segmentation in Multidimensional Feature Space (2016). In: Mathematical problems in engineering (Print), ISSN 1024-123X, E-ISSN 1563-5147, article id 3909645. Article in journal (Refereed)
  • 12. Halawani, A.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH.
    FingerInk: Turn your glass into a digital board (2013). In: Proceedings of the 25th Australian Computer-Human Interaction Conference: Augmentation, Application, Innovation, Collaboration, OzCHI 2013, Association for Computing Machinery (ACM), 2013, p. 393-396. Conference paper (Refereed)
    Abstract [en]

    We present a robust vision-based technology for hand and finger detection and tracking that can be used in many CHI scenarios. The method can be used in real-life setups and does not assume any predefined conditions. Moreover, it does not require any additional expensive hardware. It fits well into the user's environment without major changes and hence can be used in the ambient intelligence paradigm. Another contribution is interaction using glass, which is a natural yet challenging environment to interact with. We introduce the concept of an "invisible information layer" embedded into normal window glass that is thereafter used as an interaction medium.

  • 13. Halawani, A.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. Nanjing University of Posts and Telecommunications.
    Template-based search: A tool for scene analysis (2016). In: Proceeding - 2016 IEEE 12th International Colloquium on Signal Processing and its Applications, CSPA 2016, IEEE conference proceedings, 2016, p. 1-6. Conference paper (Refereed)
    Abstract [en]

    This paper proposes a simple and yet effective technique for shape-based scene analysis, in which detection and/or tracking of specific objects or structures in the image is desirable. The idea is based on using predefined binary templates of the structures to be located in the image. The template is matched to contours in a given edge image to locate the designated entity. These templates are allowed to deform in order to deal with variations in the structure's shape and size. Deformation is achieved by dividing the template into segments. The dynamic programming search algorithm is used to accomplish the matching process, achieving very robust results in cluttered and noisy scenes in the applications presented.

  • 14. Halawani, Alaa
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    100 lines of code for shape-based object localization (2016). In: Pattern Recognition, ISSN 0031-3203, E-ISSN 1873-5142, Vol. 60, p. 458-472. Article in journal (Refereed)
    Abstract [en]

    We introduce a simple and effective concept for localizing objects in densely cluttered edge images based on shape information. The shape information is characterized by a binary template of the object's contour, provided to search for object instances in the image. We adopt a segment-based search strategy, in which the template is divided into a set of segments. In this work, we propose our own segment representation that we call one-pixel segment (OPS), in which each pixel in the template is treated as a separate segment. This is done to achieve high flexibility that is required to account for intra-class variations. OPS representation can also handle scale changes effectively. A dynamic programming algorithm uses the OPS representation to realize the search process, enabling a detailed localization of the object boundaries in the image. The concept's simplicity is reflected in the ease of implementation, as the paper's title suggests. The algorithm works directly with very noisy edge images extracted using the Canny edge detector, without the need for any preprocessing or learning steps. We present our experiments and show that our results outperform those of very powerful, state-of-the-art algorithms.
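    The paper's one-pixel-segment dynamic programming search is not reproduced here; as a point of comparison, a much cruder chamfer-style baseline over the same inputs (a Canny edge map and a binary contour template) can be sketched as follows, with all names and the scoring choice being illustrative assumptions:

        import cv2
        import numpy as np

        def chamfer_score_map(edges, template_pts):
            # edges: uint8 Canny output (255 at edge pixels).
            # template_pts: (N, 2) integer array of (row, col) contour coords.
            # Distance from every pixel to its nearest edge pixel:
            dist = cv2.distanceTransform(255 - edges, cv2.DIST_L2, 3)
            h, w = edges.shape
            ty, tx = template_pts[:, 0], template_pts[:, 1]
            th, tw = int(ty.max()) + 1, int(tx.max()) + 1
            scores = np.full((h - th + 1, w - tw + 1), np.inf)
            for y in range(h - th + 1):
                for x in range(w - tw + 1):
                    # Mean distance of template points to the nearest edges.
                    scores[y, x] = dist[ty + y, tx + x].mean()
            return scores  # np.argmin locates the best placement

        # edges = cv2.Canny(gray, 50, 150)

    Unlike the OPS formulation, this rigid baseline cannot deform the template per pixel, which is exactly the flexibility the paper's dynamic programming search adds.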

  • 15. Halawani, Alaa
    et al.
    Rehman, Shafiq Ur
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Active vision for tremor disease monitoring (2015). In: 6TH INTERNATIONAL CONFERENCE ON APPLIED HUMAN FACTORS AND ERGONOMICS (AHFE 2015) AND THE AFFILIATED CONFERENCES, AHFE 2015, Elsevier, 2015, p. 2042-2048. Conference paper (Refereed)
    Abstract [en]

    The aim of this work is to introduce a prototype for monitoring tremor diseases using computer vision techniques. While vision has been previously used for this purpose, the system we are introducing differs intrinsically from other traditional systems. The essential difference is characterized by the placement of the camera on the user's body rather than in front of it, and thus reversing the whole process of motion estimation. This is called active motion tracking. Active vision is simpler in setup and achieves more accurate results compared to traditional arrangements, which we refer to as "passive" here. One main advantage of active tracking is its ability to detect even tiny motions using its simple setup, and that makes it very suitable for monitoring tremor disorders.

  • 16. Khan, M. S. L.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Rehman, Shafiq Ur
    Telepresence Mechatronic Robot (TEBoT): Towards the design and control of socially interactive bio-inspired system (2016). In: Journal of Intelligent & Fuzzy Systems, ISSN 1064-1246, E-ISSN 1875-8967, Vol. 31, no 5, p. 2597-2610. Article in journal (Refereed)
    Abstract [en]

    Socially interactive systems are embodied agents that engage in social interactions with humans. From a design perspective, these systems are built by considering a biologically inspired (bio-inspired) design that can mimic and simulate human-like communication cues and gestures. The design of a bio-inspired system usually consists of (i) studying biological characteristics, (ii) designing a similar biological robot, and (iii) motion planning that can mimic the biological counterpart. In this article, we present the design, development, control strategy, and verification of our socially interactive bio-inspired robot, namely the Telepresence Mechatronic Robot (TEBoT). The key contribution of our work is the embodiment of real human neck movements by (i) designing a mechatronic platform based on the dynamics of a real human neck and (ii) capturing real head movements through our novel single-camera-based vision algorithm. Our socially interactive bio-inspired system is based on an intuitive integration-design strategy that combines a computer-vision-based geometric head pose estimation algorithm, a model-based design (MBD) approach, and real-time motion planning techniques. We have conducted extensive testing to demonstrate the effectiveness and robustness of the proposed system.

  • 17. Khan, M. S. L.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Réhman, S. U.
    Expressive multimedia: Bringing action to physical world by dancing-tablet (2015). In: HCMC 2015 - Proceedings of the 2nd Workshop on Computational Models of Social Interactions: Human-Computer-Media Communication, co-located with ACM MM 2015, ACM Digital Library, 2015, p. 9-14. Conference paper (Refereed)
    Abstract [en]

    Design practice based on the embodied interaction concept focuses on developing new user interfaces for computing devices that merge digital content with the physical world. In this work we propose a novel embodied-interaction-based design in which the 'action' information of the digital content is presented in the physical world. More specifically, we have mapped the 'action' information of video content from the digital world into the physical world. The motivating example presented in this paper is our novel dancing-tablet: a tablet PC that dances to the rhythm of a song, so that the 'action' information is not just confined to a 2D flat display but is also expressed by it. This paper presents (i) the hardware design of our mechatronic dancing-tablet platform, (ii) the software algorithm for musical feature extraction, and (iii) an embodied computational model for mapping the 'action' information of the musical expression to the mechatronic platform. Our user study shows that the overall perception of audio-video music is enhanced by our dancing-tablet setup.

  • 18. Khan, M. S. L.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Ur Réhman, S.
    Embodied tele-presence system (ETS): Designing tele-presence for video teleconferencing (2014). In: 3rd International Conference on Design, User Experience, and Usability: User Experience Design for Diverse Interaction Platforms and Environments, DUXU 2014, Held as Part of 16th International Conference on Human-Computer Interaction, HCI Int. 2014, 2014, no PART 2, p. 574-585. Conference paper (Refereed)
    Abstract [en]

    In spite of the progress made in teleconferencing over the last decades, it is still far from a resolved issue. In this work, we present an intuitive video teleconferencing system, namely the Embodied Tele-Presence System (ETS), which is based on the embodied interaction concept. This work presents the results of a user study considering the hypothesis: "An embodied-interaction-based video conferencing system performs better than a standard video conferencing system in representing nonverbal behaviors, thus creating a 'feeling of presence' of a remote person among his/her local collaborators." Our ETS integrates standard audio-video conferencing with a mechanical embodiment of the head gestures of a remote person (as nonverbal behavior) to enhance the level of interaction. To highlight the technical challenges and design principles behind such telepresence systems, we have also performed a system evaluation, which shows the accuracy and efficiency of our ETS design. The paper further provides an overview of our case study and an analysis of our user evaluation. The user study shows that the proposed embodied interaction approach to video teleconferencing increases 'in-meeting interaction' and enhances the 'feeling of presence' between a remote participant and his or her collaborators.

  • 19. Khan, M. S. L.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Ur Réhman, S.
    Tele-immersion: Virtual reality based collaboration (2016). In: 18th International Conference on Human-Computer Interaction, HCI International 2016, Springer, 2016, p. 352-357. Conference paper (Refereed)
    Abstract [en]

    The ‘perception of being present in another space’ during video teleconferencing is a challenging task. This work makes an effort to improve a user's perception of being ‘present’ in another space by employing a virtual reality (VR) headset and an embodied telepresence system (ETS). In our application scenario, a remote participant uses a VR headset to collaborate with local collaborators. At the local site, an ETS is used as a physical representation of the remote participant among his/her local collaborators. The head movements of the remote person are mapped and presented by the ETS along with audio-video communication. Key considerations of the complete design are discussed, and solutions to challenges related to head tracking, audio-video communication, and data communication are presented. The proposed approach is validated by a user study in which quantitative analysis is performed on immersion and presence parameters.

  • 20. Khan, M. S. L.
    et al.
    Réhman, S. U.
    Söderström, U.
    Halawani, A.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Face-off: A face reconstruction technique for virtual reality (VR) scenarios (2016). In: 14th European Conference on Computer Vision, ECCV 2016, Springer, 2016, p. 490-503. Conference paper (Refereed)
    Abstract [en]

    Virtual reality (VR) headsets occlude a significant portion of the human face. The real human face is required in many VR applications, for example, video teleconferencing. This paper proposes a wearable-camera-based solution to reconstruct the real face of a person wearing a VR headset. Our solution lies in the core of asymmetrical principal component analysis (aPCA). A user-specific training model is built using aPCA with full-face, lip, and eye-region information. During the testing phase, lower-face and partial eye information is used to reconstruct the wearer's face. The online testing session consists of two phases: (i) a calibration phase and (ii) a reconstruction phase. In the former, a small calibration step is performed to align test information with the training data, while the latter uses half-face information to reconstruct the full face using the aPCA-based trained data. The proposed approach is validated with qualitative and quantitative analysis.

  • 21. Khan, M. S. L.
    et al.
    Ur Rehman, S.
    Hera, P. L.
    Liu, F.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    A pilot user's prospective in mobile robotic telepresence system (2014). In: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, IEEE conference proceedings, 2014. Conference paper (Refereed)
    Abstract [en]

    In this work we present an interactive video conferencing system specifically designed to enhance the experience of video teleconferencing for a pilot user. We have used an Embodied Telepresence System (ETS), which was previously designed to enhance the experience of video teleconferencing for collaborators. In this work we have deployed the ETS in a novel scenario to improve the experience of the pilot user during distance communication. The ETS is used to adjust the view of the pilot user at the distant location (e.g., a distantly located conference/meeting). A velocity profile control for the ETS was developed, which is implicitly controlled by the head of the pilot user. An experiment was conducted to test whether the view adjustment capability of the ETS increases the collaboration experience of video conferencing for the pilot user. In the user study, participants (pilot users) interacted using the ETS and a traditional computer-based video conferencing tool. Overall, the user study suggests the effectiveness of our approach in enhancing the experience of video conferencing for the pilot user.

  • 22. Khan, M. S. L.
    et al.
    ur Réhman, S.
    Mi, Y.
    Naeem, U.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Moveable facial features in a social mediator (2017). In: 17th International Conference on Intelligent Virtual Agents, IVA 2017, Springer, 2017, Vol. 10498, p. 205-208. Conference paper (Refereed)
    Abstract [en]

    Behavior based on the human face and facial features has a major impact on human-human communication. Creating face-based personality traits and their representations in a social robot is a challenging task. In this paper, we propose an approach for robotic face presentation based on moveable 2D facial features and present a comparative study in which a synthesized face is projected using three setups: (1) a 3D mask, (2) a 2D screen, and (3) our 2D moveable-facial-feature-based visualization. We found that the robot's personality and character are highly influenced by the projected face quality as well as by the motion of the facial features.

  • 23.
    Khan, Muhammad Sikandar Lal
    et al.
    Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden.
    Halawani, Alaa
    Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden. Palestine Polytech Univ, Comp Engn Dept, Hebron 90100, Palestine.
    Rehman, Shafiq Ur
    Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Action Augmented Real Virtuality: A Design for Presence (2018). In: IEEE Transactions on Cognitive and Developmental Systems, ISSN 2379-8920, Vol. 10, no 4, p. 961-972. Article in journal (Refereed)
    Abstract [en]

    This paper addresses the important question of how to design a video teleconferencing setup to increase the experience of spatial and social presence. Traditional video teleconferencing setups are lacking in presenting the nonverbal behaviors that humans express in face-to-face communication, which results in a decreased experience of presence. To address this issue, we first present a conceptual framework of presence for video teleconferencing. We introduce a modern presence concept called real virtuality and propose a new way of achieving it, based on body or artifact actions, to increase the feeling of presence; we name this concept presence through actions. Using this new concept, we present the design of a novel action-augmented real virtuality prototype that considers the challenges related to the design of an action prototype, action embodiment, and face representation. Our action prototype is a telepresence mechatronic robot (TEBoT), and action embodiment is achieved through a head-mounted display (HMD). The face representation solves the problem of face occlusion introduced by the HMD. The novel combination of HMD, TEBoT, and the face representation algorithm has been tested in a real video teleconferencing scenario for its ability to solve the challenges related to spatial and social presence. We have performed a user study in which the invited participants were requested to experience our novel setup and to compare it with a traditional video teleconferencing setup. The results show that the action capabilities not only increase the feeling of spatial presence but also increase the feeling of social presence of a remote person among local collaborators.

  • 24. Khan, Muhammad Sikandar Lal
    et al.
    Réhman, Shafiq
    Lu, Zhihan
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Tele-embodied agent (TEA) for video teleconferencing (2013). In: Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia, MUM 2013, Association for Computing Machinery (ACM), 2013. Conference paper (Refereed)
    Abstract [en]

    We propose a design for a teleconference system that expresses nonverbal behavior (in our case, head gestures) along with audio-video communication. Previous audio-video conferencing systems fail to present the nonverbal behaviors that we, as humans, usually use in face-to-face interaction. Recently, research in teleconferencing systems has expanded to include nonverbal cues of the remote person in distance communication. The accurate representation of nonverbal gestures in such systems is still challenging because they depend on hand-operated devices (like a mouse or keyboard), and they still fall short of presenting accurate human gestures. We believe that incorporating embodied interaction in video teleconferencing (i.e., using the physical world as a medium for interacting with digital technology) can yield a representation of nonverbal behavior. We introduce an experimental platform named Tele-Embodied Agent (TEA), which incorporates a remote person's head gestures to study this new paradigm of embodied interaction in video teleconferencing. Our preliminary tests show the accuracy (with respect to pose angles) and efficiency (with respect to time) of our proposed design. TEA can be used in the medical field, factories, offices, the gaming industry, the music industry, and training.

  • 25. Khan, Muhammad Sikandar Lal
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Rehman, Shafiq Ur
    Gaze perception and awareness in smart devices (2016). In: International journal of human-computer studies, ISSN 1071-5819, E-ISSN 1095-9300, Vol. 92-93, p. 55-65. Article in journal (Refereed)
    Abstract [en]

    Eye contact and gaze awareness play a significant role in conveying emotions and intentions during face-to-face conversation. Humans can perceive each other's gaze quite naturally and accurately. However, gaze awareness and perception are ambiguous during video teleconferencing performed with computer-based devices (such as laptops, tablets, and smartphones). The reasons for this ambiguity are (i) the camera position relative to the screen and (ii) the 2D rendition of the 3D human face, i.e., the 2D screen is unable to deliver an accurate gaze during video teleconferencing. To solve this problem, researchers have proposed different hardware setups with complex software algorithms. The most recent solutions for accurate gaze perception employ 3D interfaces, such as 3D screens and 3D face masks. However, the commonly used video teleconferencing devices today are smart devices with 2D screens, so there is a need to improve gaze awareness and perception on these devices. In this work, we revisit the question of how to improve a remote user's gaze awareness among his/her collaborators. Our hypothesis is that accurate gaze perception can be achieved by the '3D embodiment' of a remote user's head gestures during video teleconferencing. We have prototyped an embodied telepresence system (ETS) for the 3D embodiment of a remote user's head. Our ETS is based on a 3-DOF neck robot with a mounted smart device (tablet PC). The electromechanical platform in combination with a smart device is a novel setup for studying gaze awareness and perception on 2D-screen-based smart devices during video teleconferencing. Two important gaze-related issues are considered in this work: (i) the 'Mona Lisa gaze effect', where the gaze appears directed at the observer regardless of his or her position in the room, and (ii) 'gaze awareness/faithfulness', the ability to perceive an accurate spatial relationship between the observing person and the observed object. Our results confirm that the 3D embodiment of a remote user's head not only mitigates the Mona Lisa gaze effect but also supports three levels of gaze faithfulness, hence accurately projecting the human gaze in the distant space.

  • 26. Kondori, F. A.
    et al.
    Liu, L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Telelife: An immersive media experience for rehabilitation (2014). In: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, IEEE conference proceedings, 2014. Conference paper (Refereed)
    Abstract [en]

    In recent years, the emergence of telerehabilitation systems for home-based therapy has altered healthcare systems. Telerehabilitation enables therapists to observe patients' status via the Internet, so a patient does not have to visit rehabilitation facilities for every rehabilitation session. Although telerehabilitation provides great opportunities, there are two major issues that affect its effectiveness: the relegation of the patient to the home, and the loss of direct supervision by the therapist. Since patients have no actual interaction with other persons during the rehabilitation period, they become isolated and gradually lose their social skills. Moreover, without the direct supervision of therapists, rehabilitation exercises can be performed with bad compensation strategies that lead to a poor-quality recovery. To resolve these issues, we propose telelife, a new concept for future rehabilitation systems. The idea is to use media technology to create a totally new immersive media experience for rehabilitation. In telerehabilitation, patients execute exercises locally and therapists monitor their status remotely; in telelife, by contrast, patients perform exercises remotely and therapists monitor locally. Thus telelife not only enables rehabilitation at a distance but also improves patients' social competences and provides direct supervision by therapists. In this paper we introduce telelife to enhance telerehabilitation, and we investigate the technical challenges and possible methods to achieve it.

  • 27. Kondori, F. A.
    et al.
    Yousefi, Shahrouz
    Umeå University, Sweden.
    Li, Haibo
    Umeå University, Sweden.
    Real 3D interaction behind mobile phones for augmented environments (2011). Conference paper (Refereed)
    Abstract [en]

    The number of mobile devices such as mobile phones and PDAs has increased dramatically over recent years. New mobile devices are equipped with integrated cameras and large displays, which make interaction with the device easier and more efficient. Although most previous work on interaction between humans and mobile devices is based on 2D touch-screen displays, camera-based interaction opens a new way to manipulate objects in the 3D space behind the device, within the camera's field of view. This paper suggests the use of particular patterns from the local orientation of the image, called rotational symmetries, to detect and localize human gestures. The relative rotation and translation of a human gesture between consecutive frames are estimated by extracting stable features. Consequently, this information can be used to facilitate the 3D manipulation of virtual objects in various mobile-device applications.

  • 28. Kondori, F. A.
    et al.
    Yousefi, Shahrouz
    Umeå University, Sweden.
    Li, Haibo
    Umeå University, Sweden.
    Sonning, S.
    3D head pose estimation using the Kinect (2011). Conference paper (Refereed)
    Abstract [en]

    Head pose estimation plays an essential role in bridging the information gap between humans and computers. Conventional head pose estimation methods mostly work on images captured by cameras, but accurate and robust pose estimation is often problematic. In this paper we present an algorithm for recovering the six degrees of freedom (DOF) of head motion from a sequence of range images taken by the Microsoft Kinect for Xbox 360. The proposed algorithm utilizes a least-squares minimization of the difference between the measured rate of change of depth at a point and the rate predicted by the depth rate constraint equation. We segment the human head from its surroundings and background, and then we estimate the head motion. Our system has the capability to recover the six DOF of the head motion of multiple people in one image. The proposed system was evaluated in our lab and presents superior results.
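    A minimal numerical sketch of that least-squares step, under an orthographic-projection simplification that the paper may not share (inputs are 1-D arrays over the segmented head pixels: world coordinates X, Y, depth Z, spatial depth gradients Zx, Zy, and the temporal derivative Zt):

        import numpy as np

        def head_motion_lstsq(X, Y, Z, Zx, Zy, Zt):
            # Depth-rate constraint: Zx*u + Zy*v - w = -Zt, with the rigid-body
            # field u = tx + wy*Z - wz*Y, v = ty + wz*X - wx*Z, w = tz + wx*Y - wy*X.
            # Stacking one equation per pixel gives A @ p = b for the six
            # motion parameters p = (tx, ty, tz, wx, wy, wz).
            A = np.column_stack([
                Zx,                    # tx
                Zy,                    # ty
                -np.ones_like(Z),      # tz
                -Zy * Z - Y,           # wx
                Zx * Z + X,            # wy
                Zy * X - Zx * Y,       # wz
            ])
            b = -Zt
            p, *_ = np.linalg.lstsq(A, b, rcond=None)
            return p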

  • 29.
    Kondori, Farid Abedan
    et al.
    Umea Univ, Dept Appl Phys & Elect, Umea, Sweden.
    Liu, Li
    Umea Univ, Dept Appl Phys & Elect, Umea, Sweden.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Telelife: An Immersive Media Experience for Rehabilitation (2014). In: 2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), IEEE, 2014. Conference paper (Refereed)
    Abstract [en]

    In recent years, the emergence of telerehabilitation systems for home-based therapy has altered healthcare systems. Telerehabilitation enables therapists to observe patients' status via the Internet, so a patient does not have to visit rehabilitation facilities for every rehabilitation session. Although telerehabilitation provides great opportunities, there are two major issues that affect its effectiveness: the relegation of the patient to the home, and the loss of direct supervision by the therapist. Since patients have no actual interaction with other persons during the rehabilitation period, they become isolated and gradually lose their social skills. Moreover, without the direct supervision of therapists, rehabilitation exercises can be performed with bad compensation strategies that lead to a poor-quality recovery. To resolve these issues, we propose telelife, a new concept for future rehabilitation systems. The idea is to use media technology to create a totally new immersive media experience for rehabilitation. In telerehabilitation, patients execute exercises locally and therapists monitor their status remotely; in telelife, by contrast, patients perform exercises remotely and therapists monitor locally. Thus telelife not only enables rehabilitation at a distance but also improves patients' social competences and provides direct supervision by therapists. In this paper we introduce telelife to enhance telerehabilitation, and we investigate the technical challenges and possible methods to achieve it.

  • 30. Kondori, Farid Abedan
    et al.
    Yousefi, Shahrouz
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Kouma, Jean-Paul
    Liu, Li
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Direct hand pose estimation for immersive gestural interaction (2015). In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 66, p. 91-99. Article in journal (Refereed)
    Abstract [en]

    This paper presents a novel approach for performing intuitive gesture-based interaction using depth data acquired by Kinect. The main challenge in enabling immersive gestural interaction is dynamic gesture recognition. This problem can be formulated as a combination of two tasks: gesture recognition and gesture pose estimation. Incorporating a fast and robust pose estimation method would lessen the burden to a great extent. In this paper we propose a direct method for real-time hand pose estimation. Based on the range images, a new version of the optical flow constraint equation is derived, which can be utilized to directly estimate 3D hand motion without imposing any other constraints. Extensive experiments illustrate that the proposed approach performs properly in real time with high accuracy. As a proof of concept, we demonstrate the system performance in 3D object manipulation on two different setups: desktop computing and a mobile platform. This reveals the system's capability to accommodate different interaction procedures. In addition, a user study is conducted to evaluate learnability, user experience, and interaction quality in 3D gestural interaction in comparison to 2D touchscreen interaction.

  • 31.
    Kondori, Farid Abedan
    et al.
    Umeå Univ, SE-90187 Umea, Sweden.
    Yousefi, Shahrouz
    KTH.
    Ostovar, Ahmad
    Umeå Univ, SE-90187 Umea, Sweden.
    Liu, Li
    Umeå Univ, SE-90187 Umea, Sweden.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    A Direct Method for 3D Hand Pose Recovery (2014). In: 2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), IEEE COMPUTER SOC, 2014, p. 345-350. Conference paper (Refereed)
    Abstract [en]

    This paper presents a novel approach for performing intuitive 3D gesture-based interaction using depth data acquired by Kinect. Unlike current depth-based systems that focus only on the classical gesture recognition problem, we also consider 3D gesture pose estimation for creating immersive gestural interaction. In this paper, we formulate the gesture-based interaction system as a combination of two separate problems: gesture recognition and gesture pose estimation. We focus on the second problem and propose a direct method for recovering hand motion parameters. Based on the range images, a new version of the optical flow constraint equation is derived, which can be utilized to directly estimate 3D hand motion without imposing any other constraints. Our experiments illustrate that the proposed approach performs properly in real time with high accuracy. As a proof of concept, we demonstrate the system performance in 3D object manipulation. This application is intended to explore the system's capabilities in real-time biomedical applications. Finally, a system usability test is conducted to evaluate learnability, user experience, and interaction quality in 3D interaction in comparison to 2D touch-screen interaction.

  • 32. Li, B.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC).
    Söderström, U.
    Distinctive curves features (2016). In: Electronics Letters, ISSN 0013-5194, E-ISSN 1350-911X, Vol. 52, no 3, p. 197-U83. Article in journal (Refereed)
    Abstract [en]

    Curves and lines are geometrical, abstract features of an image. Whereas interest points are more limited, curves and lines provide much more information about the image structure. However, research on curve and line detection is very fragmented, and the concept of scale space is not yet well fused into it. The keypoint (e.g., SIFT, SURF, ORB) is a successful concept that represents features (e.g., blobs, corners) in scale space. Stimulated by the keypoint concept, a method is proposed that extracts distinctive curves (DICU) in scale space, including lines as a special form of curve feature. A curve can be represented by three keypoints (two end points and one middle point). A good way to test the quality of detected curves is to analyze their repeatability under various image transformations. DICU is evaluated using the standard Oxford benchmark. The overlap error is calculated by averaging the overlap errors of the three keypoints on the curve. Experimental results show that DICU achieves good repeatability compared with other state-of-the-art methods. To match curve features, a relatively uncomplicated way is to combine the local descriptors of the three keypoints on each curve.

  • 33. Li, B.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Söderström, U.
    Scale-invariant corner keypoints (2014). Conference paper (Refereed)
    Abstract [en]

    Effective and efficient generation of keypoints from images is the first step of many computer vision applications, such as object matching. The last decade presented us with an arms race toward faster and more robust keypoint detection, feature description, and matching. This resulted in several new algorithms, for example the Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Oriented FAST and Rotated BRIEF (ORB), and Binary Robust Invariant Scalable Keypoints (BRISK). Keypoint detection has been improved using various techniques in most of these algorithms. However, in the search for faster computing, the accuracy of the algorithms is decreasing. In this paper, we present SICK (Scale-Invariant Corner Keypoints), a novel method for fast keypoint detection. Our experimental results show that SICK is faster to compute and more robust than recent state-of-the-art methods.

  • 34.
    Li, Haibo
    et al.
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Hedman, Anders
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Harnessing Crowds to Avert or Mitigate Acts Terrorism: A Collective Intelligence Call for Action (2016). In: 2016 EUROPEAN INTELLIGENCE AND SECURITY INFORMATICS CONFERENCE (EISIC) / [ed] Brynielsson, J; Johansson, F, IEEE, 2016, p. 203-203. Conference paper (Refereed)
    Abstract [en]

    This paper considers averting acts of terrorism through non-traditional means of surveillance and control: the use of crowdsourcing (collective intelligence) and the development of a new class of anti-terror mobile apps. The proposed class of anti-terrorist apps is based on two dimensions: the individual and the central. By individual, we mean the individual app user; by central, we mean a central organizational locus of coordination and control in the fight against terrorism. Such a central locus could be a governmental agency or a national/international security organization active in the fight against terrorism.

  • 35. Lu, G.
    et al.
    He, J.
    Yan, J.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. Nanjing University of Posts and Telecommunications.
    Convolutional neural network for facial expression recognition (2016). In: Journal of Nanjing University of Posts and Telecommunications, ISSN 1673-5439, Vol. 36, no 1, p. 16-22. Article in journal (Refereed)
    Abstract [en]

    To avoid the complex explicit feature extraction process of traditional expression recognition, a convolutional neural network (CNN) for facial expression recognition is proposed. First, the facial expression image is normalized and implicit features are extracted using trainable convolution kernels. Then, maximum pooling is used to reduce the dimensionality of the extracted implicit features. Finally, a softmax classifier is used to classify the facial expressions of the test samples. The experiment is carried out on the CK+ facial expression database using a graphics processing unit (GPU). Experimental results show the performance and generalization ability of the CNN for facial expression recognition.
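    A minimal PyTorch sketch of the architecture family the abstract describes (trainable convolution kernels, max pooling, a softmax classifier); the layer sizes are illustrative guesses rather than the paper's configuration, and seven classes is the common CK+ setup:

        import torch.nn as nn

        class ExpressionCNN(nn.Module):
            def __init__(self, n_classes=7):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Flatten(),
                    nn.LazyLinear(n_classes),  # logits; softmax lives in the loss
                )

            def forward(self, x):
                # x: (N, 1, H, W) normalized face crops
                return self.net(x)

        # Train with nn.CrossEntropyLoss(), which folds in the softmax.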

  • 36. Lu, G.
    et al.
    Yang, C.
    Yang, W.
    Yan, J.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Micro-expression recognition based on LBP-TOP features2017In: Nanjing Youdian Daxue Xuebao (Ziran Kexue Ban)/Journal of Nanjing University of Posts and Telecommunications (Natural Science), Vol. 37, no 6, p. 1-7Article in journal (Refereed)
    Abstract [en]

    Micro-expressions are involuntary facial expressions that reveal true feelings when a person tries to conceal their expressions. Compared with normal facial expressions, the most significant characteristics of micro-expressions are their short duration and weak intensity, which make them difficult to recognize. In this paper, a micro-expression recognition method based on local binary patterns from three orthogonal planes (LBP-TOP) features and a support vector machine (SVM) classifier is proposed. Firstly, the LBP-TOP operators are used to extract micro-expression features. Then, a feature selection algorithm combining ReliefF with a manifold learning algorithm based on locally linear embedding (LLE) is proposed to reduce the dimensionality of the extracted LBP-TOP feature vectors. Finally, an SVM classifier with a radial basis function (RBF) kernel is used to classify test samples into five categories of micro-expressions: happiness, disgust, repression, surprise, and others. Experiments are carried out on the micro-expression database CASME II using the leave-one-subject-out cross-validation (LOSO-CV) method. The classification accuracy reaches 58.98%. Experimental results show the effectiveness of the proposed method.
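    The classification stage maps directly onto standard tooling. A sketch with scikit-learn, where load_casme2_features is a hypothetical loader returning the reduced LBP-TOP vectors, the five-category labels and a subject id per sample:

        from sklearn.svm import SVC
        from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

        X, y, subjects = load_casme2_features()  # hypothetical loader, see above

        # RBF-kernel SVM, evaluated with leave-one-subject-out cross-validation.
        clf = SVC(kernel="rbf", C=1.0, gamma="scale")
        scores = cross_val_score(clf, X, y, groups=subjects, cv=LeaveOneGroupOut())
        print("LOSO-CV accuracy: %.2f%%" % (100 * scores.mean()))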

  • 37. Lu, Z.
    et al.
    Réhman, S.
    Khan, M. S. L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH.
    Anaglyph 3D Stereoscopic Visualization of 2D Video based on Fundamental Matrix2013In: Proceedings - 2013 International Conference on Virtual Reality and Visualization, ICVRV 2013, IEEE , 2013, p. 305-308Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a simple anaglyph 3D stereo generation algorithm for 2D video sequences captured with a monocular camera. In our novel approach, we employ a camera pose estimation method to generate stereoscopic 3D directly from 2D video without building a depth map explicitly. Our cost-effective method is suitable for arbitrary real-world video sequences and produces smooth results. We use image stitching based on plane correspondence using the fundamental matrix. We also demonstrate that plane-correspondence image stitching based on the homography matrix alone cannot generate better results. Furthermore, we utilize the camera pose model reconstructed via structure from motion (with the fundamental matrix) to accomplish the visual anaglyph 3D illusion. The proposed approach demonstrates very good performance for most of the video sequences.
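    Only the final rendering step is compact enough to show here; the paper's contribution lies in synthesizing the second view. A red-cyan composition sketch with OpenCV, using the common channel convention (an assumption, not necessarily the paper's exact scheme):

        import cv2

        def red_cyan_anaglyph(left_bgr, right_bgr):
            # Red channel from the left view, green and blue from the right;
            # OpenCV stores channels in B, G, R order.
            anaglyph = right_bgr.copy()
            anaglyph[:, :, 2] = left_bgr[:, :, 2]
            return anaglyph

        # The underlying epipolar geometry comes from point correspondences, e.g.:
        # F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)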

  • 38. Lv, Z.
    et al.
    Feng, L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Feng, S.
    Hand-free motion interaction on google glass2014In: SIGGRAPH Asia 2014 Mobile Graphics and Interactive Applications, SA 2014, 2014Conference paper (Refereed)
    Abstract [en]

    There is an increasing interest in creating wearable device interaction technologies. Novel emerging user interface technologies (e.g. eye tracking, speech recognition, gesture recognition, ECG, EEG and fusions of them) have the potential to significantly affect market share in PCs, smartphones, tablets and the latest wearable devices such as Google Glass. As a result, deploying these technologies in devices such as smartphones and wearables is challenging. Google Glass has many impressive characteristics (i.e. voice actions, head wake-up, wink detection), which are human-glass interface (HGI) technologies. Google Glass does not suffer from 'the occlusion problem' or 'the fat finger problem', the well-known problems of direct-touch finger input on touch screens. However, Google Glass only provides a touchpad with simple 'tapping and sliding your finger' gestures, which is in fact a one-dimensional interaction, rather than the traditional two-dimensional interaction offered by the complete touch screen of a smartphone. The one-dimensional 'swipe the touchpad' interaction with a row of 'Cards', which replaces the traditional two-dimensional icon menu, limits the intuitiveness and flexibility of HGI. Therefore, there is a growing interest in implementing 3D gesture recognition vision systems, in which optical sensors capture real-time video of the user and vision algorithms are then used to determine the user's gestures, without the user having to hold any device. We will demonstrate a hand-free motion interaction application based on computer vision technology on Google Glass. The presented application allows the user to perform touch-less interaction by hand or foot gesture in front of the camera of Google Glass. Based on the same core gesture recognition algorithm as used in this demonstration, a hybrid wearable smartphone system based on mixed hardware and software has been presented in our previous work [Lv 2013][Lu et al. 2013][Lv et al. 2013], which can support either hand or foot interaction with today's smartphones.

  • 39. Lv, Z.
    et al.
    Feng, S.
    Lal Khan, M. S.
    Ur Réhman, S.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Foot motion sensing: Augmented game interface based on foot interaction for smartphone2014In: Conference on Human Factors in Computing Systems - Proceedings, 2014, p. 293-296Conference paper (Refereed)
    Abstract [en]

    We designed and developed two games, a real-time augmented football game and an augmented foot piano game, to demonstrate an innovative interface based on a foot motion sensing approach for smartphones. In the proposed novel interface, a computer vision based hybrid detection and tracking method provides the core support for the foot interaction interface by accurately tracking the shoes. Based on the proposed interaction interface, two demonstrations are developed; the applications employ augmented reality technology to render the game graphics and game status information on the smartphone's screen. The players interact with the game using foot motions in front of the rear camera, which trigger the interaction events. This interface supports basic foot motion sensing (i.e. direction of movement, velocity, rhythm).

  • 40. Lv, Z.
    et al.
    Halawani, A.
    Khan, M. S. L.
    Réhman, S. U.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Finger in air: Touch-less interaction on smartphone2013In: Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia, MUM 2013, Association for Computing Machinery (ACM), 2013Conference paper (Refereed)
    Abstract [en]

    In this paper we present a vision-based intuitive interaction method for smart mobile devices. It is based on markerless finger gesture detection and attempts to provide a 'natural user interface'. No additional hardware is necessary for real-time finger gesture estimation. To evaluate the strengths and effectiveness of the proposed method, we designed two smartphone applications: a circle menu application, which provides the user with graphics and the smartphone's status information, and a bouncing ball game, a finger-gesture-based bouncing ball application. The users interact with these applications using finger gestures through the smartphone's camera view, which trigger the interaction events and generate activity sequences for interactive buffers. Our preliminary user study demonstrates the effectiveness and social acceptability of the proposed interaction approach.

  • 41. Lv, Zhihan
    et al.
    Feng, Liangbing
    Feng, Shengzhong
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Extending Touch-less Interaction on Vision Based Wearable Device2015In: 2015 IEEE VIRTUAL REALITY CONFERENCE (VR), IEEE conference proceedings, 2015, p. 231-232Conference paper (Refereed)
    Abstract [en]

    A touch-less interaction technology on a vision-based wearable device is designed and evaluated. Users interact with the application using dynamic hand/foot gestures in front of the camera. Several proof-of-concept prototypes with eleven dynamic gestures are developed based on the touch-less interaction. Finally, a comparative user study is conducted to demonstrate the usability of the touch-less approach, as well as its impact on the user's emotions, running on a wearable framework or Google Glass.

  • 42. Lv, Zhihan
    et al.
    Halawani, Alaa
    Feng, Shengzhong
    ur Rehman, Shafiq
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Touch-less interactive augmented reality game on vision-based wearable device2015In: Personal and Ubiquitous Computing, ISSN 1617-4909, E-ISSN 1617-4917, Vol. 19, no 3-4, p. 551-567Article in journal (Refereed)
    Abstract [en]

    There is an increasing interest in creating pervasive games based on emerging interaction technologies. In order to develop touch-less, interactive, augmented reality games on a vision-based wearable device, a touch-less motion interaction technology is designed and evaluated in this work. Users interact with the augmented reality games using dynamic hand/foot gestures in front of the camera, which trigger interaction events to interact with the virtual objects in the scene. Three primitive augmented reality games with eleven dynamic gestures are developed as a proof of concept, based on the proposed touch-less interaction technology. Finally, a comparative evaluation is conducted to demonstrate the social acceptability and usability of the touch-less approach, running on a hybrid wearable framework or with Google Glass, together with workload assessment and measurements of users' emotions and satisfaction.

  • 43. Lv, Zhihan
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Imagining In-Air Interaction for Hemiplegia Sufferer2015In: 2015 INTERNATIONAL CONFERENCE ON VIRTUAL REHABILITATION PROCEEDINGS (ICVR), 2015, p. 149-150Conference paper (Refereed)
    Abstract [en]

    In this paper, we describe imagined usage scenarios of a touch-less interaction technology for hemiplegia sufferers, which can support either hand or foot interaction with a smartphone or head-mounted device (HMD). The computer vision interaction technology was implemented in our previous work, and provides the core support for gesture interaction by accurately detecting and tracking hand or foot gestures. The patients interact with the application using hand/foot gesture motion in the camera view.

  • 44. Lv, Zjhan
    et al.
    Halawani, Alaa
    Feng, Shengzhong
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Réhman, S.U
    Multimodal Hand and Foot Gesture Interaction for Handheld Devices2014In: ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), ISSN 1551-6857, E-ISSN 1551-6865, Vol. 11, article id 10Article in journal (Refereed)
    Abstract [en]

    We present a hand-and-foot-based multimodal interaction approach for handheld devices. Our method combines input modalities (i.e., hand and foot) and provides a coordinated output to both modalities along with audio and video. Human foot gestures are detected and tracked using contour-based template detection (CTD) and the Tracking-Learning-Detection (TLD) algorithm. The 3D foot pose is estimated from the passive homography matrix of the camera. 3D stereoscopic rendering and vibrotactile feedback are used to enhance the immersive feeling. We developed a multimodal football game based on this approach as a proof of concept, and we confirm our system's user satisfaction through a user study.
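    The pose-from-homography step can be sketched with standard OpenCV calls, assuming matched shoe-template points pts_src/pts_dst from the tracker and a calibrated camera matrix K (all assumptions; the paper's CTD/TLD pipeline supplies the correspondences):

        import cv2

        # Plane-induced homography from tracked shoe correspondences.
        H, inliers = cv2.findHomography(pts_src, pts_dst, cv2.RANSAC, 3.0)

        # Candidate rotations/translations of the foot plane relative to the
        # camera; the physically valid solution must be selected among them.
        n_solutions, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)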

  • 45. Réhman Ur, S.
    et al.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Using vibrotactile language for multimodal human animals communication and interaction2014In: ACM International Conference Proceeding Series, ACM Digital Library, 2014Conference paper (Refereed)
    Abstract [en]

    In this work we aim to facilitate computer-mediated multimodal communication and interaction between humans and animals based on vibrotactile stimuli. To study and influence the behavior of animals, researchers usually use 2D/3D visual stimuli. We instead use a vibrotactile pattern-based language, which provides the opportunity to communicate and interact with animals. We have performed experiments with a vibrotactile-based human-animal multimodal communication system to study the effectiveness of vibratory stimuli applied to the animal's skin along with audio and visual stimuli. The preliminary results are encouraging and indicate that low-resolution tactual displays are effective in transmitting information.

  • 46. Shao, W.
    et al.
    Lin, Y.
    Bao, B.
    Wang, L.
    Ge, Q.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Blind deblurring using discriminative image smoothing2018In: 1st Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2018, Springer Verlag , 2018, p. 490-500Conference paper (Refereed)
    Abstract [en]

    This paper aims to exploit the full potential of gradient-based methods, exploring a simple, robust yet discriminative image prior for blind deblurring. The specific contributions are three-fold. First, a pure gradient-based heavy-tailed model is proposed as a generalized integration of the normalized sparsity and the relative total variation. Second, a plug-and-play algorithm is deduced to alternately estimate the intermediate sharp image and the nonparametric blur kernel. With this numerical scheme, image estimation is simplified to an image smoothing problem. Lastly, extensive experiments are performed, with comparisons against state-of-the-art approaches on synthetic benchmark datasets and real blurry images in various scenarios. The experimental results demonstrate the effectiveness and robustness of the proposed method.
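    The alternation reduces to a simple skeleton; smooth_image and estimate_kernel below are purely hypothetical placeholders for the paper's actual sub-solvers:

        import numpy as np

        def blind_deblur(blurred, kernel_size=31, n_iters=30):
            # Start from a delta (identity) kernel and the blurry input.
            kernel = np.zeros((kernel_size, kernel_size))
            kernel[kernel_size // 2, kernel_size // 2] = 1.0
            image = blurred.copy()
            for _ in range(n_iters):
                image = smooth_image(blurred, kernel)     # image estimation as smoothing
                kernel = estimate_kernel(blurred, image)  # nonparametric kernel update
            return image, kernel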

  • 47. Shao, W. -Z
    et al.
    Li, H. -B
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Elad, M.
    Bi-l0-l2-norm regularization for blind motion deblurring2015In: Journal of Visual Communication and Image Representation, ISSN 1047-3203, E-ISSN 1095-9076, Vol. 33, p. 42-59Article in journal (Refereed)
    Abstract [en]

    In blind motion deblurring, leading methods today tend towards highly non-convex approximations of the l0-norm, especially in the image regularization term. In this paper, we propose a simple, effective and fast approach for the estimation of the motion blur-kernel, through a bi-l0-l2-norm regularization imposed on both the intermediate sharp image and the blur-kernel. Compared with existing methods, the proposed regularization is shown to be more effective and robust, leading to a more accurate motion blur-kernel and a better final restored image. A fast numerical scheme is deployed for alternatingly computing the sharp image and the blur-kernel, by coupling the operator splitting and augmented Lagrangian methods. Experimental results on both a benchmark image dataset and real-world motion blurred images show that the proposed approach is highly competitive with state-of-the-art methods in both deblurring effectiveness and computational efficiency.
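    Schematically, such a regularization couples l0 and l2 terms on both unknowns; in LaTeX notation, one plausible form (a sketch, not the paper's exact functional) is

        \min_{u,k}\ \|k \ast u - b\|_2^2
            + \lambda_1 \|\nabla u\|_0 + \lambda_2 \|\nabla u\|_2^2
            + \mu_1 \|k\|_0 + \mu_2 \|k\|_2^2

    where u is the intermediate sharp image, k the blur-kernel and b the blurry observation.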

  • 48. Shao, Wen-Ze
    et al.
    Ge, Qi
    Deng, Hai-Song
    Wei, Zhi-Hui
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Motion Deblurring Using Non-stationary Image Modeling2015In: Journal of Mathematical Imaging and Vision, ISSN 0924-9907, E-ISSN 1573-7683, Vol. 52, no 2, p. 234-248Article in journal (Refereed)
    Abstract [en]

    It is well known that shaking a camera or mobile phone during exposure usually leads to motion-blurred photographs, so camera shake deblurring, or motion deblurring, is required in many practical scenarios. The contribution of this paper is a simple yet effective approach for motion blur kernel estimation, i.e., blind motion deblurring. Though several methods for motion blur kernel estimation have been proposed in the literature, we impose a type of non-stationary Gaussian prior on the gradient fields of sharp images, in order to automatically detect and pursue the salient edges of images as the important clues to blur kernel estimation. On one hand, the prior is able to promote the sparsity inherited in the non-stationarity of the precision parameters (inverses of variances). On the other hand, since the prior is in Gaussian form, there is a great possibility of deducing a conceptually simple and computationally tractable inference scheme. Specifically, the well-known expectation-maximization algorithm is used to alternatingly estimate the motion blur kernels, the salient edges of images, and the precision parameters in the image prior. Unlike many existing methods, no hyperpriors are imposed on any parameters in this paper; nor are there any pre-processing steps involved, such as explicit suppression of random noise or prediction of salient edge structures. With the estimated motion blur kernels, the deblurred images are finally generated using an off-the-shelf non-blind deconvolution method proposed by Krishnan and Fergus (Adv Neural Inf Process Syst 22:1033-1041, 2009). The rationality and effectiveness of our proposed method are well demonstrated by experimental results on both synthetic and realistic motion-blurred images, showing state-of-the-art blind motion deblurring performance in terms of both quantitative metrics and visual perception.

  • 49. Shao, Wen-Ze
    et al.
    Ge, Qi
    Gan, Zong-Liang
    Deng, Hai-Song
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    A Generalized Robust Minimization Framework for Low-Rank Matrix Recovery2014In: Mathematical problems in engineering (Print), ISSN 1024-123X, E-ISSN 1563-5147, p. 656074-Article in journal (Refereed)
    Abstract [en]

    This paper considers the problem of recovering low-rank matrices which are heavily corrupted by outliers or large errors. To improve the robustness of existing recovery methods, the problem is solved by formulating it as a generalized nonsmooth nonconvex minimization functional via exploiting the Schatten p-norm (0 < p <= 1) and the Lq (0 < q <= 1) seminorm. Two numerical algorithms are provided based on the augmented Lagrange multiplier (ALM) and accelerated proximal gradient (APG) methods as well as efficient root-finder strategies. Experimental results demonstrate that the proposed generalized approach is more inclusive and effective compared with state-of-the-art methods, either convex or nonconvex.
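    For concreteness, the Schatten p-norm is the lp (quasi-)norm of the singular values; a minimal sketch:

        import numpy as np

        def schatten_p_norm(X, p):
            # 0 < p <= 1; p = 1 recovers the convex nuclear norm, while smaller
            # p gives a tighter nonconvex surrogate of the matrix rank.
            s = np.linalg.svd(X, compute_uv=False)
            return (s ** p).sum() ** (1.0 / p)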

  • 50.
    Shao, Wen-Ze
    et al.
    NUPT, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China.;NUPT, Natl Engn Res Ctr Commun & Networking, Nanjing, Jiangsu, Peoples R China..
    Ge, Qi
    NUPT, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China..
    Wang, Li-Qian
    NUPT, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China..
    Lin, Yun-Zhi
    Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA.;Southeast Univ, Sch Automat, Nanjing, Jiangsu, Peoples R China..
    Deng, Hai-Song
    Nanjing Audit Univ, Sch Sci, Nanjing, Jiangsu, Peoples R China..
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID. NUPT, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China.
    Nonparametric Blind Super-Resolution Using Adaptive Heavy-Tailed Priors2019In: Journal of Mathematical Imaging and Vision, ISSN 0924-9907, E-ISSN 1573-7683, Vol. 61, no 6, p. 885-917Article in journal (Refereed)
    Abstract [en]

    Single-image nonparametric blind super-resolution is a fundamental image restoration problem, yet it has been largely ignored in the past decades by the computational photography and computer vision communities. An interesting phenomenon is that learning-based single-image super-resolution (SR) has experienced rapid development since the boom of sparse representation in the mid-2000s and especially representation learning in the 2010s, wherein the high-res image is generally assumed to be blurred by a bicubic or Gaussian blur kernel. However, this parametric assumption on the form of blur kernels does not hold in most practical applications, because in real low-res imaging a high-res image can undergo complex blur processes, e.g., Gaussian-shaped kernels of varying sizes, ellipse-shaped kernels of varying orientations, or curvilinear kernels of varying trajectories. This paper is mainly motivated by one of our previous works: Shao and Elad (in: Zhang (ed) ICIG 2015, Part III, Lecture notes in computer science, Springer, Cham, 2015). Specifically, we take one step further and present a type of adaptive heavy-tailed image priors, which yield a new regularized formulation for nonparametric blind super-resolution. The new image priors can be expressed and understood as a generalized integration of the normalized sparsity measure and relative total variation. Although the proposed priors seem simple, their core merit is their practical capability for the challenging task of nonparametric blur kernel estimation for both super-resolution and deblurring. Harnessing the priors, a higher-quality intermediate high-res image becomes possible, and therefore more accurate blur kernel estimation can be accomplished. Extensive experiments are performed on both synthetic and real-world blurred low-res images, convincingly demonstrating the comparable or even superior performance of the proposed algorithm. Meanwhile, the proposed priors are shown to be quite applicable to blind image deblurring, a degenerate case of nonparametric blind SR.
