  • 1. Abedan Kondori, Farid
    Yousefi, Shahrouz
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Direct Head Pose Estimation Using Kinect-type Sensors (2014). In: Electronics Letters, ISSN 0013-5194, E-ISSN 1350-911X. Article in journal (Refereed)
  • 2. Abedan Kondori, Farid
    Yousefi, Shahrouz
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Liu, Li
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. Nanjing University of Posts and Telecommunications, Nanjing, China.
    Head Operated Electric Wheelchair (2014). In: Proceedings of the IEEE Southwest Symposium on Image Analysis and Interpretation, 2014, p. 53-56. Conference paper (Refereed)
    Abstract [en]

    Currently, the most common way to control an electric wheelchair is to use a joystick. However, some individuals are unable to operate joystick-driven electric wheelchairs due to severe physical disabilities, such as quadriplegia. This paper proposes a novel head pose estimation method to assist such patients. Head motion parameters are employed to control and drive an electric wheelchair. We introduce a direct method for estimating user head motion, based on a sequence of range images captured by Kinect. In this work, we derive a new version of the optical flow constraint equation for range images. We show how the new equation can be used to estimate head motion directly. Experimental results reveal that the proposed system works with high accuracy in real time. We also show simulation results for navigating the electric wheelchair by recovering user head motion.
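
    The "optical flow constraint equation for range images" mentioned above is closely related to the classical depth-rate (range-flow) constraint for rigid motion. The sketch below is purely illustrative and is not the authors' code: it assumes a depth map Z with derivatives Zx, Zy, Zt and per-pixel 3D coordinates X, Y, and solves for a 6-DOF rigid motion in the least-squares sense.

```python
import numpy as np

def estimate_rigid_motion(Z, Zx, Zy, Zt, X, Y):
    """Illustrative least-squares 6-DOF motion from a depth map.

    Uses the classical depth-rate constraint g . V = -Zt with
    g = (Zx, Zy, -1) and the rigid-body velocity V = t + w x P,
    where P = (X, Y, Z) is the 3D point at each pixel.
    """
    P = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
    g = np.stack([Zx, Zy, -np.ones_like(Z)], axis=-1).reshape(-1, 3)
    b = -Zt.reshape(-1)
    # g . (w x P) = w . (P x g), so both unknowns enter linearly:
    A = np.hstack([g, np.cross(P, g)])
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params[:3], params[3:]  # translation t, angular velocity w
```

    In practice the pixels would first be restricted to the segmented head region, as the abstract describes.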

  • 3.
    Cheng, Xiaogang
    KTH, School of Computer Science and Communication (CSC). Nanjing University of Posts and Telecommunications, Nanjing, China.
    Yang, B.
    Liu, G.
    Olofsson, T.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC).
    A variational approach to atmospheric visibility estimation in the weather of fog and haze (2018). In: Sustainable Cities and Society, ISSN 2210-6707, Vol. 39, p. 215-224. Article in journal (Refereed)
    Abstract [en]

    Real-time atmospheric visibility estimation in foggy and hazy weather plays a crucial role in ensuring traffic safety. Overcoming the inherent drawbacks of traditional optical estimation methods, such as limited sampling volume and high cost, vision-based approaches have received much more attention in recent research on atmospheric visibility estimation. Based on the classical Koschmieder formula, atmospheric visibility estimation is carried out by extracting an inherent extinction coefficient. In this paper we present a variational framework to handle the time-varying nature of the extinction coefficient and develop a novel algorithm that extracts the extinction coefficient through piecewise functional fitting of observed luminance curves. The developed algorithm is validated and evaluated on a large database of road traffic video from the Tongqi expressway in China. The test results are very encouraging and show that the proposed algorithm can achieve an estimation error rate of 10%. More significantly, this is the first time the effectiveness of the Koschmieder formula in atmospheric visibility estimation has been validated with a big dataset, containing more than two million luminance curves extracted from real-world traffic video surveillance data.
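
    For orientation, Koschmieder's formula models the apparent luminance of an object at distance d as a blend of its intrinsic luminance and the sky (airlight) luminance, governed by the extinction coefficient beta. The paper fits beta through piecewise functional fitting; the sketch below only illustrates the classical formula with a generic least-squares fit, and all names are chosen here for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def koschmieder(d, L0, Lf, beta):
    """Apparent luminance at distance d: object term decays, airlight grows."""
    return L0 * np.exp(-beta * d) + Lf * (1.0 - np.exp(-beta * d))

def visibility_from_curve(distances, luminances, contrast_threshold=0.05):
    """Fit the extinction coefficient beta, then convert it to visibility."""
    (L0, Lf, beta), _ = curve_fit(
        koschmieder, distances, luminances,
        p0=(luminances[0], luminances[-1], 0.01))
    return -np.log(contrast_threshold) / beta  # meteorological visibility
```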

  • 4.
    Cheng, Xiaogang
    KTH, School of Electrical Engineering and Computer Science (EECS). Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China.
    Yang, Bin
    Hedman, Anders
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Olofsson, Thomas
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Van Gool, Luc
    NIDL: A pilot study of contactless measurement of skin temperature for intelligent building (2019). In: Energy and Buildings, ISSN 0378-7788, E-ISSN 1872-6178, Vol. 198, p. 340-352. Article in journal (Refereed)
    Abstract [en]

    Human thermal comfort measurement plays a critical role in giving feedback signals for building energy efficiency. A contactless measuring method based on subtleness magnification and deep learning (NIDL) was designed to achieve a comfortable, energy-efficient built environment. The method relies on skin feature data, e.g., subtle motion and texture variation, and a 315-layer deep neural network for constructing the relationship between skin features and skin temperature. A physiological experiment was conducted to collect feature data (1.44 million samples) and validate the algorithm. A contactless measurement algorithm based on a partly-personalized saturation temperature model (NIPST) was used for performance comparison. The results show that the mean and median errors of NIDL are 0.476 °C and 0.343 °C, which is equivalent to accuracy improvements of 39.07% and 38.76%, respectively.

  • 5.
    Cheng, Xiaogang
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China; Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland.
    Yang, Bin
    Xian Univ Architecture & Technol, Sch Bldg Serv Sci & Engn, Xian 710055, Shaanxi, Peoples R China; Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden.
    Tan, Kaige
    KTH.
    Isaksson, Erik
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Li, Liren
    Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing 211816, Jiangsu, Peoples R China.
    Hedman, Anders
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Olofsson, Thomas
    Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID. Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China.
    A Contactless Measuring Method of Skin Temperature based on the Skin Sensitivity Index and Deep Learning (2019). In: Applied Sciences, E-ISSN 2076-3417, Vol. 9, no 7, article id 1375. Article in journal (Refereed)
    Abstract [en]

    Featured Application: The NISDL method proposed in this paper can be used for real-time contactless measurement of human skin temperature, which reflects the body's thermal comfort status and can be used to control HVAC devices.

    Abstract: In human-centered intelligent buildings, real-time measurements of human thermal comfort play a critical role and supply feedback control signals for building heating, ventilation, and air conditioning (HVAC) systems. Due to the challenges of intra- and inter-individual differences and skin subtleness variations, there has been no satisfactory solution for thermal comfort measurement until now. In this paper, a contactless measuring method based on a skin sensitivity index and deep learning (NISDL) is proposed to measure real-time skin temperature. A new evaluation index, named the skin sensitivity index (SSI), is defined to overcome individual differences and skin subtleness variations. To illustrate the effectiveness of the proposed SSI, two multi-layer deep learning frameworks (NISDL methods I and II) were designed, and DenseNet201 was used for extracting features from skin images. The partly personal saturation temperature (NIPST) algorithm was used for comparison, and another deep learning algorithm without SSI (DL) was also generated for comparison. Finally, a total of 1.44 million images was used for validation. The results show that 55.62% and 52.25% of the error values (NISDL methods I and II) are scattered in (0 °C, 0.25 °C), while the corresponding error-interval share for NIPST is 35.39%.
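
    Since the abstract names DenseNet201 as the feature extractor, a regression setup in that spirit can be sketched with standard tooling. This is an assumed reconstruction, not the authors' NISDL code; the class name and head layout are invented for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

class SkinTempRegressor(nn.Module):
    """Hypothetical sketch: DenseNet201 features -> scalar skin temperature."""
    def __init__(self):
        super().__init__()
        backbone = models.densenet201(weights="IMAGENET1K_V1")
        self.features = backbone.features  # convolutional trunk
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(1920, 1),  # 1920 = DenseNet201 final feature width
        )

    def forward(self, x):  # x: (N, 3, H, W) batch of skin images
        return self.head(torch.relu(self.features(x)))
```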

  • 6. Darvish, A. M.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Söderström, U.
    Super-resolution facial images from single input images based on discrete wavelet transform (2014). In: Proceedings - International Conference on Pattern Recognition, 2014, p. 843-848. Conference paper (Refereed)
    Abstract [en]

    In this work, we present a technique that allows accurate estimation of frequencies in higher dimensions than the original image content. The technique uses asymmetrical principal component analysis together with the discrete wavelet transform (aPCA-DWT). For example, high-quality content can be generated from low-quality cameras since the necessary frequencies can be estimated through reliable methods. Within our research, we build models for interpreting facial images, with which super-resolution versions of human faces can be created. We have carried out several different experiments, extracting the frequency content in order to create models with aPCA-DWT. The results are presented along with experiments on deblurring and zooming beyond the original image resolution. For example, when an image is enlarged 16 times in decoding, the proposed technique outperforms interpolation by more than 7 dB on average.
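
    The DWT side of aPCA-DWT can be pictured as follows: treat the observed low-resolution image as the approximation band of a larger image and synthesize the missing detail bands from a learned model. Below is a minimal sketch with PyWavelets, where estimate_details stands in for the aPCA model; the function name and contract are hypothetical, not the authors' implementation.

```python
import pywt

def upscale_with_estimated_details(img, estimate_details):
    """Treat img as the approximation band and synthesize detail bands.

    estimate_details(img) -> (cH, cV, cD) is a placeholder for a model
    (e.g. trained on face images) predicting high-frequency subbands of
    the same shape as img; the inverse DWT then doubles the resolution.
    """
    cH, cV, cD = estimate_details(img)
    return pywt.idwt2((img.astype(float), (cH, cV, cD)), "haar")
```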

  • 7. Ge, Q.
    Shen, F.
    Jing, X. -Y
    Wu, F.
    Xie, S. -P
    Yue, D.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Active contour evolved by joint probability classification on Riemannian manifold (2016). In: Signal, Image and Video Processing, ISSN 1863-1703, E-ISSN 1863-1711, Vol. 10, no 7, p. 1257-1264. Article in journal (Refereed)
    Abstract [en]

    In this paper, we present an active contour model for image segmentation based on a nonparametric distribution metric without any intensity prior on the image. A novel nonparametric distance metric, called joint probability classification, is established to drive the active contour while avoiding the instability induced by multimodal intensity distributions. Considering an image as a Riemannian manifold with spatial and intensity information, the contour evolution is performed on the image manifold by embedding geometric image features into the active contour model. Experimental results on medical and texture images demonstrate the advantages of the proposed method.

  • 8. Halawani, A.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH.
    FingerInk: Turn your glass into a digital board (2013). In: Proceedings of the 25th Australian Computer-Human Interaction Conference: Augmentation, Application, Innovation, Collaboration, OzCHI 2013, Association for Computing Machinery (ACM), 2013, p. 393-396. Conference paper (Refereed)
    Abstract [en]

    We present a robust vision-based technology for hand and finger detection and tracking that can be used in many CHI scenarios. The method can be used in real-life setups and does not assume any predefined conditions. Moreover, it does not require any additional expensive hardware. It fits well into the user's environment without major changes and hence can be used in the ambient intelligence paradigm. Another contribution is interaction through glass, which is a natural yet challenging medium to interact with. We introduce the concept of an "invisible information layer" embedded into normal window glass that is thereafter used as an interaction medium.

  • 9. Halawani, A.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. Nanjing University of Posts and Telecommunications.
    Template-based search: A tool for scene analysis (2016). In: Proceeding - 2016 IEEE 12th International Colloquium on Signal Processing and its Applications, CSPA 2016, IEEE conference proceedings, 2016, p. 1-6. Conference paper (Refereed)
    Abstract [en]

    This paper proposes a simple yet effective technique for shape-based scene analysis, in which detection and/or tracking of specific objects or structures in the image is desirable. The idea is based on using predefined binary templates of the structures to be located in the image. The template is matched to contours in a given edge image to locate the designated entity. The templates are allowed to deform in order to deal with variations in the structure's shape and size; deformation is achieved by dividing the template into segments. A dynamic programming search algorithm is used to accomplish the matching process, achieving very robust results in cluttered and noisy scenes in the applications presented.
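
    The dynamic programming search over template segments can be sketched generically: each segment gets a local matching cost at every candidate position, consecutive segments pay a deformation penalty, and the optimal placement sequence is recovered by backtracking. All names below are illustrative and the cost definitions are left abstract; this is not the paper's implementation.

```python
import numpy as np

def dp_match(segment_costs, transition_costs):
    """Generic DP search for placing deformable template segments.

    segment_costs[i][p] : cost of placing segment i at position p.
    transition_costs[i] : (P_i x P_{i+1}) deformation penalties between
                          consecutive segment placements.
    """
    D = np.asarray(segment_costs[0], dtype=float)
    back = []
    for i in range(1, len(segment_costs)):
        total = (D[:, None] + np.asarray(transition_costs[i - 1])
                 + np.asarray(segment_costs[i])[None, :])
        back.append(total.argmin(axis=0))  # best predecessor per position
        D = total.min(axis=0)
    path = [int(D.argmin())]               # backtrack the optimal placements
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1], float(D.min())
```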

  • 10. Halawani, Alaa
    Rehman, Shafiq Ur
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Active vision for tremor disease monitoring (2015). In: 6TH INTERNATIONAL CONFERENCE ON APPLIED HUMAN FACTORS AND ERGONOMICS (AHFE 2015) AND THE AFFILIATED CONFERENCES, AHFE 2015, Elsevier, 2015, p. 2042-2048. Conference paper (Refereed)
    Abstract [en]

    The aim of this work is to introduce a prototype for monitoring tremor diseases using computer vision techniques. While vision has been used for this purpose before, the system we introduce differs intrinsically from other traditional systems. The essential difference is the placement of the camera on the user's body rather than in front of it, thus reversing the whole process of motion estimation. This is called active motion tracking. Active vision is simpler in setup and achieves more accurate results compared to traditional arrangements, which we refer to as "passive" here. One main advantage of active tracking is its ability to detect even tiny motions with a simple setup, which makes it very suitable for monitoring tremor disorders.

  • 11. Khan, M. S. L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Rehman, Shafiq Ur
    Telepresence Mechatronic Robot (TEBoT): Towards the design and control of socially interactive bio-inspired system (2016). In: Journal of Intelligent & Fuzzy Systems, ISSN 1064-1246, E-ISSN 1875-8967, Vol. 31, no 5, p. 2597-2610. Article in journal (Refereed)
    Abstract [en]

    Socially interactive systems are embodied agents that engage in social interactions with humans. From a design perspective, these systems are built by considering a biologically inspired (bio-inspired) design that can mimic and simulate human-like communication cues and gestures. The design of a bio-inspired system usually consists of (i) studying biological characteristics, (ii) designing a similar biological robot, and (iii) motion planning that can mimic the biological counterpart. In this article, we present the design, development, control strategy and verification of our socially interactive bio-inspired robot, the Telepresence Mechatronic Robot (TEBoT). The key contribution of our work is the embodiment of real human neck movements by (i) designing a mechatronic platform based on the dynamics of a real human neck and (ii) capturing real head movements through our novel single-camera based vision algorithm. Our socially interactive bio-inspired system is based on an intuitive integration-design strategy that combines a computer vision based geometric head pose estimation algorithm, a model based design (MBD) approach and real-time motion planning techniques. We have conducted extensive testing to demonstrate the effectiveness and robustness of the proposed system.

  • 12. Khan, M. S. L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Réhman, S. U.
    Expressive multimedia: Bringing action to physical world by dancing-tablet (2015). In: HCMC 2015 - Proceedings of the 2nd Workshop on Computational Models of Social Interactions: Human-Computer-Media Communication, co-located with ACM MM 2015, ACM Digital Library, 2015, p. 9-14. Conference paper (Refereed)
    Abstract [en]

    Design practice based on the embodied interaction concept focuses on developing new user interfaces for computing devices that merge digital content with the physical world. In this work we propose a novel embodied interaction based design in which the 'action' information of the digital content is presented in the physical world. More specifically, we map the 'action' information of video content from the digital world into the physical world. The motivating example presented in this paper is our novel dancing-tablet, in which a tablet PC dances to the rhythm of a song; the 'action' information is thus not confined to a 2D flat display but is also expressed by it. This paper presents (i) the hardware design of our mechatronic dancing-tablet platform, (ii) the software algorithm for musical feature extraction and (iii) an embodied computational model for mapping the 'action' information of the musical expression to the mechatronic platform. Our user study shows that the overall perception of audio-video music is enhanced by our dancing-tablet setup.
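
    The musical feature extraction step, mapping a song's rhythm to platform motion, could be prototyped with an off-the-shelf beat tracker. A hedged sketch follows; librosa is our tooling choice here, not necessarily the authors', and the file name is a placeholder.

```python
import librosa

# Extract beat times from the song driving the dancing-tablet platform.
y, sr = librosa.load("song.wav")
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
# beat_times could then be translated into actuator commands
# synchronized with audio playback.
```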

  • 13. Khan, M. S. L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Ur Réhman, S.
    Embodied tele-presence system (ETS): Designing tele-presence for video teleconferencing (2014). In: 3rd International Conference on Design, User Experience, and Usability: User Experience Design for Diverse Interaction Platforms and Environments, DUXU 2014, Held as Part of 16th International Conference on Human-Computer Interaction, HCI Int. 2014, 2014, no PART 2, p. 574-585. Conference paper (Refereed)
    Abstract [en]

    In spite of the progress made in teleconferencing over the last decades, it is still far from a resolved issue. In this work, we present an intuitive video teleconferencing system, the Embodied Tele-Presence System (ETS), which is based on the embodied interaction concept. This work presents the results of a user study considering the hypothesis: "An embodied interaction based video conferencing system performs better than the standard video conferencing system in representing nonverbal behaviors, thus creating a 'feeling of presence' of a remote person among his/her local collaborators". Our ETS integrates standard audio-video conferencing with a mechanical embodiment of the head gestures of a remote person (as nonverbal behavior) to enhance the level of interaction. To highlight the technical challenges and design principles behind such telepresence systems, we have also performed a system evaluation which shows the accuracy and efficiency of our ETS design. The paper further provides an overview of our case study and an analysis of our user evaluation. The user study shows that the proposed embodied interaction approach to video teleconferencing increases 'in-meeting interaction' and enhances the 'feeling of presence' of a remote participant among his collaborators.

  • 14. Khan, M. S. L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Ur Réhman, S.
    Tele-immersion: Virtual reality based collaboration (2016). In: 18th International Conference on Human-Computer Interaction, HCI International 2016, Springer, 2016, p. 352-357. Conference paper (Refereed)
    Abstract [en]

    The 'perception of being present in another space' during video teleconferencing is a challenging task. This work makes an effort to improve a user's perception of being 'present' in another space by employing a virtual reality (VR) headset and an embodied telepresence system (ETS). In our application scenario, a remote participant uses a VR headset to collaborate with local collaborators. At the local site, an ETS is used as a physical representation of the remote participant among his/her local collaborators. The head movements of the remote person are mapped and presented by the ETS along with audio-video communication. Key considerations of the complete design are discussed, and solutions to challenges related to head tracking, audio-video communication and data communication are presented. The proposed approach is validated by a user study with quantitative analysis of immersion and presence parameters.

  • 15. Khan, M. S. L.
    Réhman, S. U.
    Söderström, U.
    Halawani, A.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Face-off: A face reconstruction technique for virtual reality (VR) scenarios (2016). In: 14th European Conference on Computer Vision, ECCV 2016, Springer, 2016, p. 490-503. Conference paper (Refereed)
    Abstract [en]

    Virtual Reality (VR) headsets occlude a significant portion of the human face. The real human face is required in many VR applications, for example video teleconferencing. This paper proposes a wearable camera setup-based solution to reconstruct the real face of a person wearing a VR headset. Our solution lies at the core of asymmetrical principal component analysis (aPCA). A user-specific training model is built using aPCA with full-face, lips and eye-region information. During the testing phase, lower-face and partial eye information is used to reconstruct the wearer's face. The online testing session consists of two phases: (i) a calibration phase and (ii) a reconstruction phase. In the former, a small calibration step is performed to align test information with training data, while the latter uses half-face information to reconstruct the full face using the aPCA-based trained data. The proposed approach is validated with qualitative and quantitative analysis.

  • 16. Khan, M. S. L.
    Ur Rehman, S.
    Hera, P. L.
    Liu, F.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    A pilot user's prospective in mobile robotic telepresence system (2014). In: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, IEEE conference proceedings, 2014. Conference paper (Refereed)
    Abstract [en]

    In this work we present an interactive video conferencing system specifically designed to enhance the experience of video teleconferencing for a pilot user. We use an Embodied Telepresence System (ETS), which was previously designed to enhance the experience of video teleconferencing for collaborators. Here we deploy the ETS in a novel scenario to improve the experience of the pilot user during distance communication: the ETS adjusts the view of the pilot user at the distant location (e.g., a distantly located conference/meeting). A velocity profile control for the ETS is developed, implicitly controlled by the head of the pilot user. An experiment was conducted to test whether the view adjustment capability of the ETS increases the collaboration experience of video conferencing for the pilot user. In the user study, participants (pilot users) interacted using the ETS and a traditional computer-based video conferencing tool. Overall, the user study suggests the effectiveness of our approach in enhancing the experience of video conferencing for the pilot user.

  • 17.
    Khan, Muhammad Sikandar Lal
    Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden.
    Halawani, Alaa
    Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden; Palestine Polytech Univ, Comp Engn Dept, Hebron 90100, Palestine.
    Rehman, Shafiq Ur
    Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Action Augmented Real Virtuality: A Design for Presence (2018). In: IEEE Transactions on Cognitive and Developmental Systems, ISSN 2379-8920, Vol. 10, no 4, p. 961-972. Article in journal (Refereed)
    Abstract [en]

    This paper addresses the important question of how to design a video teleconferencing setup that increases the experience of spatial and social presence. Traditional video teleconferencing setups are lacking in presenting the nonverbal behaviors that humans express in face-to-face communication, which results in a decreased experience of presence. To address this issue, we first present a conceptual framework of presence for video teleconferencing. We introduce a modern presence concept called real virtuality and propose a new way of achieving it based on body or artifact actions that increase the feeling of presence; we name this concept presence through actions. Using this new concept, we present the design of a novel action-augmented real virtuality prototype that considers the challenges related to the design of an action prototype, action embodiment, and face representation. Our action prototype is a telepresence mechatronic robot (TEBoT), and action embodiment is achieved through a head-mounted display (HMD). The face representation solves the problem of face occlusion introduced by the HMD. The novel combination of HMD, TEBoT and the face representation algorithm has been tested in a real video teleconferencing scenario for its ability to solve the challenges related to spatial and social presence. We performed a user study in which the invited participants were asked to experience our novel setup and compare it with a traditional video teleconferencing setup. The results show that the action capabilities increase not only the feeling of spatial presence but also the feeling of social presence of a remote person among local collaborators.

  • 18. Khan, Muhammad Sikandar Lal
    Réhman, Shafiq
    Lu, Zhihan
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Tele-embodied agent (TEA) for video teleconferencing (2013). In: Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia, MUM 2013, Association for Computing Machinery (ACM), 2013. Conference paper (Refereed)
    Abstract [en]

    We propose a design of a teleconference system which expresses nonverbal behavior (in our case head gestures) along with audio-video communication. Previous audio-video conferencing systems fall short in presenting the nonverbal behaviors which we, as humans, usually use in face-to-face interaction. Recently, research in teleconferencing systems has expanded to include nonverbal cues of the remote person in distance communication. The accurate representation of nonverbal gestures in such systems is still challenging because they depend on hand-operated devices (like mouse or keyboard), and they still fall short of presenting accurate human gestures. We believe that incorporating embodied interaction in video teleconferencing, i.e., using the physical world as a medium for interacting with digital technology, can result in better nonverbal behavior representation. The experimental platform named Tele-Embodied Agent (TEA) is introduced, which incorporates the remote person's head gestures to study a new paradigm of embodied interaction in video teleconferencing. Our preliminary test shows the accuracy (with respect to pose angles) and efficiency (with respect to time) of our proposed design. TEA can be used in the medical field, factories, offices, the gaming industry, the music industry and for training.

  • 19. Kondori, F. A.
    Liu, L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Telelife: An immersive media experience for rehabilitation (2014). In: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, IEEE conference proceedings, 2014. Conference paper (Refereed)
    Abstract [en]

    In recent years, the emergence of telerehabilitation systems for home-based therapy has altered healthcare systems. Telerehabilitation enables therapists to observe patients' status via the Internet, so a patient does not have to visit rehabilitation facilities for every rehabilitation session. Although telerehabilitation provides great opportunities, two major issues affect its effectiveness: the relegation of the patient to the home, and the loss of direct supervision by the therapist. Since patients have no actual interaction with other persons during the rehabilitation period, they become isolated and gradually lose their social skills. Moreover, without the direct supervision of therapists, rehabilitation exercises can be performed with bad compensation strategies that lead to poor-quality recovery. To resolve these issues, we propose telelife, a new concept for future rehabilitation systems. The idea is to use media technology to create a totally new immersive media experience for rehabilitation. In telerehabilitation, patients locally execute exercises and therapists remotely monitor the patients' status; in telelife, patients remotely perform exercises and therapists locally monitor them. Thus telelife not only enables rehabilitation at a distance but also improves patients' social competences and provides direct supervision by therapists. In this paper we introduce telelife to enhance telerehabilitation, and investigate technical challenges and possible methods to achieve it.

  • 20. Kondori, F. A.
    Yousefi, Shahrouz
    Umeå University, Sweden.
    Li, Haibo
    Umeå University, Sweden.
    Real 3D interaction behind mobile phones for augmented environments (2011). Conference paper (Refereed)
    Abstract [en]

    The number of mobile devices such as mobile phones or PDAs has increased dramatically over recent years. New mobile devices are equipped with integrated cameras and large displays which make the interaction with the device easier and more efficient. Although most previous work on interaction between humans and mobile devices is based on 2D touch-screen displays, camera-based interaction opens a new way to manipulate in the 3D space behind the device, in the camera's field of view. This paper suggests the use of particular patterns from the local orientation of the image, called rotational symmetries, to detect and localize human gestures. Relative rotation and translation of the gesture between consecutive frames are estimated by means of extracting stable features. Consequently, this information can be used to facilitate 3D manipulation of virtual objects in various applications on mobile devices.

  • 21. Kondori, F. A.
    Yousefi, Shahrouz
    Umeå University, Sweden.
    Li, Haibo
    Umeå University, Sweden.
    Sonning, S.
    3D head pose estimation using the Kinect (2011). Conference paper (Refereed)
    Abstract [en]

    Head pose estimation plays an essential role in bridging the information gap between humans and computers. Conventional head pose estimation is mostly done on images captured by cameras, yet accurate and robust pose estimation is often problematic. In this paper we present an algorithm for recovering the six degrees of freedom (DOF) of motion of a head from a sequence of range images taken by the Microsoft Kinect for Xbox 360. The proposed algorithm utilizes a least-squares minimization of the difference between the measured rate of change of depth at a point and the rate predicted by the depth rate constraint equation. We segment the human head from its surroundings and background, and then estimate the head motion. Our system can recover the six DOF of head motion for multiple people in one image. The proposed system was evaluated in our lab and presents superior results.

  • 22.
    Kondori, Farid Abedan
    Umea Univ, Dept Appl Phys & Elect, Umea, Sweden.
    Liu, Li
    Umea Univ, Dept Appl Phys & Elect, Umea, Sweden.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Telelife: An Immersive Media Experience for Rehabilitation (2014). In: 2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), IEEE, 2014. Conference paper (Refereed)
    Abstract [en]

    In recent years, the emergence of telerehabilitation systems for home-based therapy has altered healthcare systems. Telerehabilitation enables therapists to observe patients' status via the Internet, so a patient does not have to visit rehabilitation facilities for every rehabilitation session. Although telerehabilitation provides great opportunities, two major issues affect its effectiveness: the relegation of the patient to the home, and the loss of direct supervision by the therapist. Since patients have no actual interaction with other persons during the rehabilitation period, they become isolated and gradually lose their social skills. Moreover, without the direct supervision of therapists, rehabilitation exercises can be performed with bad compensation strategies that lead to poor-quality recovery. To resolve these issues, we propose telelife, a new concept for future rehabilitation systems. The idea is to use media technology to create a totally new immersive media experience for rehabilitation. In telerehabilitation, patients locally execute exercises and therapists remotely monitor the patients' status; in telelife, patients remotely perform exercises and therapists locally monitor them. Thus telelife not only enables rehabilitation at a distance but also improves patients' social competences and provides direct supervision by therapists. In this paper we introduce telelife to enhance telerehabilitation, and investigate technical challenges and possible methods to achieve it.

  • 23. Kondori, Farid Abedan
    Yousefi, Shahrouz
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Kouma, Jean-Paul
    Liu, Li
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Direct hand pose estimation for immersive gestural interaction (2015). In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 66, p. 91-99. Article in journal (Refereed)
    Abstract [en]

    This paper presents a novel approach for performing intuitive gesture-based interaction using depth data acquired by Kinect. The main challenge in enabling immersive gestural interaction is dynamic gesture recognition. This problem can be formulated as a combination of two tasks: gesture recognition and gesture pose estimation. Incorporating a fast and robust pose estimation method would lessen the burden to a great extent. In this paper we propose a direct method for real-time hand pose estimation. Based on the range images, a new version of the optical flow constraint equation is derived, which can be utilized to directly estimate 3D hand motion without imposing other constraints. Extensive experiments illustrate that the proposed approach performs properly in real time with high accuracy. As a proof of concept, we demonstrate the system performance in 3D object manipulation on two different setups: desktop computing and a mobile platform. This reveals the system's capability to accommodate different interaction procedures. In addition, a user study is conducted to evaluate learnability, user experience and interaction quality of 3D gestural interaction in comparison to 2D touchscreen interaction.

  • 24.
    Kondori, Farid Abedan
    Umeå Univ, SE-90187 Umea, Sweden.
    Yousefi, Shahrouz
    KTH.
    Ostovar, Ahmad
    Umeå Univ, SE-90187 Umea, Sweden.
    Liu, Li
    Umeå Univ, SE-90187 Umea, Sweden.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    A Direct Method for 3D Hand Pose Recovery (2014). In: 2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), IEEE COMPUTER SOC, 2014, p. 345-350. Conference paper (Refereed)
    Abstract [en]

    This paper presents a novel approach for performing intuitive 3D gesture-based interaction using depth data acquired by Kinect. Unlike current depth-based systems that focus only on the classical gesture recognition problem, we also consider 3D gesture pose estimation for creating immersive gestural interaction. In this paper, we formulate the gesture-based interaction system as a combination of two separate problems: gesture recognition and gesture pose estimation. We focus on the second problem and propose a direct method for recovering hand motion parameters. Based on the range images, a new version of the optical flow constraint equation is derived, which can be utilized to directly estimate 3D hand motion without imposing other constraints. Our experiments illustrate that the proposed approach performs properly in real time with high accuracy. As a proof of concept, we demonstrate the system performance in 3D object manipulation. This application is intended to explore the system capabilities in real-time biomedical applications. Eventually, a system usability test is conducted to evaluate the learnability, user experience and interaction quality of 3D interaction in comparison to 2D touch-screen interaction.

  • 25. Li, B.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC).
    Söderström, U.
    Distinctive curves features (2016). In: Electronics Letters, ISSN 0013-5194, E-ISSN 1350-911X, Vol. 52, no 3, p. 197-U83. Article in journal (Refereed)
    Abstract [en]

    Curves and lines are geometrical, abstract features of an image. Whereas interest points are more limited, curves and lines provide much more information about the image structure. However, research in curve and line detection is very fragmented, and the concept of scale space is not yet fused well into it. The keypoint (e.g. SIFT, SURF, ORB) is a successful concept which represents features (e.g. blobs, corners) in scale space. Stimulated by the keypoint concept, a method which extracts distinctive curves (DICU) in scale space, including lines as a special form of curve feature, is proposed. A curve feature can be represented by three keypoints (two end points and one middle point). A good way to test the quality of detected curves is to analyse their repeatability under various image transformations. DICU is evaluated using the standard Oxford benchmark. The overlap error is calculated by averaging the overlap errors of the three keypoints on the curve. Experimental results show that DICU achieves good repeatability compared with other state-of-the-art methods. To match curve features, a relatively uncomplicated way is to combine the local descriptors of the three keypoints on each curve.

  • 26. Li, B.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Söderström, U.
    Scale-invariant corner keypoints (2014). Conference paper (Refereed)
    Abstract [en]

    Effective and efficient generation of keypoints from images is the first step of many computer vision applications, such as object matching. The last decade presented us with an arms race toward faster and more robust keypoint detection, feature description and matching. This resulted in several new algorithms, for example the Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Oriented FAST and Rotated BRIEF (ORB) and Binary Robust Invariant Scalable Keypoints (BRISK). Keypoint detection has been improved using various techniques in most of these algorithms. However, in the search for faster computing, the accuracy of the algorithms is decreasing. In this paper, we present SICK (Scale-Invariant Corner Keypoints), a novel method for fast keypoint detection. Our experimental results show that SICK is faster to compute and more robust than recent state-of-the-art methods.
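
    SICK itself is not shipped in common libraries, but the baselines the abstract names are, which makes a quick comparison harness easy to set up. The OpenCV calls below are standard; the image path is a placeholder.

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
detectors = [
    ("ORB", cv2.ORB_create()),
    ("BRISK", cv2.BRISK_create()),
    ("FAST", cv2.FastFeatureDetector_create()),
]
for name, det in detectors:
    keypoints = det.detect(img, None)
    print(name, len(keypoints))  # a crude starting point for speed/robustness tests
```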

  • 27.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Hedman, Anders
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Harnessing Crowds to Avert or Mitigate Acts of Terrorism: A Collective Intelligence Call for Action (2016). In: 2016 EUROPEAN INTELLIGENCE AND SECURITY INFORMATICS CONFERENCE (EISIC) / [ed] Brynielsson, J., Johansson, F., IEEE, 2016, p. 203-203. Conference paper (Refereed)
    Abstract [en]

    This paper proposes averting acts of terrorism through non-traditional means of surveillance and control: the use of crowdsourcing (collective intelligence) and the development of a new class of anti-terror mobile apps. The proposed class of anti-terrorist apps is based on two dimensions: the individual and the central. By individual, we mean the individual app user, and by central we mean a central organizational locus of coordination and control in the fight against terrorism. Such a central locus could be a governmental agency or a national/international security organization active in the fight against terrorism.

  • 28. Lu, G.
    Yang, C.
    Yang, W.
    Yan, J.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Micro-expression recognition based on LBP-TOP features (2017). In: Nanjing Youdian Daxue Xuebao (Ziran Kexue Ban)/Journal of Nanjing University of Posts and Telecommunications (Natural Science), Vol. 37, no 6, p. 1-7. Article in journal (Refereed)
    Abstract [en]

    Micro-expressions are involuntary facial expressions revealing true feelings when a person tries to conceal them. Compared with normal facial expressions, the most significant characteristics of micro-expressions are their short duration and weak intensity, which make them difficult to recognize. In this paper, a micro-expression recognition method based on local binary patterns from three orthogonal planes (LBP-TOP) and a support vector machine (SVM) classifier is proposed. First, the LBP-TOP operators are used to extract micro-expression features. Then, a feature selection algorithm combining ReliefF with a manifold learning algorithm based on locally linear embedding (LLE) is proposed to reduce the dimensionality of the extracted LBP-TOP feature vectors. Finally, the SVM classifier with a radial basis function (RBF) kernel is used to classify test samples into five categories of micro-expressions: happiness, disgust, repression, surprise, and others. Experiments are carried out on the micro-expression database CASME II using the leave-one-subject-out cross-validation (LOSO-CV) method. The classification accuracy reaches 58.98%. The experimental results show the effectiveness of the proposed method.
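
    For readers unfamiliar with LBP-TOP: local binary patterns are computed on the XY, XT and YT planes of the video volume and their histograms are concatenated. The following is a deliberately simplified sketch (central slices only, illustrative parameter choices), not the paper's implementation.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_top(volume, P=8, R=1, bins=59):
    """Simplified LBP-TOP: one central slice per orthogonal plane."""
    T, H, W = volume.shape
    planes = (volume[T // 2], volume[:, H // 2, :], volume[:, :, W // 2])
    feats = []
    for plane in planes:
        codes = local_binary_pattern(plane, P, R, method="nri_uniform")
        hist, _ = np.histogram(codes, bins=bins, density=True)
        feats.append(hist)
    return np.concatenate(feats)

# Five-class micro-expression classification, as in the abstract
# (train_features/train_labels are placeholders):
# clf = SVC(kernel="rbf").fit(train_features, train_labels)
```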

  • 29. Lu, Z.
    Réhman, S.
    Khan, M. S. L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH.
    Anaglyph 3D Stereoscopic Visualization of 2D Video based on Fundamental Matrix (2013). In: Proceedings - 2013 International Conference on Virtual Reality and Visualization, ICVRV 2013, IEEE, 2013, p. 305-308. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a simple anaglyph 3D stereo generation algorithm for 2D video sequences captured with a monocular camera. In our novel approach we employ a camera pose estimation method to directly generate stereoscopic 3D from 2D video without explicitly building a depth map. Our cost-effective method is suitable for arbitrary real-world video sequences and produces smooth results. We use image stitching based on plane correspondence using the fundamental matrix. We also demonstrate that correspondence-plane image stitching based on the homography matrix alone cannot generate better results. Furthermore, we utilize the structure-from-motion (with fundamental matrix) based reconstructed camera pose model to accomplish the visual anaglyph 3D illusion. The proposed approach demonstrates very good performance for most video sequences.
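
    The fundamental-matrix step is standard and can be reproduced with OpenCV; the frame file names are placeholders, and the anaglyph synthesis itself is not shown.

```python
import cv2

img1 = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

# Match features between two consecutive frames of the monocular video.
orb = cv2.ORB_create(2000)
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)

pts1 = cv2.KeyPoint_convert([k1[m.queryIdx] for m in matches])
pts2 = cv2.KeyPoint_convert([k2[m.trainIdx] for m in matches])
# Robustly estimate the fundamental matrix with RANSAC.
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
```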

  • 30. Lv, Z.
    Feng, L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Feng, S.
    Hand-free motion interaction on Google Glass (2014). In: SIGGRAPH Asia 2014 Mobile Graphics and Interactive Applications, SA 2014, 2014. Conference paper (Refereed)
    Abstract [en]

    There is an increasing interest in creating wearable device interaction technologies. Novel emerging user interface technologies (e.g. eye tracking, speech recognition, gesture recognition, ECG, EEG and fusions of them) have the potential to significantly affect market share in PCs, smartphones, tablets and the latest wearable devices such as Google Glass. As a result, deploying these technologies in devices such as smartphones and wearables is challenging. Google Glass has many impressive characteristics (voice actions, head wake up, wink detection), which are human-glass interface (HGI) technologies. Google Glass does not suffer from 'the occlusion problem' or 'the fat finger problem', the classic problems of direct-touch finger input on touch screens. However, Google Glass only provides a touchpad with simple 'tapping and sliding your finger' gestures, which is in fact one-dimensional interaction, instead of the traditional two-dimensional interaction of a smartphone touch screen. The one-dimensional 'swipe the touchpad' interaction with a row of 'Cards', which replaces the traditional two-dimensional icon menu, limits the intuitiveness and flexibility of HGI. Therefore, there is a growing interest in implementing 3D gesture recognition vision systems in which optical sensors capture real-time video of the user and ubiquitous algorithms are then used to determine the user's gestures, without the user having to hold any device. We demonstrate a hand-free motion interaction application based on computer vision technology on Google Glass. The presented application allows the user to perform touch-less interaction by hand or foot gesture in front of the camera of Google Glass. Based on the same core ubiquitous gesture recognition algorithm as used in this demonstration, a hybrid wearable smartphone system based on mixed hardware and software was presented in our previous work [Lv 2013][Lu et al. 2013][Lv et al. 2013], which can support either hand or foot interaction with today's smartphones.

  • 31. Lv, Z.
    Feng, S.
    Lal Khan, M. S.
    Ur Réhman, S.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Foot motion sensing: Augmented game interface based on foot interaction for smartphone (2014). In: Conference on Human Factors in Computing Systems - Proceedings, 2014, p. 293-296. Conference paper (Refereed)
    Abstract [en]

    We designed and developed two games, a real-time augmented football game and an augmented foot piano game, to demonstrate an innovative interface based on a foot motion sensing approach for smart phones. In the proposed novel interface, a computer vision based hybrid detection and tracking method provides core support for the foot interaction interface by accurately tracking the shoes. Based on the proposed interaction interface, two demonstrations were developed; the applications employ augmented reality technology to render the game graphics and game status information on the smart phone's screen. The players interact with the game using foot motions toward the rear camera, which trigger the interaction events. The interface supports basic foot motion sensing (i.e. direction of movement, velocity, rhythm).

  • 32. Lv, Z.
    Halawani, A.
    Khan, M. S. L.
    Réhman, S. U.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Finger in air: Touch-less interaction on smartphone (2013). In: Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia, MUM 2013, Association for Computing Machinery (ACM), 2013. Conference paper (Refereed)
    Abstract [en]

    In this paper we present a vision-based intuitive interaction method for smart mobile devices. It is based on markerless finger gesture detection and attempts to provide a 'natural user interface'. No additional hardware is necessary for real-time finger gesture estimation. To evaluate the strengths and effectiveness of the proposed method, we designed two smart phone applications: a circle menu application, which provides the user with graphics and smart phone status information, and a bouncing ball game, a finger gesture based bouncing ball application. The users interact with these applications using finger gestures through the smart phone's camera view, which trigger the interaction events and generate activity sequences for interactive buffers. Our preliminary user study demonstrates the effectiveness and social acceptability of the proposed interaction approach.

  • 33. Lv, Zhihan
    Feng, Liangbing
    Feng, Shengzhong
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Extending Touch-less Interaction on Vision Based Wearable Device (2015). In: 2015 IEEE VIRTUAL REALITY CONFERENCE (VR), IEEE conference proceedings, 2015, p. 231-232. Conference paper (Refereed)
    Abstract [en]

    A touch-less interaction technology on a vision-based wearable device is designed and evaluated. Users interact with the application with dynamic hand/foot gestures in front of the camera. Several proof-of-concept prototypes with eleven dynamic gestures were developed based on the touch-less interaction. Finally, a comparative user study is presented to demonstrate the usability of the touch-less approach, as well as its impact on the user's emotions, running on a wearable framework or Google Glass.

  • 34. Lv, Zhihan
    Halawani, Alaa
    Feng, Shengzhong
    ur Rehman, Shafiq
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Touch-less interactive augmented reality game on vision-based wearable device (2015). In: Personal and Ubiquitous Computing, ISSN 1617-4909, E-ISSN 1617-4917, Vol. 19, no 3-4, p. 551-567. Article in journal (Refereed)
    Abstract [en]

    There is an increasing interest in creating pervasive games based on emerging interaction technologies. In order to develop touch-less, interactive and augmented reality games on a vision-based wearable device, a touch-less motion interaction technology is designed and evaluated in this work. Users interact with the augmented reality games with dynamic hand/foot gestures in front of the camera, which triggers interaction events to interact with the virtual objects in the scene. Three primitive augmented reality games with eleven dynamic gestures were developed as a proof of the proposed touch-less interaction technology. Finally, a comparative evaluation is presented to demonstrate the social acceptability and usability of the touch-less approach, running on a hybrid wearable framework or with Google Glass, together with a workload assessment and measures of user emotion and satisfaction.

  • 35. Lv, Zhihan
    Halawani, Alaa
    Feng, Shengzhong
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Réhman, S.U
    Multimodal Hand and Foot Gesture Interaction for Handheld Devices (2014). In: ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), ISSN 1551-6857, E-ISSN 1551-6865, Vol. 11, article id 10. Article in journal (Refereed)
    Abstract [en]

    We present a hand-and-foot-based multimodal interaction approach for handheld devices. Our method combines input modalities (i.e., hand and foot) and provides a coordinated output to both modalities along with audio and video. Human foot gestures are detected and tracked using contour-based template detection (CTD) and the Tracking-Learning-Detection (TLD) algorithm. 3D foot pose is estimated from the homography matrix of the camera. 3D stereoscopic rendering and vibrotactile feedback are used to enhance the immersive feeling. We developed a multimodal football game based on this approach as a proof of concept. We confirm our system's user satisfaction through a user study.

  • 36. Shao, W.
    Lin, Y.
    Bao, B.
    Wang, L.
    Ge, Q.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Blind deblurring using discriminative image smoothing (2018). In: 1st Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2018, Springer Verlag, 2018, p. 490-500. Conference paper (Refereed)
    Abstract [en]

    This paper aims to exploit the full potential of gradient-based methods, attempting to explore a simple, robust yet discriminative image prior for blind deblurring. The specific contributions are three-fold. First, a pure gradient-based heavy-tailed model is proposed as a generalized integration of the normalized sparsity and the relative total variation. Second, a plug-and-play algorithm is deduced to alternately estimate the intermediate sharp image and the nonparametric blur kernel; with this numerical scheme, image estimation is simplified to an image smoothing problem. Lastly, a great many experiments are performed, accompanied by comparisons with state-of-the-art approaches on synthetic benchmark datasets and real blurry images in various scenarios. The experimental results demonstrate the effectiveness and robustness of the proposed method.

  • 37. Shao, W. -Z
    Li, H. -B
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Elad, M.
    Bi-l0-l2-norm regularization for blind motion deblurring (2015). In: Journal of Visual Communication and Image Representation, ISSN 1047-3203, E-ISSN 1095-9076, Vol. 33, p. 42-59. Article in journal (Refereed)
    Abstract [en]

    In blind motion deblurring, leading methods today tend towards highly non-convex approximations of the l0-norm, especially in the image regularization term. In this paper, we propose a simple, effective and fast approach for the estimation of the motion blur kernel, through a bi-l0-l2-norm regularization imposed on both the intermediate sharp image and the blur kernel. Compared with existing methods, the proposed regularization is shown to be more effective and robust, leading to a more accurate motion blur kernel and a better final restored image. A fast numerical scheme is deployed for alternatingly computing the sharp image and the blur kernel, by coupling the operator splitting and augmented Lagrangian methods. Experimental results on both a benchmark image dataset and real-world motion-blurred images show that the proposed approach is highly competitive with state-of-the-art methods in both deblurring effectiveness and computational efficiency.
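
    Schematically, the regularization described in the abstract corresponds to an energy of the following form. This is an illustrative reconstruction from the abstract alone; the paper's exact terms and weights may differ.

```latex
\min_{x,\,k}\;\|k \ast x - y\|_2^2
  \;+\; \lambda_1\,\|\nabla x\|_0 \;+\; \lambda_2\,\|\nabla x\|_2^2
  \;+\; \eta_1\,\|k\|_0 \;+\; \eta_2\,\|k\|_2^2
```

    Here y is the blurry input, x the latent sharp image and k the blur kernel; an l0/l2 pair acts on each of the two unknowns, hence "bi-l0-l2".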

  • 38. Shao, Wen-Ze
    et al.
    Ge, Qi
    Deng, Hai-Song
    Wei, Zhi-Hui
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Motion Deblurring Using Non-stationary Image Modeling2015In: Journal of Mathematical Imaging and Vision, ISSN 0924-9907, E-ISSN 1573-7683, Vol. 52, no 2, p. 234-248Article in journal (Refereed)
    Abstract [en]

    It is well known that camera or mobile-phone shake during exposure usually leads to motion-blurred photographs, so camera-shake deblurring, i.e., motion deblurring, is required in many practical scenarios. The contribution of this paper is a simple yet effective approach to motion blur-kernel estimation, i.e., blind motion deblurring. While several methods for motion blur-kernel estimation have been proposed in the literature, we impose a non-stationary Gaussian prior on the gradient fields of sharp images, in order to automatically detect and pursue the salient edges of images as the important clues for blur-kernel estimation. On one hand, the prior promotes sparsity through the non-stationarity of the precision parameters (inverses of the variances). On the other hand, since the prior is Gaussian, a conceptually simple and computationally tractable inference scheme can be deduced. Specifically, the well-known expectation-maximization algorithm is used to alternately estimate the motion blur-kernels, the salient edges of images, and the precision parameters of the image prior. Unlike many existing methods, no hyperpriors are imposed on any parameters, and no pre-processing steps, such as explicit suppression of random noise or prediction of salient edge structures, are involved. With the estimated motion blur-kernels, the deblurred images are finally generated using the off-the-shelf non-blind deconvolution method of Krishnan and Fergus (Adv Neural Inf Process Syst 22:1033-1041, 2009). The rationality and effectiveness of the proposed method are demonstrated by experiments on both synthetic and realistic motion-blurred images, showing state-of-the-art blind motion deblurring performance in terms of quantitative metrics as well as visual perception.
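
    The non-stationary Gaussian prior can be written, up to normalization, as follows (a plausible rendering of the abstract's description; the symbols are ours):

        p(\nabla u \mid \rho) \;\propto\; \prod_{i} \exp\!\Big(-\tfrac{\rho_i}{2}\,\big|(\nabla u)_i\big|^2\Big)

    with one precision \rho_i per gradient location: a large \rho_i drives the local gradient toward zero (a smooth region), while a small \rho_i lets a salient edge survive. In the EM scheme, the E-step infers the latent sharp gradients under the current kernel and precisions, and the M-step re-estimates the blur-kernel and the \rho_i.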

  • 39. Shao, Wen-Ze
    et al.
    Ge, Qi
    Gan, Zong-Liang
    Deng, Hai-Song
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    A Generalized Robust Minimization Framework for Low-Rank Matrix Recovery2014In: Mathematical problems in engineering (Print), ISSN 1024-123X, E-ISSN 1563-5147, p. 656074-Article in journal (Refereed)
    Abstract [en]

    This paper considers the problem of recovering low-rank matrices which are heavily corrupted by outliers or large errors. To improve the robustness of existing recovery methods, the problem is solved by formulating it as a generalized nonsmooth, nonconvex minimization functional that exploits the Schatten p-norm (0 < p <= 1) and the l_q (0 < q <= 1) seminorm. Two numerical algorithms are provided, based on the augmented Lagrange multiplier (ALM) and accelerated proximal gradient (APG) methods together with efficient root-finder strategies. Experimental results demonstrate that the proposed generalized approach is more inclusive and effective compared with state-of-the-art methods, either convex or nonconvex.
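
    A schematic form of such a functional, reconstructed from the abstract (the paper's exact constraint and weighting may differ):

        \min_{X,\,E}\; \|X\|_{S_p}^{p} + \lambda \|E\|_{q}^{q}
        \quad \text{s.t.} \quad M = X + E, \qquad 0 < p \le 1,\; 0 < q \le 1

    where M is the observed matrix, X its low-rank part and E the outliers; p = q = 1 recovers the convex nuclear-norm-plus-l1 model of robust PCA, while smaller p and q tighten the surrogates toward rank and l0 at the price of nonconvexity.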

  • 40.
    Shao, Wen-Ze
    et al.
    NUPT, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China.;NUPT, Natl Engn Res Ctr Commun & Networking, Nanjing, Jiangsu, Peoples R China..
    Ge, Qi
    NUPT, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China..
    Wang, Li-Qian
    NUPT, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China..
    Lin, Yun-Zhi
    Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA.;Southeast Univ, Sch Automat, Nanjing, Jiangsu, Peoples R China..
    Deng, Hai-Song
    Nanjing Audit Univ, Sch Sci, Nanjing, Jiangsu, Peoples R China..
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID. NUPT, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China.
    Nonparametric Blind Super-Resolution Using Adaptive Heavy-Tailed Priors2019In: Journal of Mathematical Imaging and Vision, ISSN 0924-9907, E-ISSN 1573-7683, Vol. 61, no 6, p. 885-917Article in journal (Refereed)
    Abstract [en]

    Single-image nonparametric blind super-resolution is a fundamental image restoration problem, yet one largely ignored over the past decades by the computational photography and computer vision communities. Interestingly, learning-based single-image super-resolution (SR) has developed rapidly since the boom of sparse representation in the mid-2000s and especially of representation learning in the 2010s, but in this line of work the high-res image is generally assumed to be blurred by a bicubic or Gaussian kernel. This parametric assumption on the blur kernel does not hold in most practical applications, because in real low-res imaging a high-res image can undergo complex blur processes, e.g., Gaussian-shaped kernels of varying sizes, ellipse-shaped kernels of varying orientations, or curvilinear kernels of varying trajectories. The paper is mainly motivated by one of our previous works, Shao and Elad (in: Zhang (ed) ICIG 2015, Part III, Lecture Notes in Computer Science, Springer, Cham, 2015). Specifically, we take one step further and present a type of adaptive heavy-tailed image priors, which result in a new regularized formulation for nonparametric blind super-resolution. The new image priors can be expressed and understood as a generalized integration of the normalized sparsity measure and relative total variation. Although the proposed priors appear simple, their core merit is their practical capability for the challenging task of nonparametric blur-kernel estimation, for both super-resolution and deblurring. Harnessing the priors, a higher-quality intermediate high-res image becomes possible, and therefore more accurate blur-kernel estimation can be accomplished. Extensive experiments are performed on both synthetic and real-world blurred low-res images, convincingly demonstrating the comparative or even superior performance of the proposed algorithm. Meanwhile, the proposed priors prove quite applicable to blind image deblurring, which is a degenerate case of nonparametric blind SR.
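
    The two ingredients named above have standard forms, so one plausible reading of the "generalized integration" is a prior combining (our notation, with the window-based RTV form of Xu et al.; the paper's exact expression may differ):

        \text{NS}(u) \;=\; \frac{\|\nabla u\|_1}{\|\nabla u\|_2},
        \qquad
        \text{RTV}(u) \;=\; \sum_{p} \frac{\sum_{q \in W_p} |(\nabla u)_q|}{\big|\sum_{q \in W_p} (\nabla u)_q\big| + \varepsilon}

    where W_p is a local window. NS favors a few large gradients over many small ones, and RTV penalizes oscillatory texture while sparing consistent structural edges, which is exactly the behavior wanted when estimating a blur kernel from salient edges.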

  • 41.
    Shao, Wen-Ze
    et al.
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China.;Nanjing Univ Posts & Telecommun, Natl Engn Res Ctr Commun & Networking, Nanjing, Jiangsu, Peoples R China..
    Xu, Jing-Jing
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China..
    Chen, Long
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China..
    Ge, Qi
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China..
    Wang, Li-Qian
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China..
    Bao, Bing-Kun
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China..
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID. Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing, Jiangsu, Peoples R China..
    On potentials of regularized Wasserstein generative adversarial networks for realistic hallucination of tiny faces2019In: Neurocomputing, ISSN 0925-2312, E-ISSN 1872-8286, Vol. 364, p. 1-15Article in journal (Refereed)
    Abstract [en]

    Super-resolution of facial images, a.k.a. face hallucination, has been intensively studied in the past decades due to the growing analysis demands in video surveillance, e.g., face detection, verification and identification. However, the performance of most previous hallucination approaches drops dramatically when a very low-res tiny face is provided, due to the challenging multimodality of the problem and the lack of an informative prior to serve as strong semantic guidance. Inspired by the latest progress in deep unsupervised learning, this paper focuses on tiny faces of size 16 x 16 pixels, hallucinating them to 8x upsampled versions by exploring the potential of Wasserstein generative adversarial networks (WGAN). Besides a pixel-wise L2 regularization term imposed on the generative model, we find that our advocated autoencoding generator, with both residual and skip connections, is a critical component for the WGAN to represent the facial contour and semantic content to reasonable precision. With an additional Lipschitz penalty and architectural considerations for the critic, the proposed approach achieves state-of-the-art hallucination performance in terms of both visual perception and objective assessment. The cropped CelebA face dataset is primarily used to aid the tuning and analysis of the new method, termed tfh-WGAN. Experimental results demonstrate that the proposed approach not only achieves realistic hallucination of tiny faces but also adapts to pose, expression, illumination and occlusion variations to a great degree.
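
    A minimal PyTorch sketch of an autoencoding generator with residual and skip connections for 8x hallucination (16x16 to 128x128), in the spirit of tfh-WGAN. The layer widths and depths are illustrative assumptions, not the published architecture, and the WGAN critic and Lipschitz penalty are omitted.

        import torch
        import torch.nn as nn

        class ResBlock(nn.Module):
            def __init__(self, ch):
                super().__init__()
                self.body = nn.Sequential(
                    nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                    nn.Conv2d(ch, ch, 3, padding=1))
            def forward(self, x):
                return x + self.body(x)  # residual connection

        class TinyFaceGenerator(nn.Module):
            def __init__(self, ch=64):
                super().__init__()
                self.head = nn.Conv2d(3, ch, 3, padding=1)
                self.encoder = nn.Sequential(*[ResBlock(ch) for _ in range(4)])
                # three x2 upsampling stages: 16 -> 32 -> 64 -> 128
                self.up = nn.Sequential(*[
                    nn.Sequential(nn.Upsample(scale_factor=2, mode="nearest"),
                                  nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.ReLU(inplace=True)) for _ in range(3)])
                self.tail = nn.Conv2d(ch, 3, 3, padding=1)
            def forward(self, x):
                f = self.head(x)
                f = f + self.encoder(f)  # skip connection around the encoder
                return torch.tanh(self.tail(self.up(f)))

        g = TinyFaceGenerator()
        print(g(torch.randn(1, 3, 16, 16)).shape)  # torch.Size([1, 3, 128, 128])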

  • 42. Ur Rehman, S.
    et al.
    Khan, M. S. L.
    Li, L.
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Vibrotactile TV for immersive experience2014In: 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2014, IEEE conference proceedings, 2014Conference paper (Refereed)
    Abstract [en]

    Audio and video are two powerful media forms for shortening the distance between the audience and the actors or players in TV and film. Recent research shows that people increasingly consume multimedia content on mobile devices such as tablets and smartphones. An important question therefore emerges: how can we render high-quality, personal immersive experiences to consumers on these systems? To give the audience an immersive engagement that differs from 'watching a play', we designed a study rendering complete immersive media that includes 'emotional information', based on augmented vibrotactile coding on the back of the user alongside the audio-video signal. The reported emotional responses to videos viewed with and without haptic enhancement show that participants exhibited an increased emotional response to media with haptic enhancement. Overall, these studies suggest that our multisensory approach is effective and increases immersion and user satisfaction.

  • 43. Wu, Jinsong
    et al.
    Bisio, Igor
    Gniady, Chris
    Hossain, Ekram
    Valla, Massimo
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Context-aware networking and communications: Part 12014In: IEEE Communications Magazine, ISSN 0163-6804, E-ISSN 1558-1896, Vol. 52, no 6, p. 14-15Article in journal (Refereed)
  • 44. Wu, Jinsong
    et al.
    Bisio, Igor
    Gniady, Chris
    Hossain, Ekram
    Valla, Massimo
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Context-aware networking and communications: Part 22014In: IEEE Communications Magazine, ISSN 0163-6804, E-ISSN 1558-1896, Vol. 52, no 8, p. 64-65Article in journal (Refereed)
  • 45.
    Xie, Shipeng
    et al.
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China..
    Yang, Chengyuan
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China..
    Zhang, Zijian
    Cent S Univ, Dept Oncol, Xiangya Hosp, Changsha 410008, Hunan, Peoples R China..
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID. Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China..
    Scatter Artifacts Removal Using Learning-Based Method for CBCT in IGRT System2018In: IEEE Access, E-ISSN 2169-3536, Vol. 6, p. 78031-78037Article in journal (Refereed)
    Abstract [en]

    Cone-beam computed tomography (CBCT) has shown enormous potential in recent years, but it is limited by severe scatter artifacts. This paper proposes a scatter-correction algorithm based on a deep convolutional neural network to reduce artifacts for CBCT in an image-guided radiation therapy (IGRT) system. A two-step registration method, essential to our algorithm, is implemented to preprocess the data before training. Testing results on real data acquired from the IGRT system demonstrate the ability of our approach to learn the artifact distribution. Furthermore, the proposed method can enhance the performance of downstream applications such as dose estimation and segmentation.
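
    A schematic residual CNN for this kind of scatter correction, under assumed layer counts and training choices (not the published configuration): the network predicts the scatter component of a CBCT slice and subtracts it, trained with an L2 loss against registered, scatter-free reference slices.

        import torch
        import torch.nn as nn

        class ScatterNet(nn.Module):
            """Predict the scatter component of a CBCT slice and remove it."""
            def __init__(self, ch=32, depth=5):
                super().__init__()
                layers = [nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True)]
                for _ in range(depth - 2):
                    layers += [nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True)]
                layers += [nn.Conv2d(ch, 1, 3, padding=1)]
                self.net = nn.Sequential(*layers)
            def forward(self, cbct):
                return cbct - self.net(cbct)  # input minus predicted scatter

        model = ScatterNet()
        loss_fn = nn.MSELoss()  # against registered planning-CT reference slices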

  • 46.
    Xie, Shipeng
    et al.
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China..
    Zheng, Xinyu
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China..
    Shao, Wen-Ze
    Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China..
    Zhang, Yu-Dong
    Univ Leicester, Dept Informat, Leicester LE1 7RH, Leics, England..
    Lv, Tianxiang
    Southeast Univ, Lab Image Sci & Technol, Key Lab Comp Network & Informat Integrat, Minist Educ, Nanjing 210096, Jiangsu, Peoples R China..
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID. Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China..
    Non-Blind Image Deblurring Method by the Total Variation Deep Network2019In: IEEE Access, E-ISSN 2169-3536, Vol. 7, p. 37536-37544Article in journal (Refereed)
    Abstract [en]

    Many non-blind image deblurring methods exist, notably those based on the total variation (TV) model. However, how to choose the regularization parameters adaptively remains a major open problem. We propose a novel method, based on a TV deep network, that learns the best regularization parameters adaptively. Using deep learning and prior knowledge, we set up a TV-based deep network that computes the parameters of the regularization, such as the biases and weights, letting the network update these parameters automatically and avoiding sophisticated hand-tuning. Our experimental results are significantly better than those of several other methods with respect to detail retention and noise robustness. At the same time, we achieve the same effect with a minimal training set, thus speeding up the computation.
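
    One common way to realize "learning TV parameters with a network" is to unroll TV gradient-descent iterations and make the step sizes and regularization weights learnable; the sketch below illustrates that idea (the unrolling depth, initial values and the omitted blur operator are assumptions, not the paper's design).

        import torch
        import torch.nn as nn

        def dx(u): return torch.roll(u, -1, dims=-1) - u  # forward difference, x
        def dy(u): return torch.roll(u, -1, dims=-2) - u  # forward difference, y

        def div(px, py):  # negative adjoint of (dx, dy), periodic boundaries
            return (px - torch.roll(px, 1, dims=-1)) + (py - torch.roll(py, 1, dims=-2))

        class UnrolledTV(nn.Module):
            """Unrolled gradient descent on ||u - b||^2/2 + lam * TV(u),
            with a learnable step size and TV weight per iteration."""
            def __init__(self, steps=8):
                super().__init__()
                self.step = nn.Parameter(torch.full((steps,), 0.2))
                self.lam = nn.Parameter(torch.full((steps,), 0.05))
            def forward(self, b, eps=1e-3):
                u = b.clone()
                for t in range(len(self.step)):
                    gx, gy = dx(u), dy(u)
                    mag = torch.sqrt(gx ** 2 + gy ** 2 + eps)
                    tv_grad = -div(gx / mag, gy / mag)  # gradient of smoothed TV
                    u = u - self.step[t] * ((u - b) + self.lam[t] * tv_grad)
                return u

        # trained end-to-end on pairs of degraded and sharp images, the
        # per-iteration parameters replace hand-tuned regularization weights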

  • 47. Yan, J.
    et al.
    Lu, G.
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID. College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, 210003, China.
    Wang, S.
    Bimodal emotion recognition based on facial expression and speech2018In: Journal of Nanjing University of Posts and Telecommunications, ISSN 1673-5439, Vol. 38, no 1, p. 60-65Article in journal (Refereed)
    Abstract [en]

    In future artificial intelligence, emotion recognition by computers will play an increasingly important role. For bimodal emotion recognition from facial expression and speech, a feature-fusion method based on sparse canonical correlation analysis is presented. First, emotion features are extracted from the facial expression and the speech respectively. Then, sparse canonical correlation analysis is used to fuse the bimodal emotion features. Finally, a K-nearest-neighbor classifier is used for emotion recognition. The experimental results show that the bimodal method based on sparse canonical correlation analysis obtains a better recognition rate than either single modality (speech or facial expression) alone.
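
    The fusion pipeline can be sketched with standard tooling; here scikit-learn's ordinary CCA stands in for the sparse variant used in the paper, and the feature dimensions, class count and toy data are assumptions.

        import numpy as np
        from sklearn.cross_decomposition import CCA
        from sklearn.neighbors import KNeighborsClassifier

        rng = np.random.default_rng(0)
        X_face = rng.normal(size=(200, 120))   # facial-expression features
        X_speech = rng.normal(size=(200, 80))  # speech features
        y = rng.integers(0, 6, size=200)       # six emotion classes (assumed)

        # project both modalities into a shared, maximally correlated subspace
        cca = CCA(n_components=10).fit(X_face, X_speech)
        F, S = cca.transform(X_face, X_speech)
        fused = np.hstack([F, S])              # fused bimodal representation

        clf = KNeighborsClassifier(n_neighbors=5).fit(fused, y)
        print(clf.score(fused, y))             # training accuracy on toy data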

  • 48.
    Yang, Bin
    et al.
    Xian Univ Architecture & Technol, Sch Bldg Serv Sci & Engn, Xian 710055, Shaanxi, Peoples R China.;Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden..
    Cheng, Xiaogang
    KTH, School of Electrical Engineering and Computer Science (EECS). Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Jiangsu, Peoples R China.;Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland..
    Dai, Dengxin
    Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland..
    Olofsson, Thomas
    Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden..
    Li, Haibo
    KTH, School of Electrical Engineering and Computer Science (EECS), Media Technology and Interaction Design, MID.
    Meier, Alan
    Univ Calif Davis, Energy & Efficiency Inst, Davis, CA 95616 USA..
    Real-time and contactless measurements of thermal discomfort based on human poses for energy efficient control of buildings2019In: Building and Environment, ISSN 0360-1323, E-ISSN 1873-684X, Vol. 162, article id UNSP 106284Article in journal (Refereed)
    Abstract [en]

    Individual thermal discomfort perception gives important feedback signals for energy-efficient control of building heating, ventilation and air-conditioning systems. However, there are few effective methods for measuring the thermal discomfort status of occupants in a real-time, contactless way. A novel method based on contactless measurement of human thermal discomfort status is presented. Images of occupant poses, which are related to thermoregulation mechanisms, are captured by a digital camera and the corresponding 2D coordinates obtained; these poses are converted into skeletal configurations. An algorithm was developed to recognize different poses related to thermal discomfort, such as hugging oneself or wiping sweat off the brow. The algorithm recognizes twelve thermal-discomfort-related human poses, derived from a questionnaire survey of 369 human subjects. Further human subjects participated in validation experiments, in which all twelve poses were recognized effectively.
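
    The pose-to-discomfort mapping can be illustrated with simple geometric rules on 2D keypoints from any off-the-shelf pose estimator; the keypoint names, rules and thresholds below are toy assumptions covering two of the twelve poses, not the published algorithm.

        import numpy as np

        def classify_discomfort(kp):
            """kp: dict of keypoint name -> (x, y) image coordinates."""
            lw, rw = np.array(kp["l_wrist"]), np.array(kp["r_wrist"])
            ls, rs = np.array(kp["l_shoulder"]), np.array(kp["r_shoulder"])
            nose = np.array(kp["nose"])
            shoulder_w = np.linalg.norm(ls - rs)
            # hugging oneself: each wrist near the opposite shoulder -> cold
            if (np.linalg.norm(lw - rs) < 0.5 * shoulder_w and
                    np.linalg.norm(rw - ls) < 0.5 * shoulder_w):
                return "cold: hugging oneself"
            # a wrist raised near the head -> warm (wiping sweat off the brow)
            if min(np.linalg.norm(lw - nose),
                   np.linalg.norm(rw - nose)) < 0.6 * shoulder_w:
                return "warm: wiping sweat off the brow"
            return "neutral"

        example = {"l_wrist": (210, 260), "r_wrist": (150, 255),
                   "l_shoulder": (140, 250), "r_shoulder": (220, 250),
                   "nose": (180, 180)}
        print(classify_discomfort(example))  # -> cold: hugging oneself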

  • 49.
    Yousefi, Shahrouz
    et al.
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Abedan Kondori, Farid
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID. KTH.
    Bare-hand Gesture Recognition and Tracking through the Large-scale Image Retrieval2014In: VISAPP 2014 - Proceedings of the 9th International Conference on Computer Vision Theory and Applications, 2014Conference paper (Refereed)
  • 50.
    Yousefi, Shahrouz
    et al.
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    Abedan Kondori, Farid
    Li, Haibo
    KTH, School of Computer Science and Communication (CSC).
    Interactive 3D Visualization on a 4K Wall-Sized Display2014Conference paper (Refereed)
    Abstract [en]

    This paper introduces a novel vision-based approach for realistic interaction between the user and a display's content. A highly accurate motion capture system is proposed to measure and track the user's head motion in 3D space. Video frames captured by a low-cost head-mounted camera are processed to retrieve the 3D motion parameters, and the retrieved information facilitates real-time 3D interaction. This technology turns any 2D screen into an interactive 3D display, enabling users to control and manipulate the content as through a digital window. The proposed system is tested and verified on a wall-sized 4K screen.
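
    One standard way to recover inter-frame head motion from a head-mounted camera is feature matching followed by essential-matrix decomposition; the sketch below illustrates that route (not necessarily the paper's specific algorithm), and the intrinsic matrix K is a placeholder.

        import cv2
        import numpy as np

        K = np.array([[800., 0., 320.],
                      [0., 800., 240.],
                      [0., 0., 1.]])  # assumed camera intrinsics

        orb = cv2.ORB_create(1000)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

        def head_motion(prev_gray, cur_gray):
            """Return rotation R and unit-scale translation t between frames."""
            kp1, d1 = orb.detectAndCompute(prev_gray, None)
            kp2, d2 = orb.detectAndCompute(cur_gray, None)
            matches = matcher.match(d1, d2)
            p1 = np.float32([kp1[m.queryIdx].pt for m in matches])
            p2 = np.float32([kp2[m.trainIdx].pt for m in matches])
            E, mask = cv2.findEssentialMat(p1, p2, K, cv2.RANSAC, 0.999, 1.0)
            _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=mask)
            return R, t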
