1 - 30 of 30
  • 1.
    Azizpour, Hossein
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Razavian, Ali Sharif
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    From Generic to Specific Deep Representations for Visual Recognition, 2015. In: Proceedings of CVPR 2015, IEEE conference proceedings, 2015. Conference paper (Refereed)
    Abstract [en]

    Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset and the feed-forward unit activations, at a certain layer of the network, are used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors. It includes parameters for training the network such as its architecture and parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show interesting results indicating a clear correlation between the performance of tasks and their distance from the source task conditioned on proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performances on 16 visual recognition tasks.

  • 2.
    Azizpour, Hossein
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sharif Razavian, Ali
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Factors of Transferability for a Generic ConvNet Representation, 2016. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 38, no 9, p. 1790-1802, article id 7328311. Article in journal (Refereed)
    Abstract [en]

    Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the feed-forward units activation of the trained network, at a certain layer of the network, is used as a generic representation of an input image for a task with relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. It includes parameters for training of the source ConvNet such as its architecture, distribution of the training data, etc. and also the parameters of feature extraction such as layer of the trained ConvNet, dimensionality reduction, etc. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed.
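
    The transfer pipeline described above can be illustrated with a short, hedged sketch: activations from a chosen layer of an ImageNet-pretrained ConvNet serve as a generic image representation, and a linear classifier is trained on the smaller target task. This is not the authors' code; the network (AlexNet), the layer (first fully connected) and the placeholder data names are assumptions for illustration.

```python
# Hedged sketch of ConvNet representation transfer (not the authors' implementation).
# Assumptions: torchvision's AlexNet, features taken at the first fully connected layer.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.svm import LinearSVC

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
# Everything up to (and including) the first fully connected layer.
extractor = torch.nn.Sequential(
    model.features, model.avgpool, torch.nn.Flatten(), model.classifier[:2]
)

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def represent(paths):
    """Return one L2-normalised generic descriptor per image path."""
    feats = []
    with torch.no_grad():
        for p in paths:
            x = preprocess(Image.open(p).convert("RGB")).unsqueeze(0)
            f = extractor(x).squeeze(0)
            feats.append(torch.nn.functional.normalize(f, dim=0).numpy())
    return feats

# Hypothetical target task with a small labelled training set:
# clf = LinearSVC(C=1.0).fit(represent(train_paths), train_labels)
# predictions = clf.predict(represent(test_paths))
```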

  • 3.
    Brudfors, Mikael
    et al.
    KTH, School of Computer Science and Communication (CSC).
    Seitel, Alexander
    University of British Columbia.
    Rasoulian, Abtin
    University of British Columbia.
    Lasso, Andras
    Queen's University, Canada.
    Lessoway, Victoria
    Woman's Hospital, Vancouver, Canada.
    Osborn, Jill
    St Paul's Hospital, Vancouver, Canada.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Rohling, Robert
    University of British Columbia.
    Abolmaesumi, Purang
    University of British Columbia.
    Towards real-time, tracker-less 3D ultrasound guidance for spine anaesthesia, 2015. In: International Conference on Information Processing in Computer-Assisted Interventions, 2015. Conference paper (Refereed)
  • 4.
    Brudfors, Mikael
    et al.
    KTH, School of Computer Science and Communication (CSC).
    Seitel, Alexander
    University of British Columbia.
    Rasoulian, Abtin
    University of British Columbia.
    Lasso, Andras
    Queen's University, Canada.
    Lessoway, Victoria
    Woman's Hospital, Vancouver, Canada.
    Osborn, Jill
    St Paul's Hospital, Vancouver, Canada.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Rohling, Robert
    University of British Columbia.
    Abolmaesumi, Purang
    University of British Columbia.
    Towards real-time, tracker-less 3D ultrasound guidance for spine anaesthesia, 2015. In: International Journal of Computer Assisted Radiology and Surgery, ISSN 1861-6410, E-ISSN 1861-6429, Vol. 10, no 6, p. 855-865. Article in journal (Refereed)
    Abstract [en]

    Purpose: Epidural needle insertions and facet joint injections play an important role in spine anaesthesia. The main challenge of safe needle insertion is the deep location of the target, resulting in a narrow and small insertion channel close to sensitive anatomy. Recent approaches utilizing ultrasound (US) as a low-cost and widely available guiding modality are promising but have yet to become routinely used in clinical practice due to the difficulty in interpreting US images, their limited view of the internal anatomy of the spine, and/or inclusion of cost-intensive tracking hardware which impacts the clinical workflow. Methods: We propose a novel guidance system for spine anaesthesia. An efficient implementation allows us to continuously align and overlay a statistical model of the lumbar spine on the live 3D US stream without making use of additional tracking hardware. The system is evaluated in vivo on 12 volunteers. Results: The in vivo study showed that the anatomical features of the epidural space and the facet joints could be continuously located, at a volume rate of 0.5 Hz, within an accuracy of 3 and 7 mm, respectively. Conclusions: A novel guidance system for spine anaesthesia has been presented which augments a live 3D US stream with detailed anatomical information of the spine. Results from an in vivo study indicate that the proposed system has potential for assisting the physician in quickly finding the target structure and planning a safe insertion trajectory in the spine.

  • 5. Eklundh, Jan-Olof
    et al.
    Uhlin, Tomas
    Nordlund, Peter
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Active Vision and Seeing Robots, 1996. In: International Symposium on Robotics Research, 1996. Conference paper (Refereed)
  • 6. Eklundh, Jan-Olof
    et al.
    Uhlin, Tomas
    Nordlund, Peter
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Developing an Active Observer, 1995. In: Asian Conference on Computer Vision, 1995, Vol. 1035, p. 181-190. Conference paper (Refereed)
  • 7.
    Fukui, Kazuhiro
    et al.
    Tsukuba University, Japan.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Difference subspace and its generalization for subspace-based methods, 2015. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 37, no 11, p. 2164-2177. Article in journal (Refereed)
    Abstract [en]

    Subspace-based methods are known to provide a practical solution for image set-based object recognition. Based on the insight that local shape differences between objects offer a sensitive cue for recognition, this paper addresses the problem of extracting a subspace representing the difference components between class subspaces generated from each set of object images independently of each other. We first introduce the difference subspace (DS), a novel geometric concept between two subspaces as an extension of a difference vector between two vectors, and describe its effectiveness in analyzing shape differences. We then generalize it to the generalized difference subspace (GDS) for multi-class subspaces, and show the benefit of applying this to subspace and mutual subspace methods, in terms of recognition capability. Furthermore, we extend these methods to kernel DS (KDS) and kernel GDS (KGDS) by a nonlinear kernel mapping to deal with cases involving larger changes in viewing direction. In summary, the contributions of this paper are as follows: 1) a DS/KDS between two class subspaces characterizes shape differences between the two respectively corresponding objects, 2) the projection of an input vector onto a DS/KDS realizes selective visualization of shape differences between objects, and 3) the projection of an input vector or subspace onto a GDS/KGDS is extremely effective at extracting differences between multiple subspaces, and therefore improves object recognition performance. We demonstrate validity through shape analysis on synthetic and real images of 3D objects as well as extensive comparison of performance on classification tests with several related methods; we study the performance in face image classification on the Yale face database B+ and the CMU Multi-PIE database, and hand shape classification of multi-view images.
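
    As a concrete illustration of the difference subspace between two class subspaces, the following numpy sketch pairs canonical vectors of the two subspaces via an SVD and spans the DS with their normalised difference vectors. This is consistent with the description in the abstract but simplifies details (dimension selection, handling of shared directions); the toy data and names are hypothetical.

```python
# Hedged sketch of a two-class difference subspace (DS): difference vectors of
# paired canonical vectors span the DS. Toy data and names are hypothetical.
import numpy as np

def class_basis(X, dim):
    """Orthonormal basis (D x dim) of a class subspace from image vectors in columns of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim]

def difference_subspace(B1, B2, eps=1e-8):
    """B1, B2: (D x m) orthonormal bases of two class subspaces."""
    U, s, Vt = np.linalg.svd(B1.T @ B2)       # s[i] = cos(theta_i), canonical angles
    Uc, Vc = B1 @ U, B2 @ Vt.T                # paired canonical vectors
    D = (Uc - Vc)[:, s < 1.0 - eps]           # drop directions the subspaces share
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    Q, _ = np.linalg.qr(D)                    # re-orthonormalise the difference vectors
    return Q

rng = np.random.default_rng(0)
B1 = class_basis(rng.normal(size=(100, 40)), 5)
B2 = class_basis(rng.normal(size=(100, 40)), 5)
print(difference_subspace(B1, B2).shape)      # (100, 5) for subspaces in general position
```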

  • 8.
    Ghadirzadeh, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Bütepage, Judith
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Björkman, Mårten
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    A sensorimotor reinforcement learning framework for physical human-robot interaction, 2016. In: IEEE International Conference on Intelligent Robots and Systems, IEEE, 2016, p. 2682-2688. Conference paper (Refereed)
    Abstract [en]

    Modeling of physical human-robot collaborations is generally a challenging problem due to the unpredictable nature of human behavior. To address this issue, we present a data-efficient reinforcement learning framework which enables a robot to learn how to collaborate with a human partner. The robot learns the task from its own sensorimotor experiences in an unsupervised manner. The uncertainty in the interaction is modeled using Gaussian processes (GP) to implement a forward model and an action-value function. Optimal action selection given the uncertain GP model is ensured by Bayesian optimization. We apply the framework to a scenario in which a human and a PR2 robot jointly control the ball position on a plank based on vision and force/torque data. Our experimental results show the suitability of the proposed method in terms of fast and data-efficient model learning, optimal action selection under uncertainty and equal role sharing between the partners.
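
    A generic sketch of the action-selection idea in the abstract follows: a Gaussian process models the action-value, and actions are picked optimistically under the GP uncertainty. It covers only the GP plus acquisition step, not the full framework; the toy task, kernel and all names are assumptions.

```python
# Generic sketch (not the authors' implementation): GP action-value model with
# optimistic (UCB) action selection. Toy data, kernel and dimensions are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical replay of past interaction: state s, action a, observed return r.
S = rng.uniform(-1, 1, size=(50, 2))
A = rng.uniform(-1, 1, size=(50, 1))
R = -np.sum((S + A) ** 2, axis=1) + 0.05 * rng.normal(size=50)   # toy reward

kernel = RBF(length_scale=0.5) + WhiteKernel(noise_level=0.01)
q_model = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
q_model.fit(np.hstack([S, A]), R)              # GP model of Q(s, a)

def select_action(state, candidates, kappa=2.0):
    """Pick the candidate action maximising an upper confidence bound on Q."""
    X = np.hstack([np.tile(state, (len(candidates), 1)), candidates])
    mean, std = q_model.predict(X, return_std=True)
    return candidates[np.argmax(mean + kappa * std)]

action = select_action(np.array([0.3, -0.2]), rng.uniform(-1, 1, size=(200, 1)))
print("chosen action:", action)
```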

  • 9.
    Ghadirzadeh, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Kootstra, Gert
    Wageningen University, The Netherlands.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Björkman, Mårten
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Learning visual forward models to compensate for self-induced image motion, 2014. In: 23rd IEEE International Conference on Robot and Human Interactive Communication: IEEE RO-MAN, IEEE, 2014, p. 1110-1115. Conference paper (Refereed)
    Abstract [en]

    Predicting the sensory consequences of an agent's own actions is considered an important skill for intelligent behavior. In terms of vision, so-called visual forward models can be applied to learn such predictions. This is no trivial task given the high-dimensionality of sensory data and complex action spaces. In this work, we propose to learn the visual consequences of changes in pan and tilt of a robotic head using a visual forward model based on Gaussian processes and SURF correspondences. This is done without any assumptions on the kinematics of the system or requirements on calibration. The proposed method is compared to an earlier work using accumulator-based correspondences and Radial Basis function networks. We also show the feasibility of the proposed method for detection of independent motion using a moving camera system. By comparing the predicted and actual captured images, image motion due to the robot's own actions and motion caused by moving external objects can be distinguished. Results show the proposed method to be preferable to the earlier method in terms of both prediction errors and ability to detect independent motion.

  • 10.
    Ghadirzadeh, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Björkman, Mårten
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS. KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    A Sensorimotor Approach for Self-Learning of Hand-Eye Coordination, 2015. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, Hamburg, September 28 - October 02, 2015, IEEE conference proceedings, 2015, p. 4969-4975. Conference paper (Refereed)
    Abstract [en]

    This paper presents a sensorimotor contingencies (SMC) based method to fully autonomously learn to perform hand-eye coordination. We divide the task into two visuomotor subtasks, visual fixation and reaching, and implement these on a PR2 robot assuming no prior information on its kinematic model. Our contributions are three-fold: i) grounding a robot in the environment by exploiting SMCs in the action planning system, which eliminates the need for prior knowledge of the kinematic or dynamic models of the robot; ii) using a forward model to search for proper actions to solve the task by minimizing a cost function, instead of training a separate inverse model, to speed up training; iii) encoding 3D spatial positions of a target object based on the robot's joint positions, thus avoiding calibration with respect to an external coordinate system. The method is capable of learning the task of hand-eye coordination from scratch with fewer than 20 sensory-motor pairs that are iteratively generated at real-time speed. In order to examine the robustness of the method while dealing with nonlinear image distortions, we apply a so-called retinal mapping image deformation to the input images. Experimental results show that the method succeeds even under considerable image deformations.

  • 11.
    Ghadirzadeh, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Björkman, Mårten
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Deep predictive policy training using reinforcement learning, 2017. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 2351-2358, article id 8206046. Conference paper (Refereed)
    Abstract [en]

    Skilled robot task learning is best implemented by predictive action policies due to the inherent latency of sensorimotor processes. However, training such predictive policies is challenging as it involves finding a trajectory of motor activations for the full duration of the action. We propose a data-efficient deep predictive policy training (DPPT) framework with a deep neural network policy architecture which maps an image observation to a sequence of motor activations. The architecture consists of three sub-networks referred to as the perception, policy and behavior super-layers. The perception and behavior super-layers force an abstraction of visual and motor data trained with synthetic and simulated training samples, respectively. The policy super-layer is a small subnetwork with fewer parameters that maps data in-between the abstracted manifolds. It is trained for each task using methods for policy search reinforcement learning. We demonstrate the suitability of the proposed architecture and learning framework by training predictive policies for skilled object grasping and ball throwing on a PR2 robot. The effectiveness of the method is illustrated by the fact that these tasks are trained using only about 180 real robot attempts with qualitative terminal rewards.
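
    A schematic, untrained sketch of the three-part architecture described above (perception, policy and behaviour super-layers mapping an image to a motor trajectory) follows. Layer sizes, trajectory length and joint count are invented for illustration, and the training procedure (policy search RL, synthetic and simulated pre-training) is not shown.

```python
# Schematic sketch only: an image-to-trajectory network with perception, policy
# and behaviour sub-networks. All shapes and sizes are assumptions, not the paper's.
import torch
import torch.nn as nn

class PredictivePolicy(nn.Module):
    def __init__(self, state_dim=8, horizon=20, n_joints=7):
        super().__init__()
        # Perception: image -> low-dimensional task state.
        self.perception = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, state_dim),
        )
        # Policy: small trainable mapping between the two abstractions.
        self.policy = nn.Sequential(nn.Linear(state_dim, 16), nn.Tanh(),
                                    nn.Linear(16, state_dim))
        # Behaviour: latent action -> sequence of motor activations.
        self.behavior = nn.Linear(state_dim, horizon * n_joints)
        self.horizon, self.n_joints = horizon, n_joints

    def forward(self, image):
        z = self.policy(self.perception(image))
        return self.behavior(z).view(-1, self.horizon, self.n_joints)

traj = PredictivePolicy()(torch.randn(1, 3, 64, 64))
print(traj.shape)   # torch.Size([1, 20, 7])
```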

  • 12.
    Högman, Virgile
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Björkman, Mårten
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    A sensorimotor learning framework for object categorization, 2016. In: IEEE Transactions on Cognitive and Developmental Systems, ISSN 2379-8920, Vol. 8, no 1, p. 15-25. Article in journal (Refereed)
    Abstract [en]

    This paper presents a framework that enables a robot to discover various object categories through interaction. The categories are described using action-effect relations, i.e. sensorimotor contingencies rather than more static shape or appearance representation. The framework provides a functionality to classify objects and the resulting categories, associating a class with a specific module. We demonstrate the performance of the framework by studying a pushing behavior in robots, encoding the sensorimotor contingencies and their predictability with Gaussian Processes. We show how entropy-based action selection can improve object classification and how functional categories emerge from the similarities of effects observed among the objects. We also show how a multidimensional action space can be realized by parameterizing pushing using both position and velocity.
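
    The entropy-based action selection mentioned in the abstract can be illustrated with a small sketch: given a GP model of push effects, the entropy of the Gaussian predictive distribution grows with its standard deviation, so the most informative candidate push is the one with maximal predicted std. Data, dimensions and names are hypothetical.

```python
# Hedged sketch of entropy-based action selection with a GP effect model.
# The toy data and the 2-D push parameterisation are assumptions for illustration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
# Past interactions with one object: push parameters -> observed effect (1-D here).
pushes = rng.uniform(-1, 1, size=(30, 2))         # e.g. push position and velocity
effects = np.sin(pushes[:, 0]) + 0.1 * rng.normal(size=30)

gp = GaussianProcessRegressor().fit(pushes, effects)

def most_informative(candidates):
    """Entropy of a Gaussian grows with its std, so pick the max-std candidate."""
    _, std = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(std)]

next_push = most_informative(rng.uniform(-1, 1, size=(100, 2)))
print("next push parameters:", next_push)
```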

  • 13.
    Johansson, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC).
    Solli, Martin
    FLIR Systems AB.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    An Evaluation of Local Feature Detectors and Descriptors for Infrared Images, 2016. In: Lecture Notes in Computer Science, Volume 9915, 2016, p. 711-723. Conference paper (Refereed)
    Abstract [en]

    This paper provides a comparative performance evaluation of local features for infrared (IR) images across different combinations of common detectors and descriptors. Although numerous studies report comparisons of local features designed for ordinary visual images, their performance on IR images is far less charted. We perform a systematic investigation, thoroughly exploiting the established benchmark while also introducing a new IR image data set. The contribution is two-fold: we (i) evaluate the performance of both local float type and more recent binary type detectors and descriptors in their combinations under a variety (6 kinds) of image transformations, and (ii) make a new IR image data set publicly available. Through our investigation we gain novel and useful insights for applying state-of-the-art local features to IR images with different properties.
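
    One cell of such an evaluation can be sketched as follows: a detector/descriptor combination (ORB here, as one binary-type option) is matched between an image and a synthetically warped copy, and the inlier ratio under the known homography is reported. OpenCV is assumed and the file name is a placeholder; the paper's actual protocol and data set are not reproduced here.

```python
# Hedged sketch of one evaluation cell: match ORB features between an image and a
# warped copy, then score inliers under the known homography. File name is hypothetical.
import cv2
import numpy as np

img = cv2.imread("ir_frame.png", cv2.IMREAD_GRAYSCALE)          # hypothetical IR image
H = np.array([[0.95, 0.05, 10], [-0.03, 0.98, 5], [0, 0, 1]], np.float64)
warped = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))

detector = cv2.ORB_create(nfeatures=1000)                        # binary detector/descriptor
kp1, des1 = detector.detectAndCompute(img, None)
kp2, des2 = detector.detectAndCompute(warped, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

# A match is an inlier if the ground-truth homography maps kp1 close to kp2.
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
proj = cv2.perspectiveTransform(pts1, H).reshape(-1, 2)
inliers = np.linalg.norm(proj - pts2, axis=1) < 3.0
print(f"inlier ratio: {inliers.mean():.2f} over {len(matches)} matches")
```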

  • 14.
    Maki, Atsuto
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Bretzner, Lars
    Eklundh, Jan-Olof
    Local Fourier Phase and Disparity Estimates: An Analytical Study, 1995. In: International Conference on Computer Analysis of Images and Patterns, 1995, Vol. 970, p. 868-873. Conference paper (Refereed)
  • 15.
    Maki, Atsuto
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Nordlund, Peter
    Eklundh, Jan-Olof
    A computational model of depth-based attention, 1996. In: International Conference on Pattern Recognition, 1996, p. 734-739. Conference paper (Refereed)
  • 16.
    Maki, Atsuto
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Nordlund, Peter
    Eklundh, Jan-Olof
    Attentional Scene Segmentation: Integrating Depth and Motion, 2000. In: Computer Vision and Image Understanding, Vol. 78, no 3, p. 351-373. Article in journal (Refereed)
  • 17.
    Maki, Atsuto
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Uhlin, Tomas
    Disparity Selection in Binocular Pursuit, 1995. In: IEICE Transactions on Information and Systems, ISSN 0916-8532, E-ISSN 1745-1361, Vol. E78-D, no 12, p. 1591-1597. Article in journal (Refereed)
  • 18.
    Maki, Atsuto
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Uhlin, Tomas
    Eklundh, Jan-Olof
    A Direct Disparity Estimation Technique for Depth Segmentation, 1996. In: IAPR Workshop on Machine Vision Applications, 1996, p. 530-533. Conference paper (Refereed)
  • 19.
    Maki, Atsuto
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Uhlin, Tomas
    Eklundh, Jan-Olof
    Disparity Selection in Binocular Pursuit, 1994. In: IAPR Workshop on Machine Vision Applications, 1994, p. 182-185. Conference paper (Refereed)
  • 20.
    Maki, Atsuto
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Uhlin, Tomas
    Eklundh, Jan-Olof
    Phase-Based Disparity Estimation in Binocular Tracking, 1993. In: Scandinavian Conference on Image Analysis, 1993. Conference paper (Refereed)
  • 21.
    Mizuyama, Hajime
    et al.
    Department of Mechanical Engineering and Science, Kyoto University.
    Yamada, Kayo
    Department of Mechanical Engineering and Science, Kyoto University.
    Tanaka, Kazuto
    Department of Biomedical Engineering, Doshisha University.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. Graduate School of Informatics, Kyoto University.
    Explanatory analysis of the manner in which an instructor adaptively organizes skilled motion teaching process, 2013. In: International Journal of Industrial Ergonomics, ISSN 0169-8141, E-ISSN 1872-8219, Vol. 43, no 5, p. 430-438. Article in journal (Refereed)
    Abstract [en]

    Mastering a skilled motion usually requires a step-by-step progression through multiple learning phases with different subgoals. It is not easy for a learner to properly organize such a complex learning process without assistance. Hence, this task is often facilitated interactively by a human instructor through verbal advice. In many cases, the instructor's teaching strategy in relation to decomposing the entire learning process into phases, setting a subgoal for each learning phase, choosing verbal advice to guide the learner toward this subgoal, etc. remains intuitive and has not yet been formally understood. Thus, taking the basic motion of wok handling as an example, this paper presents several concrete teaching processes involving an advice sequence and the corresponding changes in the motion performance in a feature variable space. Thereby, the paper analyzes and represents the actual strategy taken in an easy-to-interpret form. As a result, it confirms that the instructor determines the set of advice elements to be given based, not simply on the observable characteristics of the latest motion performance, but more adaptively upon the interaction history with the learner. Relevance to industry: Teaching a skilled motion efficiently is essential in various industrial sectors such as those involving manual assembly. An experienced instructor may adaptively organize the entire interactive process of teaching a learner to accelerate the learning of correct motion skills.

  • 22. Nawata, Shinya
    et al.
    Maki, Atsuto
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Hikihara, Takashi
    Power packet transferability via symbol propagation matrix, 2018. In: Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, ISSN 1364-5021, E-ISSN 1471-2946, Vol. 474, no 2213, article id 20170552. Article in journal (Refereed)
    Abstract [en]

    A power packet is a unit of electric power composed of a power pulse and an information tag. In Shannon’s information theory, messages are represented by symbol sequences in a digitized manner. Referring to this formulation, we define symbols in power packetization as a minimum unit of power transferred by a tagged pulse. Here, power is digitized and quantized. In this paper, we consider packetized power in networks for a finite duration, giving symbols and their energies to the networks. A network structure is defined using a graph whose nodes represent routers, sources and destinations. First, we introduce the concept of a symbol propagation matrix (SPM) in which symbols are transferred at links during unit times. Packetized power is described as a network flow in a spatio-temporal structure. Then, we study the problem of selecting an SPM in terms of transferability, that is, the possibility to represent given energies at sources and destinations during the finite duration. To select an SPM, we consider a network flow problem of packetized power. The problem is formulated as an M-convex submodular flow problem which is a solvable generalization of the minimum cost flow problem. Finally, through examples, we verify that this formulation provides reasonable packetized power.

  • 23. Norlander, Rickard
    et al.
    Grahn, Josef
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Wooden Knot Detection Using ConvNet Transfer Learning, 2015. In: Scandinavian Conference on Image Analysis, 2015. Conference paper (Refereed)
  • 24. Olczak, Jakub
    et al.
    Fahlberg, Niklas
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Razavian, Ali Sharif
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL. Danderyd Hosp, Karolinska Inst, Sweden.
    Jilert, Anthony
    Stark, Andre
    Skoldenberg, Olof
    Gordon, Max
    Artificial intelligence for analyzing orthopedic trauma radiographs: Deep learning algorithms - are they on par with humans for diagnosing fractures? 2017. In: Acta Orthopaedica, ISSN 1745-3674, E-ISSN 1745-3682, Vol. 88, no 6, p. 581-586. Article in journal (Refereed)
    Abstract [en]

    Background and purpose - Recent advances in artificial intelligence (deep learning) have shown remarkable performance in classifying non-medical images, and the technology is believed to be the next technological revolution. So far it has never been applied in an orthopedic setting, and in this study we sought to determine the feasibility of using deep learning for skeletal radiographs. Methods - We extracted 256,000 wrist, hand, and ankle radiographs from Danderyd's Hospital and identified 4 classes: fracture, laterality, body part, and exam view. We then selected 5 openly available deep learning networks that were adapted for these images. The most accurate network was benchmarked against a gold standard for fractures. We furthermore compared the network's performance with 2 senior orthopedic surgeons who reviewed images at the same resolution as the network. Results - All networks exhibited an accuracy of at least 90% when identifying laterality, body part, and exam view. The final accuracy for fractures was estimated at 83% for the best performing network. The network performed similarly to senior orthopedic surgeons when presented with images at the same resolution as the network. The 2-reviewer Cohen's kappa under these conditions was 0.76. Interpretation - This study supports the use of artificial intelligence for orthopedic radiographs, where it can perform at a human level. While the current implementation lacks important features that surgeons require, e.g. risk of dislocation, classifications, measurements, and combining multiple exam views, these problems have technical solutions that are waiting to be implemented for orthopedics.

  • 25.
    Papakostas, Ioannis
    et al.
    KTH, School of Information and Communication Technology (ICT). Semiconductor Energy Laboratory Co., Ltd..
    Dikaros, Georgios
    KTH, School of Information and Communication Technology (ICT). Semiconductor Energy Laboratory Co., Ltd..
    Maeda, Shuhei
    SEL.
    Ohmaru, Takuro
    SEL.
    Ikeda, Takayuki
    SEL.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Yamazaki, Shunpei
    SEL.
    Efficient Motion Capturing and Idling Stop Display system utilizing CAAC-IGZO semiconductor FETs, 2015. In: [ed] Information Processing Society of Japan (IPSJ), 2015. Conference paper (Refereed)
    Abstract [en]

    This paper focuses on the integration of two applications of crystalline oxide semiconductor FETs, developed by SEL, the Motion capturing system and the Idling Stop (IDS) Display, in order to create a new efficient motion capturing and display system. For the idling stop feature, refreshing is performed only if there are differences between the current displayed frame and the one that is input to the device. The new system ensures efficiency by eliminating the need for an image processor and extra memory on the display side, since comparison has already been performed by an analog processor on the Motion capturing system and images are fed to the display through an interface which supports SVGA transmission.

  • 26.
    Pham, M.-T.
    et al.
    Toshiba Research Europe Ltd..
    Woodford, O. J.
    Toshiba Research Europe Ltd..
    Perbet, F.
    Toshiba Research Europe Ltd..
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Gherardi, R.
    Toshiba Research Europe Ltd..
    Stenger, B.
    Toshiba Research Europe Ltd..
    Cipolla, R.
    University of Cambridge.
    Distances and Means of Direct Similarities, 2014. In: International Journal of Computer Vision, ISSN 0920-5691, E-ISSN 1573-1405, Vol. 112, no 3, p. 285-306. Article in journal (Refereed)
    Abstract [en]

    The non-Euclidean nature of direct isometries in a Euclidean space, i.e. transformations consisting of a rotation and a translation, creates difficulties when computing distances, means and distributions over them, which have been well studied in the literature. Direct similarities, transformations consisting of a direct isometry and a positive uniform scaling, present even more of a challenge—one which we demonstrate and address here. In this article, we investigate divergences (a superset of distances without constraints on symmetry and sub-additivity) for comparing direct similarities, and means induced by them via minimizing a sum of squared divergences. We analyze several standard divergences: the Euclidean distance using the matrix representation of direct similarities, a divergence from Lie group theory, and the family of all left-invariant distances derived from Riemannian geometry. We derive their properties and those of their induced means, highlighting several shortcomings. In addition, we introduce a novel family of left-invariant divergences, called SRT divergences, which resolve several issues associated with the standard divergences. In our evaluation we empirically demonstrate the derived properties of the divergences and means, both qualitatively and quantitatively, on synthetic data. Finally, we compare the divergences in a real-world application: vote-based, scale-invariant object recognition. Our results show that the new divergences presented here, and their means, are both more effective and faster to compute for this task.

  • 27.
    Razavian, Ali Sharif
    et al.
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Sullivan, Josephine
    KTH.
    Carlsson, Stefan
    KTH.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Visual instance retrieval with deep convolutional networks, 2016. In: ITE Transactions on Media Technology and Applications, ISSN 2186-7364, Vol. 4, no 3, p. 251-258. Article in journal (Refereed)
    Abstract [en]

    This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient pipeline exploiting multi-scale schemes to extract local features, in particular, by taking geometric invariance into explicit account, i.e. positions, scales and spatial consistency. In our experiments using five standard image retrieval datasets, we demonstrate that generic ConvNet image representations can outperform other state-of-the-art methods if they are extracted appropriately.
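
    A much-simplified sketch of multi-scale ConvNet descriptors for instance retrieval is given below: convolutional activations of a pretrained network are sum-pooled at a few scales into one L2-normalised global descriptor, and gallery images are ranked by cosine similarity. The network, layer, scales and file names are assumptions; the paper's pipeline with local features and spatial consistency is not reproduced.

```python
# Hedged sketch (not the paper's pipeline): multi-scale sum-pooled ConvNet descriptor
# for retrieval. Backbone, layer, scales and file names are assumptions.
import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
conv_layers = torch.nn.Sequential(*list(backbone.children())[:-2])   # keep conv feature maps

normalize = T.Compose([T.ToTensor(),
                       T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

def describe(path, scales=(1.0, 0.75, 0.5)):
    """Sum-pool conv activations over several scales into one global descriptor."""
    img = Image.open(path).convert("RGB")
    desc = 0
    with torch.no_grad():
        for s in scales:
            resized = img.resize((int(img.width * s), int(img.height * s)))
            fmap = conv_layers(normalize(resized).unsqueeze(0))       # (1, C, h, w)
            desc = desc + F.normalize(fmap.sum(dim=(2, 3)), dim=1)
    return F.normalize(desc, dim=1).squeeze(0)

# Hypothetical usage: rank gallery images against a query by cosine similarity.
# q = describe("query.jpg")
# ranking = sorted(gallery_paths, key=lambda p: -float(q @ describe(p)))
```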

  • 28.
    Sharif Razavian, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Azizpour, Hossein
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Persistent Evidence of Local Image Properties in Generic ConvNets, 2015. In: Image Analysis: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015, Proceedings / [ed] Paulsen, Rasmus R., Pedersen, Kim S., Springer Publishing Company, 2015, p. 249-262. Conference paper (Refereed)
    Abstract [en]

    Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper addresses this, in particular, exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has been recently shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for this finding in the contexts of four different tasks: 2d landmark detection, 2d object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework with ridge regression used commonly across these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to a good accuracy, and should potentially be useful for improving the training of the convolutional nets for classification purposes.
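
    The probing setup described above can be sketched in a few lines: ridge regression maps a precomputed generic ConvNet descriptor (e.g. from the first fully connected layer) to a spatial target such as 2D landmark coordinates. The feature and label files below are hypothetical placeholders, and the regularisation strength is an arbitrary choice.

```python
# Hedged sketch of the ridge regression probe. Input files are hypothetical:
# X holds (n_images, feature_dim) ConvNet descriptors, Y holds (n_images, 2*k) landmarks.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X = np.load("features.npy")        # hypothetical precomputed descriptors
Y = np.load("landmarks.npy")       # hypothetical ground-truth landmark coordinates

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
probe = Ridge(alpha=10.0).fit(X_tr, Y_tr)      # one linear probe for all targets

pred = probe.predict(X_te)
err = np.mean(np.linalg.norm((pred - Y_te).reshape(len(Y_te), -1, 2), axis=2))
print(f"mean landmark error (image coordinates): {err:.2f}")
```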

  • 29.
    Sharif Razavian, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    A Baseline for Visual Instance Retrieval with Deep Convolutional Networks, 2015. Conference paper (Refereed)
  • 30.
    Uhlin, Tomas
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Nordlund, Peter
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Eklundh, Jan-Olof
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Towards an Active Visual Observer, 1995. In: Proceedings of the Fifth International Conference on Computer Vision, 1995, p. 679-686. Conference paper (Refereed)
    Abstract [en]

    We present a binocular active vision system that can attend to and fixate a moving target. Our system has an open and expandable design and it forms the first steps of a long term effort towards developing an active observer using vision to interact with the environment, in particular capable of figure-ground segmentation. We also present partial real-time implementations of this system and show their performance in real-world situations together with motor control. In pursuit we particularly focus on occlusions of other targets, both stationary and moving, and integrate three cues, ego-motion, target motion and target disparity, to obtain an overall robust behavior. An active vision system must be open, expandable, and operate with whatever data are available momentarily. It must also be equipped with means and methods to direct and change its attention. This system is therefore equipped with motion detection for changing attention and pursuit for maintaining attention, both of which run concurrently.
