  • 1.
    Azizpour, Hossein
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Razavian, Ali Sharif
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    From Generic to Specific Deep Representations for Visual Recognition. 2015. In: Proceedings of CVPR 2015, IEEE conference proceedings, 2015. Conference paper (Refereed)
    Abstract [en]

    Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset and the feed-forward units activation, at a certain layer of the network, is used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors. It includes parameters for training the network such as its architecture and parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show interesting results indicating a clear correlation between the performance of tasks and their distance from the source task conditioned on proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performances on 16 visual recognition tasks.

  • 2.
    Azizpour, Hossein
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sharif Razavian, Ali
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Factors of Transferability for a Generic ConvNet Representation. 2016. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 38, no 9, p. 1790-1802, article id 7328311. Article in journal (Refereed)
    Abstract [en]

    Evidence is mounting that Convolutional Networks (ConvNets) are the most effective representation learning method for visual recognition tasks. In the common scenario, a ConvNet is trained on a large labeled dataset (source) and the feed-forward units activation of the trained network, at a certain layer of the network, is used as a generic representation of an input image for a task with relatively smaller training set (target). Recent studies have shown this form of representation transfer to be suitable for a wide range of target visual recognition tasks. This paper introduces and investigates several factors affecting the transferability of such representations. It includes parameters for training of the source ConvNet such as its architecture, distribution of the training data, etc. and also the parameters of feature extraction such as layer of the trained ConvNet, dimensionality reduction, etc. Then, by optimizing these factors, we show that significant improvements can be achieved on various (17) visual recognition tasks. We further show that these visual recognition tasks can be categorically ordered based on their similarity to the source task such that a correlation between the performance of tasks and their similarity to the source task w.r.t. the proposed factors is observed.

  • 3.
    Carlsson, Stefan
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, Perception and Learning, RPL.
    Azizpour, Hossein
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Razavian, Ali Sharif
    KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, Perception and Learning, RPL.
    Sullivan, Josephine
    KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, Perception and Learning, RPL.
    Smith, Kevin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    The Preimage of Rectifier Network Activities. 2017. In: International Conference on Learning Representations (ICLR), 2017. Conference paper (Refereed)
    Abstract [en]

    The preimage of the activity at a certain level of a deep network is the set of inputs that result in the same node activity. For fully connected multi-layer rectifier networks we demonstrate how to compute the preimages of activities at arbitrary levels from knowledge of the parameters in a deep rectifying network. If the preimage set of a certain activity in the network contains elements from more than one class it means that these classes are irreversibly mixed. This implies that preimage sets, which are piecewise linear manifolds, are building blocks for describing the input manifolds of specific classes, i.e. all preimages should ideally be from the same class. We believe that the knowledge of how to compute preimages will be valuable in understanding the efficiency displayed by deep learning networks and could potentially be used in designing more efficient training algorithms.
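
The definition of a preimage in the abstract above can be illustrated with a minimal sketch for a single fully connected rectifier layer. It only tests whether two inputs produce the same activity, and so belong to the same preimage; it does not compute the preimage set itself, which is what the paper addresses. The function names `relu_layer` and `same_preimage` and the toy weights are assumptions made for illustration and do not come from the paper.

```python
def relu_layer(weights, bias, x):
    """Activity of one fully connected rectifier layer: max(0, Wx + b)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def same_preimage(weights, bias, x1, x2, tol=1e-9):
    """True if x1 and x2 yield the same activity, i.e. share a preimage."""
    a1 = relu_layer(weights, bias, x1)
    a2 = relu_layer(weights, bias, x2)
    return all(abs(p - q) <= tol for p, q in zip(a1, a2))


# Hypothetical toy layer: identity weights, zero bias.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]

# Both inputs have a negative second coordinate, which the rectifier
# clips to zero, so they map to the same activity.
print(same_preimage(W, b, [2.0, -1.0], [2.0, -5.0]))
# A different first coordinate changes the activity.
print(same_preimage(W, b, [2.0, -1.0], [3.0, -1.0]))
```

In this toy case the preimage of a given activity is a half-line of inputs, which is one small instance of the piecewise linear sets the abstract refers to.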

  • 4.
    Olczak, Jakub
    et al.
    Fahlberg, Niklas
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Razavian, Ali Sharif
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL. Danderyd Hosp, Karolinska Inst, Sweden.
    Jilert, Anthony
    Stark, Andre
    Skoldenberg, Olof
    Gordon, Max
    Artificial intelligence for analyzing orthopedic trauma radiographs: Deep learning algorithms - are they on par with humans for diagnosing fractures? 2017. In: Acta Orthopaedica, ISSN 1745-3674, E-ISSN 1745-3682, Vol. 88, no 6, p. 581-586. Article in journal (Refereed)
    Abstract [en]

    Background and purpose - Recent advances in artificial intelligence (deep learning) have shown remarkable performance in classifying non-medical images, and the technology is believed to be the next technological revolution. So far it has never been applied in an orthopedic setting, and in this study we sought to determine the feasibility of using deep learning for skeletal radiographs. Methods - We extracted 256,000 wrist, hand, and ankle radiographs from Danderyd's Hospital and identified 4 classes: fracture, laterality, body part, and exam view. We then selected 5 openly available deep learning networks that were adapted for these images. The most accurate network was benchmarked against a gold standard for fractures. We furthermore compared the network's performance with 2 senior orthopedic surgeons who reviewed images at the same resolution as the network. Results - All networks exhibited an accuracy of at least 90% when identifying laterality, body part, and exam view. The final accuracy for fractures was estimated at 83% for the best performing network. The network performed similarly to senior orthopedic surgeons when presented with images at the same resolution as the network. The 2-reviewer Cohen's kappa under these conditions was 0.76. Interpretation - This study supports the use of artificial intelligence for orthopedic radiographs, where it can perform at a human level. While the current implementation lacks important features that surgeons require, e.g. risk of dislocation, classifications, measurements, and combining multiple exam views, these problems have technical solutions that are waiting to be implemented for orthopedics.
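
The abstract above reports inter-reviewer agreement as Cohen's kappa. As a minimal sketch of how that statistic is defined for two raters, the following computes kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohens_kappa` and the toy labels are illustrative assumptions; they are not the study's data or code.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: 1 = fracture, 0 = no fracture.
reviewer_1 = [1, 1, 0, 0]
reviewer_2 = [1, 0, 0, 0]
print(cohens_kappa(reviewer_1, reviewer_2))
```

Here the raters agree on 3 of 4 items (p_o = 0.75) while chance agreement is p_e = 0.5, so kappa is 0.5.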

  • 5.
    Razavian, Ali Sharif
    et al.
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Sullivan, Josephine
    KTH.
    Carlsson, Stefan
    KTH.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Visual instance retrieval with deep convolutional networks. 2016. In: ITE Transactions on Media Technology and Applications, ISSN 2186-7364, Vol. 4, no 3, p. 251-258. Article in journal (Refereed)
    Abstract [en]

    This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient pipeline exploiting multi-scale schemes to extract local features, in particular, by taking geometric invariance into explicit account, i.e. positions, scales and spatial consistency. In our experiments using five standard image retrieval datasets, we demonstrate that generic ConvNet image representations can outperform other state-of-the-art methods if they are extracted appropriately.

  • 6.
    Sharif Razavian, Ali
    KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
    Convolutional Network Representation for Visual Recognition. 2017. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Image representation is a key component in visual recognition systems. In a visual recognition problem, the solution or the model should be able to learn and infer the quality of certain visual semantics in the image. Therefore, it is important for the model to represent the input image in a way that the semantics of interest can be inferred easily and reliably. This thesis is written in the form of a compilation of publications and tries to look into the Convolutional Networks (ConvNets) representation in visual recognition problems from an empirical perspective. A Convolutional Network is a special class of Neural Network with a hierarchical structure where every layer’s output (except for the last layer) will be the input of another one. It was shown that ConvNets are powerful tools to learn a generic representation of an image. In this body of work, we first showed that this is indeed the case and ConvNet representation with a simple classifier can outperform highly-tuned pipelines based on hand-crafted features. To be precise, we first trained a ConvNet on a large dataset, then for every image in another task with a small dataset, we feedforward the image to the ConvNet and take the ConvNet's activation on a certain layer as the image representation. Transferring the knowledge from the large dataset (source task) to the small dataset (target task) proved to be effective and outperformed baselines on a variety of tasks in visual recognition. We also evaluated the presence of spatial visual semantics in ConvNet representation and observed that ConvNet retains significant spatial information despite the fact that it has never been explicitly trained to preserve low-level semantics. We then tried to investigate the factors that affect the transferability of these representations. We studied various factors on a diverse set of visual recognition tasks and found a consistent correlation between the effect of those factors and the similarity of the target task to the source task. This intuition alongside the experimental results provides a guideline to improve the performance of visual recognition tasks using ConvNet features. Finally, we addressed the task of visual instance retrieval specifically as an example of how these simple intuitions can increase the performance of the target task massively.

  • 7.
    Sharif Razavian, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Aghazadeh, Omid
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Estimating Attention in Exhibitions Using Wearable Cameras. 2014. In: Pattern Recognition (ICPR), 2014 22nd International Conference on, Stockholm, Sweden: IEEE conference proceedings, 2014, p. 2691-2696. Conference paper (Refereed)
    Abstract [en]

    This paper demonstrates a system for automatic detection of visual attention and identification of salient items at exhibitions (e.g. a museum or an auction). The method is offline and is done on a video captured by a head-mounted camera. Towards the estimation of attention, we define the notions of "saliency" and "interestingness" for exhibition items. Our method is a combination of multiple state-of-the-art techniques from different vision tasks such as tracking, image matching and retrieval. Many experiments are conducted to evaluate multiple aspects of our method. The method has proven to be robust to image blur, occlusion, truncation, and dimness. The experiments show strong performance for the tasks of matching items, estimating focus frames and detecting salient and interesting items. This can be useful to commercial vendors and museum curators and help them to understand which items appeal more to the visitors.

  • 8.
    Sharif Razavian, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Azizpour, Hossein
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Ek, Carl Henrik
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Persistent Evidence of Local Image Properties in Generic ConvNets. 2015. In: Image Analysis: 19th Scandinavian Conference, SCIA 2015, Copenhagen, Denmark, June 15-17, 2015. Proceedings / [ed] Paulsen, Rasmus R., Pedersen, Kim S., Springer Publishing Company, 2015, p. 249-262. Conference paper (Refereed)
    Abstract [en]

    Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper addresses this, in particular, exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has been recently shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for the finding in the contexts of four different tasks: 2d landmark detection, 2d object keypoints prediction, estimation of the RGB values of input image, and recovery of semantic label of each pixel. We base our investigation on a simple framework with ridge regression commonly across these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to a good accuracy, but should potentially be useful for improving the training of the convolutional nets for classification purposes.

  • 9.
    Sharif Razavian, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Azizpour, Hossein
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    CNN features off-the-shelf: An Astounding Baseline for Recognition. 2014. In: Proceedings of CVPR 2014, 2014. Conference paper (Refereed)
    Abstract [en]

    Recent results indicate that the generic descriptors extracted from the convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13. We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine grained recognition, attribute detection and image retrieval applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the OverFeat network was trained to solve. Astonishingly, we report consistent superior results compared to the highly tuned state-of-the-art systems in all the visual classification tasks on various datasets. For instance retrieval it consistently outperforms low memory footprint methods except for sculptures dataset. The results are achieved using a linear SVM classifier (or L2 distance in case of retrieval) applied to a feature representation of size 4096 extracted from a layer in the net. The representations are further modified using simple augmentation techniques e.g. jittering. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.
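
The retrieval setting mentioned in the abstract above (L2 distance between feature vectors) can be sketched as a nearest-neighbour ranking. This is a minimal illustration under assumptions: the vectors below are tiny hand-written stand-ins for the 4096-dimensional network features, and the helper names `l2_distance` and `rank_by_l2` are hypothetical rather than taken from the paper.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_l2(query, database):
    """Return database indices ordered from nearest to farthest from the query."""
    return sorted(range(len(database)), key=lambda i: l2_distance(query, database[i]))


# Hypothetical 2-D stand-ins for feature vectors of three database images.
database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
query = [0.9, 1.2]

print(rank_by_l2(query, database))
```

For the classification tasks the abstract describes, a linear SVM would be trained on the same kind of feature vectors instead of ranking by distance.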

  • 10.
    Sharif Razavian, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Sullivan, Josephine
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Maki, Atsuto
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Carlsson, Stefan
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    A Baseline for Visual Instance Retrieval with Deep Convolutional Networks. 2015. Conference paper (Refereed)