1 - 50 of 55
  • 1.
    Al-Zubaidy, Hussein
    et al.
    KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Fodor, Viktoria
    KTH, School of Electrical Engineering (EES), Network and Systems engineering. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Dán, György
    KTH, School of Electrical Engineering (EES), Network and Systems engineering. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Information Science and Engineering. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Reliable Video Streaming With Strict Playout Deadline in Multihop Wireless Networks. 2017. In: IEEE transactions on multimedia, ISSN 1520-9210, E-ISSN 1941-0077, Vol. 19, no 10, p. 2238-2251. Article in journal (Refereed)
    Abstract [en]

    Motivated by emerging vision-based intelligent services, we consider the problem of rate adaptation for high-quality and low-delay visual information delivery over wireless networks using scalable video coding. Rate adaptation in this setting is inherently challenging due to the interplay between the variability of the wireless channels, the queuing at the network nodes, and the frame-based decoding and playback of the video content at the receiver at very short time scales. To address the problem, we propose a low-complexity model-based rate adaptation algorithm for scalable video streaming systems, building on a novel performance model based on stochastic network calculus. We validate the analytic model using extensive simulations. We show that it allows fast near-optimal rate adaptation for fixed transmission paths, as well as cross-layer optimized routing and video rate adaptation in mesh networks, with less than 10% quality degradation compared to the best achievable performance.

  • 2.
    Barry, Ousmane
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Liu, Du
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Richter, Stefan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Robust Motion-Compensated Orthogonal Video Coding Using EBCOT. 2010. In: Proceedings - 4th Pacific-Rim Symposium on Image and Video Technology, PSIVT 2010, IEEE, 2010, p. 264-269. Conference paper (Refereed)
    Abstract [en]

    This paper proposes a rate-distortion control for motion-compensated orthogonal video coding schemes and evaluates its robustness to packet loss as faced in, e.g., IP networks. The robustness of standard hybrid video coding is extensively studied in the literature. In contrast, motion-compensated orthogonal subbands offer important advantages and new features for robust video transmission. In this work, we utilize so-called uni-directional motion-compensated orthogonal transforms in combination with entropy coding similar to EBCOT known from JPEG2000. The approach provides a flexible embedded structure and allows flexible rate-distortion optimization. Moreover, it may even permit separate encoding and rate control. The proposed rate-distortion control takes channel coding into account and obtains a preemptively protected representation. Our implementation is based on repetition codes, adapted to the channel condition, and improves the PSNR significantly. The optimization requires an estimate of the packet loss rate at the encoder and shows moderate sensitivity to estimation errors.

  • 3.
    Ebri Mars, David
    et al.
    KTH, School of Electrical Engineering (EES).
    Wu, Hanwei
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Li, Haopeng
    KTH, School of Electrical Engineering (EES).
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    GEOMETRY-BASED RANKING FOR MOBILE 3D VISUAL SEARCH USING HIERARCHICALLY STRUCTURED MULTI-VIEW FEATURES. 2015. In: 2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), IEEE Computer Society, 2015, p. 3077-3081. Conference paper (Refereed)
    Abstract [en]

    This paper proposes geometry-based ranking for mobile 3D visual search. It utilizes the underlying geometry of the 3D objects as well as their appearance to improve the ranking results. A double hierarchy is embedded in the data structure, namely the hierarchically structured multi-view features for each object and a tree hierarchy from multi-view vocabulary trees. As the 3D geometry information is incorporated in the multi-view vocabulary tree, it allows us to evaluate the consistency of the 3D geometry at low computational complexity. Thus, a cost function is proposed for object ranking using geometric consistency. With that, we devise an iterative algorithm that accomplishes 3D geometry-based ranking. The experimental results show that our 3D geometry-based ranking improves the recall vs. data rate performance as well as the subjective ranking results for mobile 3D visual search.

  • 4.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre. KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    A l(1)-NORM PRESERVING MOTION-COMPENSATED TRANSFORM FOR SPARSE APPROXIMATION OF IMAGE SEQUENCES. 2010. Conference paper (Refereed)
    Abstract [en]

    This paper discusses an adaptive non-linear transform for image sequences that aims to generate an l(1)-norm preserving sparse approximation for efficient coding. Most sparse approximation problems employ a linear model where images are represented by a basis and a sparse set of coefficients. In this work, however, we consider image sequences where linear measurements are of limited use due to motion. We present a motion-adaptive non-linear transform for a group of pictures that outputs common and detail coefficients and that minimizes the l(1) norm of the detail coefficients while preserving the overall l(1) norm. We demonstrate that we can achieve a smaller l(1) norm of the detail coefficients when compared to that of motion-adaptive linear measurements. Further, the decay of normalized absolute coefficients is faster than that of motion-adaptive linear measurements.

  • 5.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101). KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Adaptive spatial wavelets for motion-compensated orthogonal video transforms. 2009. In: 2009 16th IEEE International Conference on Image Processing (ICIP), IEEE, 2009, p. 1045-1048. Conference paper (Refereed)
    Abstract [en]

    This paper discusses adaptive spatial wavelets for the class of motion-compensated orthogonal video transforms. Motion-compensated orthogonal transforms (MCOT) are temporal transforms for video sequences that maintain orthonormality while permitting flexible motion compensation. Orthogonality is maintained for arbitrary integer-pixel or sub-pixel motion compensation by cascading a sequence of incremental orthogonal transforms and updating so-called scale counters for each pixel. The energy of the input pictures is accumulated in a temporal low-band while the temporal high-bands are zero if the input pictures are identical after motion compensation. For efficient coding, the temporal subbands should be further spatially decomposed to exploit the spatial correlation within each temporal subband. In this paper, we discuss adaptive spatial wavelets that maintain the orthogonal representation of the temporal transforms. Similar to the temporal transforms, they update scale counters for efficient energy concentration. The type-1 adaptive wavelet is a Haar-like wavelet. The type-2 considers three pixels at a time and achieves better energy compaction than the type-1.
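
As a rough, hypothetical sketch of the scale-counter idea behind these incremental orthogonal transforms (assuming identity motion compensation and identical pictures; not the paper's implementation):

```python
import numpy as np

def mcot_step(low, x, n):
    """Absorb picture x into the accumulated low-band `low`, which already
    holds n pictures. The 2x2 rotation keeps the transform orthonormal;
    n plays the role of the per-pixel scale counter."""
    c = np.sqrt(n / (n + 1.0))     # cosine derived from the scale counter
    s = np.sqrt(1.0 / (n + 1.0))
    new_low = c * low + s * x      # energy accumulates in the low-band
    high = -s * low + c * x        # zero if x matches the accumulated mean
    return new_low, high

# Three identical "pictures": all high-bands vanish, energy moves to the low-band.
pics = [np.full(4, 2.0) for _ in range(3)]
low, highs = pics[0], []
for n, x in enumerate(pics[1:], start=1):
    low, h = mcot_step(low, x, n)
    highs.append(h)

energy_in = sum(float(p @ p) for p in pics)
energy_out = float(low @ low) + sum(float(h @ h) for h in highs)
print(np.allclose(energy_in, energy_out), all(np.allclose(h, 0) for h in highs))
```

The rotation angle depends only on the counter n, which is the property that lets orthogonality survive arbitrary per-pixel motion connections.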

  • 6. Flierl, Markus
    et al.
    Girod, Bernd
    Stanford University.
    Generalized B pictures and the draft H.264/AVC video-compression standard. 2003. In: IEEE transactions on circuits and systems for video technology (Print), ISSN 1051-8215, E-ISSN 1558-2205. Article in journal (Refereed)
  • 7.
    Flierl, Markus
    et al.
    Stanford University.
    Girod, Bernd
    Stanford University.
    Multiview Video Compression: Exploiting Inter-Image Similarities. 2007. In: IEEE signal processing magazine (Print), ISSN 1053-5888, E-ISSN 1558-0792, Vol. 24, no 6, p. 66-76. Article in journal (Refereed)
    Abstract [en]

    Due to the vast raw bit rate of multiview video, efficient compression techniques are essential for 3D scene communication. As the video data originate from the same scene, the inherent similarities of the multiview imagery are exploited for efficient compression. These similarities can be classified into two types, inter-view similarity between adjacent camera views and temporal similarity between temporally successive images of each video.

  • 8.
    Flierl, Markus
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.
    Girod, Bernd
    Stanford University.
    Systems, methods, devices and arrangements for motion-compensated image processing and coding. 2008. Patent (Other (popular science, discussion, etc.))
  • 9. Flierl, Markus
    et al.
    Girod, Bernd
    Stanford University.
    Video Coding with Motion-Compensated Lifted Wavelet Transforms. 2004. In: Signal processing. Image communication, ISSN 0923-5965, E-ISSN 1879-2677. Article in journal (Refereed)
  • 10.
    Flierl, Markus
    et al.
    KTH, School of Electrical Engineering (EES).
    Girod, Bernd
    Stanford University.
    Video Coding with Superimposed Motion-Compensated Signals: Applications to H.264 and Beyond. 2010. Book (Refereed)
  • 11.
    Flierl, Markus
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Girod, Bernd
    Stanford University.
    Vandergheynst, Pierre
    EPFL.
    Image transform for video coding. 2006. Patent (Other (popular science, discussion, etc.))
    Abstract [en]

    A method is disclosed for decomposing a set of even and odd pictures into low-band and high-band pictures, respectively, in an image decomposing unit, in which the even picture is used by at least two prediction motion compensators, where the output signal of each prediction motion compensator is scaled according to the number of prediction motion compensators. The method includes calculating the high-band picture by subtracting from the odd picture the scaled motion-compensated signals and using the high-band picture in the at least two update motion compensators, the output signal of each update motion compensator being scaled according to the number of update motion compensators. Finally, the low-band picture is calculated by adding the scaled update motion-compensated signals to the even picture.
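
The prediction/update lifting structure described in this abstract can be sketched as follows; the shift-based "motion compensators" are hypothetical stand-ins, chosen only so the example runs, and the structure is invertible regardless of the compensators used:

```python
import numpy as np

def lift(even, odd, mc_list, umc_list):
    """Lifting decomposition with several prediction and update motion
    compensators; each compensator's output is scaled by 1/len(list)."""
    pred = sum(mc(even) for mc in mc_list) / len(mc_list)
    high = odd - pred                     # high-band: odd minus scaled predictions
    upd = sum(umc(high) for umc in umc_list) / len(umc_list)
    low = even + upd                      # low-band: even plus scaled updates
    return low, high

def unlift(low, high, mc_list, umc_list):
    """Invert by undoing the update step, then the prediction step."""
    even = low - sum(umc(high) for umc in umc_list) / len(umc_list)
    odd = high + sum(mc(even) for mc in mc_list) / len(mc_list)
    return even, odd

shift = lambda k: (lambda img: np.roll(img, k))   # toy "motion compensator"
mcs, umcs = [shift(1), shift(-1)], [shift(1), shift(-1)]
rng = np.random.default_rng(0)
even, odd = rng.normal(size=16), rng.normal(size=16)
low, high = lift(even, odd, mcs, umcs)
e2, o2 = unlift(low, high, mcs, umcs)
print(np.allclose(e2, even), np.allclose(o2, odd))  # lifting is invertible
```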

  • 12. Flierl, Markus
    et al.
    Mavlankar, Aditya
    Stanford University.
    Girod, Bernd
    Stanford University.
    Motion and Disparity Compensated Coding for Multiview Video. 2007. In: IEEE transactions on circuits and systems for video technology (Print), ISSN 1051-8215, E-ISSN 1558-2205. Article in journal (Refereed)
  • 13. Flierl, Markus
    et al.
    Vandergheynst, Pierre
    EPFL.
    Distributed Coding of Highly Correlated Image Sequences with Motion-Compensated Temporal Wavelets. 2006. In: EURASIP Journal on Applied Signal Processing. Article in journal (Refereed)
  • 14.
    Flierl, Markus
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Vandergheynst, Pierre
    EPFL.
    Method for spatially scalable video coding. 2004. Patent (Other (popular science, discussion, etc.))
    Abstract [en]

    A method for decomposing a digital image at resolutions R and MR into a set of spatial sub-bands of resolutions R and MR, where MR > R. The spatial high-band at resolution MR is calculated by subtracting the filtered and up-sampled image at resolution R from the image at resolution MR. The spatial low-band at resolution R is calculated by adding the filtered and down-sampled spatial high-band to the image at resolution R. A rational factor M for up- and down-sampling is determined by the resolution ratio.
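
A minimal 1-D sketch of this decomposition, assuming M = 2 and a simple 2-tap average standing in for the unspecified filters (an illustration, not the patented scheme):

```python
import numpy as np

def decompose(i_r, i_mr):
    """high-band = I_MR - upsample(filter(I_R));
    low-band = I_R + downsample(filter(high-band))."""
    up = np.repeat(i_r, 2)                   # filter + upsample (nearest-neighbor)
    high = i_mr - up
    down = 0.5 * (high[0::2] + high[1::2])   # filter + downsample by M = 2
    low = i_r + down
    return low, high

def reconstruct(low, high):
    """Invert by undoing the low-band update, then the high-band prediction."""
    i_r = low - 0.5 * (high[0::2] + high[1::2])
    i_mr = high + np.repeat(i_r, 2)
    return i_r, i_mr

rng = np.random.default_rng(1)
i_r, i_mr = rng.normal(size=8), rng.normal(size=16)
low, high = decompose(i_r, i_mr)
r_r, r_mr = reconstruct(low, high)
print(np.allclose(r_r, i_r), np.allclose(r_mr, i_mr))
```

Because the scheme has the lifting form (predict, then update), perfect reconstruction holds for any choice of filters.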

  • 15. Flierl, Markus
    et al.
    Wiegand, Thomas
    TU Berlin.
    Girod, Bernd
    Stanford University.
    Rate-constrained multihypothesis prediction for motion-compensated video compression. 2002. In: IEEE transactions on circuits and systems for video technology (Print), ISSN 1051-8215, E-ISSN 1558-2205. Article in journal (Refereed)
  • 16.
    Girdzijauskas, Ivana
    et al.
    Ericsson.
    Flierl, Markus
    KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.
    Georgakis, Apostolos
    Ericsson.
    Kumar Rana, Pravin
    KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.
    Methods and arrangements for 3D scene representation. 2010. Patent (Other (popular science, discussion, etc.))
  • 17.
    Girdzijauskas, Ivana
    et al.
    Ericsson.
    Flierl, Markus
    KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.
    Kumar Rana, Pravin
    KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.
    Method and processor for 3D scene representation. 2012. Patent (Other (popular science, discussion, etc.))
  • 18.
    Girod, Bernd
    et al.
    Stanford University.
    Flierl, Markus
    Videocodierung mit mehreren Referenzbildern [Video coding with multiple reference pictures]. 2003. In: it - Information Technology, ISSN 1611-2776. Article in journal (Refereed)
  • 19.
    Helgason, Hannes
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Li, Haopeng
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Multiscale framework for adaptive and robust enhancement of depth in multi-view imagery. 2012. In: Image Processing (ICIP), 2012 19th IEEE International Conference on, IEEE, 2012, p. 13-16. Conference paper (Refereed)
    Abstract [en]

    Depth Image Based Rendering (DIBR) is a standard technique in free viewpoint television for rendering virtual camera views. For synthesis it utilizes one or several reference texture images and associated depth images, which contain information about the 3D structure of the scene. Many popular depth estimation methods infer the depth information by considering texture images in pairs. This often leads to severe inconsistencies among multiple reference depth images, resulting in poor rendering quality. We propose a method which takes as input a set of depth images and returns an enhanced depth map to be used for rendering at the virtual viewpoint. Our framework is data-driven and based on a simple geometric multiscale model of the underlying depth. Inconsistencies and errors in the input depth images are handled locally using tools from the field of robust statistics. A numerical comparison shows that the method outperforms standard MPEG DIBR software.
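
As a toy illustration of the robust-statistics idea only (a per-pixel median across disagreeing depth estimates; this deliberately ignores the paper's multiscale model):

```python
import numpy as np

def fuse_depth(depth_maps):
    """Per-pixel median across a stack of reference depth maps: a classic
    robust estimator that rejects gross outliers in individual estimates."""
    return np.median(np.stack(depth_maps), axis=0)

truth = np.full((4, 4), 5.0)              # hypothetical ground-truth depth
noisy = [truth.copy() for _ in range(3)]
noisy[0][1, 1] = 50.0                     # gross outlier in one depth image
fused = fuse_depth(noisy)
print(np.allclose(fused, truth))          # the outlier is rejected
```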

  • 20.
    Li, Haopeng
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    3D model hypotheses for player segmentation and rendering in free-viewpoint soccer video. 2012. In: Proceedings - 2012 IEEE International Symposium on Multimedia, ISM 2012, IEEE, 2012, p. 203-209. Conference paper (Refereed)
    Abstract [en]

    This paper presents a player segmentation approach based on 3D model hypotheses for soccer games. We use a hyperplane model for player modeling and a collection of piecewise geometric models for background modeling. To determine the assignment of each pixel in the image plane, we test it with two model hypotheses. We construct a cost function that measures the fitness of model hypotheses for each pixel. To fully utilize the perspective diversity of the multiview imagery, we propose a three-step strategy to choose the best model for each pixel. The experimental results show that our segmentation approach based on 3D model hypotheses outperforms conventional temporal median and graph cut methods for both subjective and objective evaluation.

  • 21.
    Li, Haopeng
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Mobile 3D Visual Search using the Helmert Transformation of Stereo Features. 2013. Conference paper (Refereed)
    Abstract [en]

    This work presents a scheme for mobile 3D visual search that facilitates mobile recognition of 3D objects. We use a multi-view approach to extract the 3D geometric information of the query objects and integrate it into SIFT descriptors. To meet a given transmission bandwidth, we use a rate-constrained quad-tree representation for feature selection and encoding. With this approach, we are able to progressively match the query features against the stereo features in the database and implement a robust geometric verification with the Helmert transformation.

  • 22.
    Li, Haopeng
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Rate-Distortion-Optimized Content-Adaptive Coding For Immersive Networked Experience Of Sports Events. 2011. Conference paper (Refereed)
    Abstract [en]

    This paper presents a content-adaptive coding scheme for immersive networked experience of sports events, in particular, soccer games. We assume that future sports events are captured by an array of fixed high-definition cameras which provide multiview image sequences for a free-viewpoint immersive networked experience in a home environment. We discuss a content-adaptive coding scheme for image sequences that exploits properties of such sequences and that permits efficient user interactions. In this work, we construct a rate-distortion model for an image sequence to obtain the optimal bitrate allocation among static and dynamic content items. The resulting rate-distortion performance significantly outperforms that of conventional H.264/AVC coding.

  • 23.
    Li, Haopeng
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    SIFT-BASED IMPROVEMENT OF DEPTH IMAGERY. 2011. Conference paper (Refereed)
    Abstract [en]

    Depth Image Based Rendering (DIBR) is a widely used technique to enable free viewpoint television. It utilizes one or more reference texture images and their associated depth images to synthesize virtual camera views. The depth image plays a crucial role for DIBR. However, most of the conventional depth image estimation approaches determine the depth information from a limited set of nearby reference images. This leads to inconsistencies among multiple reference depth images, thus resulting in poor rendering quality. In this paper, we propose an approach that uses the Scale Invariant Feature Transform (SIFT) to improve depth images at virtual viewpoints. We extract SIFT features in left and right reference images, and use feature correspondences to improve the consistency between reference depth images. By doing so, the quality of rendered virtual views can be enhanced.

  • 24.
    Li, Haopeng
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Sift-based modeling and coding of background scenes for multiview soccer video. 2012. In: Image Processing (ICIP), 2012 19th IEEE International Conference on, IEEE, 2012, p. 1221-1224. Conference paper (Refereed)
    Abstract [en]

    This paper presents a content-adaptive modeling and coding scheme for static multiview background scenes of soccer games. We discuss a content-adaptive modeling approach for static multiview background imagery that is based on piecewise geometric models of the content. We propose an approach that uses the Scale Invariant Feature Transform (SIFT) to extract the parameters of the geometric models. Moreover, a content-adaptive rendering approach is presented for handling occlusion problems in large baseline scenarios. The experimental results show that our content-adaptive modeling and coding scheme outperforms conventional DIBR schemes.

  • 25.
    Li, Haopeng
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Sift-based multi-view cooperative tracking for soccer video. 2012. In: Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, IEEE, 2012, p. 1001-1004. Conference paper (Refereed)
    Abstract [en]

    This paper presents a SIFT-based multi-view cooperative tracking scheme for multiple player tracking in soccer games. We assume that future sports events will be captured by an array of fixed high-definition cameras which provide multi-view video sequences. The imagery will then be used to provide a free-viewpoint networked experience. In this work, SIFT features are used to extract the inter-view and inter-frame correlation among related views. Hence, accurate 3D information of each player can be efficiently utilized for real time multiple player tracking. By sharing the 3D information with all cameras and exploiting the perspective diversity of the multi-camera system, occlusion problems can be solved effectively. The extracted 3D information improves the average reliability of tracking by more than 10% when compared to SIFT-based 2D tracking.

  • 26.
    Liu, Du
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Energy Compaction on Graphs for Motion-Adaptive Transforms. 2015. In: Data Compression Conference Proceedings, 2015, p. 457. Conference paper (Refereed)
    Abstract [en]

    It is well known that the Karhunen-Loeve Transform (KLT) diagonalizes the covariance matrix and gives the optimal energy compaction. Since the real covariance matrix may not be obtained in video compression, we consider a covariance model that can be constructed without extra cost. In this work, a covariance model based on a graph is considered for temporal transforms of videos. The relation between the covariance matrix and the Laplacian is studied. We obtain an explicit expression of the relation for tree graphs, where the trees are defined by motion information. The proposed graph-based covariance is a good model for motion-compensated image sequences. In terms of energy compaction, our graph-based covariance model has the potential to outperform the classical Laplacian-based signal analysis.
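
The KLT property the abstract starts from can be shown in a few lines; the AR(1) covariance below is a generic stand-in for the paper's graph-based model:

```python
import numpy as np

# KLT from a model covariance: the covariance eigenvectors diagonalize it,
# packing signal energy into the leading transform coefficients.
N, rho = 8, 0.95
cov = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))  # AR(1) model
evals, klt = np.linalg.eigh(cov)           # columns of klt = KLT basis vectors
klt, evals = klt[:, ::-1], evals[::-1]     # sort by decreasing eigenvalue

rng = np.random.default_rng(2)
x = rng.multivariate_normal(np.zeros(N), cov, size=4000)
coeffs = x @ klt
var = coeffs.var(axis=0)                   # empirical coefficient variances
print(var[:2].sum() / var.sum())           # fraction of energy in two coefficients
```

For strongly correlated sources like this one, the first few eigenvalues carry almost all the variance, which is exactly the energy-compaction benchmark the paper measures its graph-based model against.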

  • 27.
    Liu, Du
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.
    Flierl, Markus
    KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.
    Fractional-Pel Accurate Motion-Adaptive Transforms. 2019. In: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 28, no 6, p. 2731-2742, article id 8590746. Article in journal (Refereed)
    Abstract [en]

    Fractional-pel accurate motion is widely used in video coding. For subband coding, fractional-pel accuracy is challenging since it is difficult to handle the complex motion field with temporal transforms. In our previous work, we designed integer accurate motion-adaptive transforms (MAT) which can transform integer accurate motion-connected coefficients. In this paper, we extend the integer MAT to fractional-pel accuracy. The integer MAT allows only one reference coefficient to be the low-band coefficient. In this paper, we design the transform such that it permits multiple references and generates multiple low-band coefficients. In addition, our fractional-pel MAT can incorporate a general interpolation filter into the basis vector, such that the high-band coefficient produced by the transform is the same as the prediction error from the interpolation filter. The fractional-pel MAT is always orthonormal. Thus, the energy is preserved by the transform. We compare the proposed fractional-pel MAT, the integer MAT, and the half-pel motion-compensated orthogonal transform (MCOT), while HEVC intra coding is used to encode the temporal subbands. The experimental results show that the proposed fractional-pel MAT outperforms the integer MAT and the half-pel MCOT. The gain achieved by the proposed MAT over the integer MAT can reach up to 1 dB in PSNR.
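
A hypothetical sketch of absorbing an interpolation filter into a basis vector (made-up half-pel weights and sample values; not the paper's construction): the unit vector built from the weights yields a high-band coefficient equal to the interpolation prediction error up to a fixed scale, and completing it to an orthonormal basis keeps the transform energy-preserving.

```python
import numpy as np

w = np.array([0.5, 0.5])                  # half-pel interpolation weights
b_raw = np.append(-w, 1.0)                # computes  x - w . (r1, r2)
b = b_raw / np.linalg.norm(b_raw)         # unit-norm high-band basis vector
# complete b to an orthonormal 3x3 basis via QR
basis, _ = np.linalg.qr(np.column_stack([b, np.eye(3)[:, :2]]))

r1, r2, x = 3.0, 5.0, 4.5                 # two references and one current sample
v = np.array([r1, r2, x])
coeffs = basis.T @ v
pred_err = x - w @ np.array([r1, r2])     # interpolation prediction error
print(np.isclose(abs(coeffs[0]), abs(pred_err) / np.linalg.norm(b_raw)))
```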

  • 28.
    Liu, Du
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Graph-Based Construction and Assessment of Motion-Adaptive Transforms. 2013. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose two algorithms to construct motion-adaptive transforms that are based on vertex-weighted graphs. The graphs are constructed by motion vector information. The weights of the vertices are given by scale factors that are used to accommodate proper concentration of energy in transforms. The vertex-weighted graph defines a one dimensional linear subspace. Thus, our transform basis is subspace constrained. We propose two algorithms. The first is based on the Gram-Schmidt orthonormalization of the discrete cosine transform (DCT) basis. The second combines the rotation of the DCT basis and the Gram-Schmidt orthonormalization. We assess both algorithms in terms of energy compaction. Moreover, we compare to prior work on graph-based rotation of the DCT basis and on so-called motion-compensated orthogonal transforms (MCOT). In our experiments, both algorithms outperform MCOT in terms of energy compaction. However, their performance is similar to that of graph-based rotation of the DCT basis.
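
The first algorithm can be sketched as follows; the scale-factor values are arbitrary examples, and the orthonormalization shown is plain Gram-Schmidt over the DCT vectors, not the authors' code:

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis; columns are basis vectors."""
    k, i = np.meshgrid(np.arange(n), np.arange(n))
    b = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    b[:, 0] /= np.sqrt(2.0)
    return b

def constrained_basis(scale_factors):
    """Fix the first basis vector to the normalized scale-factor vector,
    then Gram-Schmidt the DCT vectors onto the remaining dimensions."""
    v = scale_factors / np.linalg.norm(scale_factors)
    cols = [v]
    for d in dct_basis(len(v)).T:
        u = d - sum((d @ c) * c for c in cols)   # remove existing components
        if np.linalg.norm(u) > 1e-10:            # drop the dependent vector
            cols.append(u / np.linalg.norm(u))
    return np.column_stack(cols)

B = constrained_basis(np.array([1.0, 2.0, 1.0, 3.0]))
print(B.shape, np.allclose(B.T @ B, np.eye(4)))
```

One DCT vector becomes linearly dependent once the constraint occupies a dimension, so exactly n columns survive and the result is a full orthonormal basis satisfying the subspace constraint.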

  • 29.
    Liu, Du
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Graph-Based Rotation of the DCT Basis for Motion-Adaptive Transforms. 2013. In: 2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013), IEEE conference proceedings, 2013, p. 1802-1805. Conference paper (Refereed)
    Abstract [en]

    In this paper, we consider motion-adaptive transforms that are based on vertex-weighted graphs. The graphs are constructed by motion vector information and the weights of the vertices are given by scale factors, where the scale factors are used to control the energy compaction of the transform. The vertex-weighted graph defines a one dimensional linear subspace. Thus, our transform basis is subspace constrained. To find a full transform matrix that satisfies our subspace constraint, we rotate the discrete cosine transform (DCT) basis such that the first basis vector matches the subspace constraint. Since rotation is not unique in high dimensions, we choose a simple rotation that only rotates the DCT basis in the plane which is spanned by the first basis vector of the DCT and the subspace constraint. Experimental results on energy compaction show that the motion-adaptive transform based on this rotation is better than the motion-compensated orthogonal transform based on hierarchical decomposition while sharing the same first basis vector.
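
The planar rotation described above can be sketched as follows. The constraint vector is an arbitrary example, and a QR-generated orthonormal basis with the DCT's constant first vector stands in for the full DCT; only the rotation construction is the point:

```python
import numpy as np

def rotate_first_to(basis, target):
    """Rotate `basis` only in the plane spanned by its first vector and
    `target`, so the first basis vector becomes `target` (normalized)
    while directions orthogonal to that plane are left untouched."""
    a = basis[:, 0]
    b = target / np.linalg.norm(target)
    c = a @ b                                  # cos(theta)
    u = b - c * a                              # in-plane direction orthogonal to a
    u /= np.linalg.norm(u)
    s = b @ u                                  # sin(theta)
    R = (np.eye(len(a))
         + (c - 1) * (np.outer(a, a) + np.outer(u, u))
         + s * (np.outer(u, a) - np.outer(a, u)))
    return R @ basis

rng = np.random.default_rng(0)
M = np.column_stack([np.ones(4), rng.normal(size=(4, 3))])
Q, _ = np.linalg.qr(M)                         # orthonormal, first column ∝ ones
Q[:, 0] *= np.sign(Q[0, 0])                    # make the constant vector positive
target = np.array([1.0, 2.0, 1.0, 3.0])        # example subspace constraint
B = rotate_first_to(Q, target)
print(np.allclose(B[:, 0], target / np.linalg.norm(target)))
```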

  • 30.
    Liu, Du
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Motion-Adaptive Transforms Based on the Laplacian of Vertex-Weighted Graphs. 2014. Conference paper (Refereed)
    Abstract [en]

    We construct motion-adaptive transforms for image sequences by using the eigenvectors of Laplacian matrices defined on vertex-weighted graphs, where the weights of the vertices are defined by scale factors. The vertex weights determine only the first basis vector of the linear transform uniquely. Therefore, we use these weights to define two Laplacians of vertex-weighted graphs. The eigenvectors of each Laplacian share the first basis vector as defined by the scale factors only. As the first basis vector is common for all considered Laplacians, we refer to it as subspace constraint. The first Laplacian uses the inverse scale factors, whereas the second utilizes the scale factors directly. The scale factors result from the assumption of ideal motion. Hence, the ideal unscaled pixels are equally connected and we are free to form arbitrary graphs, such as complete graphs, ring graphs, or motion-inherited graphs. Experimental results on energy compaction show that the Laplacian which is based on the inverse scale factors outperforms the one which is based on the direct scale factors. Moreover, Laplacians of motion-inherited graphs are superior to those of complete or ring graphs, when assessing the energy compaction of the resulting motion-adaptive transforms.
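
The Laplacian-eigenvector idea can be sketched in its unweighted special case (a 4-pixel chain standing in for a motion-inherited graph; vertex weights omitted for brevity):

```python
import numpy as np

# Eigenvectors of a graph Laplacian as a transform basis: for a connected
# graph, the eigenvector of eigenvalue 0 is constant, so smooth
# (well-predicted) signals compact their energy into the first coefficient.
A = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)  # path-graph adjacency
L = np.diag(A.sum(axis=1)) - A                        # combinatorial Laplacian
evals, basis = np.linalg.eigh(L)                      # ascending eigenvalues

x = np.array([4.0, 4.1, 3.9, 4.0])                    # nearly constant signal
coeffs = basis.T @ x
print(abs(coeffs[0]) ** 2 / (x @ x))                  # near 1: strong compaction
```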

  • 31.
    Liu, Du
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Motion-Adaptive Transforms based on Vertex-Weighted Graphs. 2013. In: 2013 Data Compression Conference (DCC), IEEE Computer Society, 2013, p. 181-190. Conference paper (Refereed)
    Abstract [en]

    Motion information in image sequences connects pixels that are highly correlated. In this paper, we consider vertex-weighted graphs that are formed by motion vector information. The vertex weights are defined by scale factors which are introduced to improve the energy compaction of motion-adaptive transforms. Further, we relate the vertex-weighted graph to a subspace constraint of the transform. Finally, we propose a subspace-constrained transform (SCT) that achieves optimal energy compaction for the given constraint. The subspace constraint is derived from the underlying motion information only and requires no additional information. Experimental results on energy compaction confirm that the motion-adaptive SCT outperforms motion-compensated orthogonal transforms while approaching the theoretical performance of the Karhunen Loeve Transform (KLT) along given motion trajectories.

  • 32.
    Liu, Du
    et al.
    KTH, School of Electrical Engineering (EES), Information Science and Engineering.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101). KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre. KTH, School of Electrical Engineering (EES), Communication Theory.
    Video coding using multi-reference motion-adaptive transforms based on graphs, 2016. In: 2016 IEEE 12th Image, Video, and Multidimensional Signal Processing Workshop, IVMSP 2016, IEEE, 2016. Conference paper (Refereed)
    Abstract [en]

    The purpose of this work is to produce jointly coded frames for efficient video coding. We use motion-adaptive transforms in the temporal domain to generate the temporal subbands. The motion information is used to form graphs for transform construction. In our previous work, the motion-adaptive transform allows only one reference pixel to be the lowband coefficient. In this paper, we extend the motion-adaptive transform such that it permits multiple references and produces multiple lowband coefficients, which can be used in the case of bidirectional or multihypothesis motion estimation. The multi-reference motion-adaptive transform (MRMAT) is always orthonormal; thus, the energy is preserved by the transform. We compare MRMAT to the motion-compensated orthogonal transform (MCOT) [1], while HEVC intra coding is used to encode the temporal subbands. The experimental results show that MRMAT outperforms MCOT by about 0.6 dB.

  • 33.
    Liu, Du
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Video coding with adaptive motion-compensated orthogonal transforms, 2012. In: / [ed] Domanski, M; Grajek, T; Karwowski, D; Stasinski, R, IEEE, 2012, p. 293-296. Conference paper (Refereed)
    Abstract [en]

    Well-known standard hybrid coding techniques utilize the concept of motion-compensated predictive coding in a closed-loop. The resulting coding dependencies are a major challenge for packet-based networks like the Internet. On the other hand, subband coding techniques avoid the dependencies of predictive coding and are able to generate video streams that better match packet-based networks. An interesting class for subband coding is the so-called motion-compensated orthogonal transform. It generates orthogonal subband coefficients for arbitrary underlying motion fields. In this paper, a theoretical signal model based on Gaussian distributions is discussed to construct a cost function for efficient rate allocation. Additionally, a rate-distortion efficient video coding scheme is developed that takes advantage of motion-compensated orthogonal transforms. The scheme combines multiple types of motion-compensated orthogonal transforms, variable block sizes, and half-pel accurate motion compensation. The experimental results show that this adaptive scheme outperforms individual motion-compensated orthogonal transforms by up to 2 dB.

  • 34.
    Lu, Xiaohua
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Li, Haopeng
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    H.264-compatible coding of background soccer video using temporal subbands, 2012. In: Proceedings - 2012 IEEE International Symposium on Multimedia, ISM 2012, IEEE, 2012, p. 141-144. Conference paper (Refereed)
    Abstract [en]

    This paper presents an H.264-compatible temporal subband coding scheme for static background scenes of soccer video. We utilize orthonormal wavelet transforms to decompose a group of successive frames into temporal subbands. By exploiting the property of energy conservation of orthonormal wavelet transforms, we construct a rate distortion model for optimal bitrate allocation among different subbands. To take advantage of the high efficiency video codec H.264/AVC, we encode each subband with H.264/AVC Fidelity Range Extension (FRExt) intra-coding by assigning optimal bitrates. The experimental results show that our proposed coding scheme outperforms conventional video coding with H.264/AVC for both subjective and objective evaluations.
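    The energy-conservation property that the rate-distortion model above exploits can be sketched with a two-frame orthonormal Haar temporal transform. The synthetic frames and the noise level are illustrative assumptions; the point is only that orthonormality preserves energy and that a static background concentrates energy in the temporal lowband, which motivates the bitrate allocation.

```python
import numpy as np

rng = np.random.default_rng(0)
frame0 = rng.random((4, 4))
frame1 = frame0 + 0.05 * rng.random((4, 4))   # nearly static background

# Orthonormal Haar pair: temporal lowband and highband subbands.
low  = (frame0 + frame1) / np.sqrt(2)
high = (frame1 - frame0) / np.sqrt(2)

energy_in  = np.sum(frame0**2) + np.sum(frame1**2)
energy_out = np.sum(low**2) + np.sum(high**2)
assert np.isclose(energy_in, energy_out)   # orthonormality preserves energy

# For a static scene, nearly all energy sits in the lowband, so the
# rate-allocation model assigns it most of the bitrate.
assert np.sum(high**2) < 0.01 * np.sum(low**2)
```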

  • 35. Lyu, Xinrui
    et al.
    Li, Haopeng
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Hierarchically Structured Multi-View Features for Mobile Visual Search, 2014. Conference paper (Refereed)
    Abstract [en]

    This paper presents an approach for using hierarchically structured multi-view features for mobile visual search. We utilize a graph model to describe the feature correspondences between multi-view images. To add features of images from new viewpoints, we design a level raising algorithm and the associated multi-view geometric verification, which are based on the properties of the hierarchical structure. With this approach, features from new viewpoints can be recursively added in an incremental fashion. Additionally, we design a query matching strategy which utilizes the advantage of the hierarchical structure. The experimental results show that our structure of the multi-view feature database can efficiently improve the performance of mobile visual search.

  • 36. Ma, Zhanyu
    et al.
    Rana, Pravin Kumar
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Taghia, Jalil
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Bayesian estimation of Dirichlet mixture model with variational inference, 2014. In: Pattern Recognition, ISSN 0031-3203, E-ISSN 1873-5142, Vol. 47, no 9, p. 3143-3157. Article in journal (Refereed)
    Abstract [en]

    In statistical modeling, parameter estimation is an essential and challenging task. Estimation of the parameters in the Dirichlet mixture model (DMM) is analytically intractable due to the integral expressions of the gamma function and its corresponding derivatives. We introduce a Bayesian estimation strategy to estimate the posterior distribution of the parameters in the DMM. By assuming the gamma distribution as the prior of each parameter, we approximate both the prior and the posterior distribution of the parameters with a product of several mutually independent gamma distributions. The extended factorized approximation method is applied to introduce a single lower bound to the variational objective function, and an analytically tractable estimation solution is derived. Moreover, there is only one function that is maximized during iterations and, therefore, the convergence of the proposed algorithm is theoretically guaranteed. With synthesized data, the proposed method shows advantages over the EM-based method and the previously proposed Bayesian estimation method. With two important multimedia signal processing applications, the good performance of the proposed Bayesian estimation method is demonstrated.
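    The gamma connection mentioned above can be made concrete: a Dirichlet sample is a vector of independent gamma draws normalized to the simplex, which is the same gamma structure that makes the DMM likelihood analytically awkward and motivates gamma priors in the variational treatment. The parameter values and sample count below are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = np.array([2.0, 5.0, 3.0])

# Dirichlet(alpha) via normalized independent gamma draws.
g = rng.gamma(shape=alpha, scale=1.0, size=(100_000, 3))
x = g / g.sum(axis=1, keepdims=True)

assert np.allclose(x.sum(axis=1), 1.0)   # samples live on the simplex
# Empirical mean matches the Dirichlet mean alpha / sum(alpha).
assert np.allclose(x.mean(axis=0), alpha / alpha.sum(), atol=0.01)
```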

  • 37.
    Mars, David
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Wu, Hanwei
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Li, Haopeng
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Joint Geometric Verification and Ranking using Multi-View Vocabulary Trees for Mobile 3D Visual Search, 2015. In: Data Compression Conference Proceedings, 2015. Conference paper (Refereed)
    Abstract [en]

    This paper proposes multi-view vocabulary trees for mobile 3D visual search. We generate hierarchically structured multi-view features and construct a multi-view vocabulary tree from the multi-view images. As the 3D geometry information is incorporated in the multi-view vocabulary tree, it allows us to design an algorithm for fast 3D geometric verification at low computational complexity. With that, we devise an iterative algorithm that jointly accomplishes matching and geometric verification. The experimental results show that our joint approach to matching and verification improves the recall-datarate performance as well as the subjective ranking results for mobile 3D visual search.

  • 38.
    Parthasarathy, Srinivas
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Chopra, Akul
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Baudin, Emilie
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Rana, Pravin Kumar
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Denoising of volumetric depth confidence for view rendering, 2012. In: 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2012, IEEE, 2012, p. 1-4. Conference paper (Refereed)
    Abstract [en]

    In this paper, we define volumetric depth confidence and propose a method to denoise this data by performing adaptive wavelet thresholding using three dimensional (3D) wavelet transforms. The depth information is relevant for emerging interactive multimedia applications such as 3D TV and free-viewpoint television (FTV). These emerging applications require high quality virtual view rendering to enable viewers to move freely in a dynamic real world scene. Depth information of a real world scene from different viewpoints is used to render an arbitrary number of novel views. Usually, depth estimates of 3D object points from different viewpoints are inconsistent. This inconsistency of depth estimates affects the quality of view rendering negatively. Based on the superposition principle, we define a volumetric depth confidence description of the underlying geometry of natural 3D scenes by using these inconsistent depth estimates from different viewpoints. Our method denoises this noisy volumetric description, and with this, we enhance the quality of view rendering by up to 0.45 dB when compared to rendering with conventional MPEG depth maps.
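    Wavelet thresholding of the kind used above can be sketched in a simplified 1D analogue of the 3D case: one level of an orthonormal Haar transform followed by soft-thresholding of the detail coefficients. The piecewise-constant "confidence" signal, the noise level, and the threshold choice are illustrative assumptions, not the paper's adaptive 3D scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
clean = np.repeat([0.0, 1.0, 0.3, 0.8], n // 4)   # piecewise-constant signal
noisy = clean + 0.1 * rng.standard_normal(n)

# One-level orthonormal Haar analysis.
low  = (noisy[0::2] + noisy[1::2]) / np.sqrt(2)
high = (noisy[0::2] - noisy[1::2]) / np.sqrt(2)

# Soft-threshold the detail coefficients, where the noise concentrates.
t = 0.2
high_dn = np.sign(high) * np.maximum(np.abs(high) - t, 0.0)

# Haar synthesis back to the signal domain.
den = np.empty(n)
den[0::2] = (low + high_dn) / np.sqrt(2)
den[1::2] = (low - high_dn) / np.sqrt(2)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_den   = np.mean((den   - clean) ** 2)
assert mse_den < mse_noisy   # thresholding removes part of the noise
```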

  • 39.
    Rana, Pravin Kumar
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101). KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101). KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Depth consistency testing for improved view interpolation, 2010. In: , 2010, p. 384-389. Conference paper (Refereed)
    Abstract [en]

    Multiview video will play a pivotal role in the next generation visual communication media services like three-dimensional (3D) television and free-viewpoint television. These advanced media services provide natural 3D impressions and enable viewers to move freely in a dynamic real world scene by changing the viewpoint. High quality virtual view interpolation is required to support free viewpoint viewing. Usually, depth maps of different viewpoints are used to reconstruct a novel view. As these depth maps are usually estimated individually by stereo-matching algorithms, they have very weak spatial consistency. The inconsistency of depth maps affects the quality of view interpolation. In this paper, we propose a method for depth consistency testing to improve view interpolation. The method addresses the problem by warping more than two depth maps from multiple reference viewpoints to the virtual viewpoint. We test the consistency among warped depth values and improve the depth value information of the virtual view. With that, we enhance the quality of the interpolated virtual view.

  • 40.
    Rana, Pravin Kumar
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101). KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101). KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Depth Pixel Clustering for Consistency Testing of Multiview Depth, 2012. In: European Signal Processing Conference, 2012, p. 1119-1123. Conference paper (Refereed)
    Abstract [en]

    This paper proposes a clustering algorithm of depth pixels for consistency testing of multiview depth imagery. The testing addresses the inconsistencies among estimated depth maps of real world scenes by validating depth pixel connection evidence based on a hard connection threshold. With the proposed algorithm, we test the consistency among depth values generated from multiple depth observations using cluster adaptive connection thresholds. The connection threshold is based on statistical properties of depth pixels in a cluster or sub-cluster. This approach can improve the depth information of real world scenes at a given viewpoint. This allows us to enhance the quality of synthesized virtual views when compared to depth maps obtained by using fixed thresholding. Depth-image-based virtual view synthesis is widely used for upcoming multimedia services like three-dimensional television and free-viewpoint television.
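    The idea of a cluster-adaptive connection threshold, as opposed to fixed thresholding, can be sketched as follows: the threshold is derived from the statistical spread of the depth values in the cluster itself (here, one standard deviation). The function name, the factor k, and the toy depth values are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def consistent_depth(depths, k=1.0):
    """Keep depth values within k standard deviations of the cluster mean
    (a cluster-adaptive connection threshold) and return their mean."""
    d = np.asarray(depths, dtype=float)
    thr = k * d.std() + 1e-9          # adapts to the spread of the cluster
    keep = d[np.abs(d - d.mean()) <= thr]
    return keep.mean()

# Three views agree on a depth of ~2.0; one bad stereo match yields 3.5.
est = consistent_depth([2.01, 1.99, 2.00, 3.5])
assert abs(est - 2.0) < 0.01          # the outlier is rejected
```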

  • 41.
    Rana, Pravin Kumar
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101). KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101). KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    View Interpolation with structured depth from multiview video, 2011. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a method for interpolating multiview imagery which uses structured depth maps and multiview video plus inter-view connection information to represent a three-dimensional (3D) scene. The structured depth map consists of an inter-view consistent principal depth map and auxiliary depth information. The structured depth maps address the inconsistencies among estimated depth maps which may degrade the quality of rendered virtual views. Generated from multiple depth observations, the structuring of the depth maps is based on tested and adaptively chosen inter-view connections. Further, the use of connection information on the multiview video minimizes distortion due to varying illumination in the interpolated virtual views. Our approach improves the quality of rendered virtual views by up to 4 dB when compared to the reference MPEG view synthesis software for emerging multimedia services like 3D television and free-viewpoint television. Our approach first obtains the structured depth maps and the corresponding connection information. Second, it exploits the inter-view connection information when interpolating virtual views.

  • 42.
    Rana, Pravin Kumar
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Ma, Zhanyu
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Taghia, Jalil
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Multiview Depth Map Enhancement by Variational Bayes Inference Estimation of Dirichlet Mixture Models, 2013. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2013, p. 1528-1532. Conference paper (Refereed)
    Abstract [en]

    High quality view synthesis is a prerequisite for future free-viewpoint television. It will enable viewers to move freely in a dynamic real world scene. Depth image based rendering algorithms will play a pivotal role when synthesizing an arbitrary number of novel views by using a subset of captured views and corresponding depth maps only. Usually, each depth map is estimated individually by stereo-matching algorithms and, hence, shows lack of inter-view consistency. This inconsistency affects the quality of view synthesis negatively. This paper enhances the inter-view consistency of multiview depth imagery. First, our approach classifies the color information in the multiview color imagery by modeling color with a mixture of Dirichlet distributions where the model parameters are estimated in a Bayesian framework with variational inference. Second, using the resulting color clusters, we classify the corresponding depth values in the multiview depth imagery. Each clustered depth image is subject to further sub-clustering. Finally, the resulting mean of each sub-cluster is used to enhance the depth imagery at multiple viewpoints. Experiments show that our approach improves the average quality of virtual views by up to 0.8 dB when compared to views synthesized by using conventionally estimated depth maps.

  • 43.
    Rana, Pravin Kumar
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Taghia, Jalil
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Statistical methods for inter-view depth enhancement, 2014. In: 2014 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), IEEE, 2014, p. 6874755-. Conference paper (Refereed)
    Abstract [en]

    This paper briefly presents and evaluates recent advances in statistical methods for reducing inter-view inconsistency in multiview depth imagery. View synthesis is vital in free-viewpoint television in order to allow viewers to move freely in a dynamic scene. Here, depth image-based rendering plays a pivotal role by synthesizing an arbitrary number of novel views by using a subset of captured views and corresponding depth maps only. Usually, each depth map is estimated individually at different viewpoints by stereo matching and, hence, shows lack of inter-view consistency. This lack of consistency affects the quality of view synthesis negatively. This paper discusses two different approaches to enhance inter-view depth consistency. The first one uses generative models based on multiview color and depth classification to assign a probabilistic weight to each depth pixel. The weighted depth pixels are utilized to enhance depth maps. The second one performs inter-view consistency testing in depth difference space to enhance the depth maps at multiple viewpoints. We comparatively evaluate these two methods and discuss their pros and cons for future work.

  • 44.
    Rana, Pravin Kumar
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Taghia, Jalil
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    A Variational Bayesian Inference Framework for Multiview Depth Image Enhancement, 2012. In: Proceedings - 2012 IEEE International Symposium on Multimedia, ISM 2012, IEEE, 2012, p. 183-190. Conference paper (Refereed)
    Abstract [en]

    In this paper, a general model-based framework for multiview depth image enhancement is proposed. Depth imagery plays a pivotal role in emerging free-viewpoint television. This technology requires high quality virtual view synthesis to enable viewers to move freely in a dynamic real world scene. Depth imagery of different viewpoints is used to synthesize an arbitrary number of novel views. Usually, the depth imagery is estimated individually by stereo-matching algorithms and, hence, shows lack of inter-view consistency. This inconsistency affects the quality of view synthesis negatively. This paper enhances the inter-view consistency of multiview depth imagery by using a variational Bayesian inference framework. First, our approach classifies the color information in the multiview color imagery. Second, using the resulting color clusters, we classify the corresponding depth values in the multiview depth imagery. Each clustered depth image is subject to further subclustering. Finally, the resulting mean of the sub-clusters is used to enhance the depth imagery at multiple viewpoints. Experiments show that our approach improves the quality of virtual views by up to 0.25 dB.

  • 45.
    Rana, Pravin Kumar
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Taghia, Jalil
    KTH, School of Electrical Engineering (EES), Communication Theory. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Ma, Zhanyu
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Probabilistic Multiview Depth Image Enhancement Using Variational Inference, 2015. In: IEEE Journal on Selected Topics in Signal Processing, ISSN 1932-4553, E-ISSN 1941-0484, Vol. 9, no 3, p. 435-448. Article in journal (Refereed)
    Abstract [en]

    An inference-based multiview depth image enhancement algorithm is introduced and investigated in this paper. Multiview depth imagery plays a pivotal role in free-viewpoint television. This technology requires high-quality virtual view synthesis to enable viewers to move freely in a dynamic real world scene. Depth imagery of different viewpoints is used to synthesize an arbitrary number of novel views. Usually, the depth imagery is estimated individually by stereo-matching algorithms and, hence, shows inter-view inconsistency. This inconsistency affects the quality of view synthesis negatively. This paper enhances the multiview depth imagery at multiple viewpoints by probabilistic weighting of each depth pixel. First, our approach classifies the color pixels in the multiview color imagery. Second, using the resulting color clusters, we classify the corresponding depth values in the multiview depth imagery. Each clustered depth image is subject to further subclustering. Clustering based on generative models is used for assigning probabilistic weights to each depth pixel. Finally, these probabilistic weights are used to enhance the depth imagery at multiple viewpoints. Experiments show that our approach consistently improves the quality of virtual views by 0.2 dB to 1.6 dB, depending on the quality of the input multiview depth imagery.

  • 46.
    Varodayan, David
    et al.
    Stanford University.
    Chen, David
    Stanford University.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES).
    Girod, Bernd
    Stanford University.
    Wyner-Ziv coding of video with unsupervised motion vector learning, 2008. In: Signal Processing. Image Communication, ISSN 0923-5965, E-ISSN 1879-2677, Vol. 23, no 5, p. 369-378. Article in journal (Refereed)
    Abstract [en]

    Distributed source coding theory has long promised a new method of encoding video that is much lower in complexity than conventional methods. In the distributed framework, the decoder is tasked with exploiting the redundancy of the video signal. Among the difficulties in realizing a practical codec has been the problem of motion estimation at the decoder. In this paper, we propose a technique for unsupervised learning of forward motion vectors during the decoding of a frame with reference to its previous, reconstructed frame. The technique, described for both pixel-domain and transform-domain coding, is an instance of the expectation maximization algorithm. The performance of our transform-domain motion learning video codec improves as GOP size grows. It outperforms motion-compensated temporal interpolation by 0.5 dB when the GOP size is 2, and by even more when the GOP size is larger. It performs within about 0.25 dB of a codec that knows the motion vectors through an oracle, but is hundreds of orders of magnitude less complex than a corresponding brute-force decoder motion search approach would be.

  • 47.
    Wiegand, Thomas
    et al.
    TU Berlin.
    Girod, Bernd
    Stanford University.
    Flierl, Markus
    KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.
    Multi-hypothesis motion-compensated video image predictor, 1998. Patent (Other (popular science, discussion, etc.))
  • 48.
    Wu, Hanwei
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.
    Flierl, Markus
    KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.
    Component-based quadratic similarity identification for multivariate Gaussian sources, 2018. In: Data Compression Conference Proceedings, Institute of Electrical and Electronics Engineers Inc., 2018. Conference paper (Refereed)
    Abstract [en]

    This paper considers the problem of compression for similarity identification. Unlike classic compression problems, the focus is not on reconstructing the original data. Instead, compression is determined by the reliability of answering given queries. The problem is characterized by the identification rate of a source, which is the minimum compression rate that allows reliable answers for a given similarity threshold. In this work, we investigate component-based quadratic similarity identification for multivariate Gaussian sources. The decorrelated original data is processed by a distinct D-admissible system for each component. For a special case, we characterize the component-based identification rate for a correlated Gaussian source. Furthermore, we derive the optimal bit allocation for a given total rate constraint.

  • 49.
    Wu, Hanwei
    et al.
    KTH, School of Electrical Engineering (EES).
    Flierl, Markus
    KTH, School of Electrical Engineering (EES).
    Transform-based compression for quadratic similarity queries, 2018. In: Conference Record of 51st Asilomar Conference on Signals, Systems and Computers, ACSSC 2017, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 377-381. Conference paper (Refereed)
    Abstract [en]

    This paper considers the problem of compression for similarity queries [1] and discusses transform-based compression schemes. Here, the focus is on the tradeoff between the rate of the compressed data and the reliability of the answers to a given query. We consider compression schemes that do not allow false negatives when answering queries. Hence, classical compression techniques need to be modified. We propose transform-based compression schemes which decorrelate the original data and regard each transform component as a distinct D-admissible system. Both compression and retrieval are performed in the transform domain. The transform-based schemes show advantages in terms of encoding speed and the ability to handle high-dimensional correlated data. In particular, we discuss component-based and vector-based schemes. We use P{maybe}, a probability related to the occurrence of false positives, to assess our scheme. Our experiments show that component-based schemes offer both good performance and low search complexity.
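    The no-false-negative property central to this line of work can be sketched with scalar quantization: answer "no" only when no point in the stored item's quantization cell could possibly lie within distance D of the query, and "maybe" otherwise. The function names, the step size, and the toy vectors are illustrative assumptions, not the scheme in the paper.

```python
import numpy as np

def compress(x, step=1.0):
    """Scalar-quantize each component (the stored signature)."""
    return np.round(x / step) * step

def query(sig, y, D, step=1.0):
    """Return 'no' only when no point in the cell around `sig` can be
    within distance D of y; hence a 'no' is never a false negative."""
    # The true x differs from sig by at most step/2 per component, so
    # ||x - y|| >= ||sig - y|| - (step/2) * sqrt(dim).
    slack = (step / 2) * np.sqrt(len(sig))
    dist_lower = max(np.linalg.norm(sig - y) - slack, 0.0)
    return "no" if dist_lower > D else "maybe"

x = np.array([0.2, 1.4, -0.7])
sig = compress(x)
assert query(sig, x + 0.1, D=1.0) == "maybe"   # near queries never get "no"
assert query(sig, x + 10.0, D=1.0) == "no"     # certain misses are rejected
```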

  • 50.
    Wu, Hanwei
    et al.
    KTH, School of Electrical Engineering (EES), Information Science and Engineering.
    Li, Haopeng
    KTH, School of Electrical Engineering (EES), Information Science and Engineering.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Information Science and Engineering.
    AN EMBEDDED 3D GEOMETRY SCORE FOR MOBILE 3D VISUAL SEARCH, 2016. In: 2016 IEEE 18TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), Institute of Electrical and Electronics Engineers (IEEE), 2016. Conference paper (Refereed)
    Abstract [en]

    The scoring function is a central component in mobile visual search. In this paper, we propose an embedded 3D geometry score for mobile 3D visual search (M3DVS). In contrast to conventional mobile visual search, M3DVS uses not only the visual appearance of query objects, but utilizes also the underlying 3D geometry. The proposed scoring function interprets visual search as a process that reduces uncertainty among candidate objects when observing a query. For M3DVS, the uncertainty is reduced by both appearance-based visual similarity and 3D geometric similarity. For the latter, we give an algorithm for estimating the query-dependent threshold for geometric similarity. In contrast to visual similarity, the threshold for geometric similarity is relative due to the constraints of image-based 3D reconstruction. The experimental results show that the embedded 3D geometry score improves the recall-datarate performance when compared to a conventional visual score or 3D geometry-based re-ranking.
