  • 1.
    Alhaj Moussa, Obada
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Li, Minyue
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Communication Theory. Victoria University of Wellington, New Zealand.
    PITCH ENHANCEMENT MOTIVATED BY RATE-DISTORTION THEORY, 2014. Conference paper (Refereed)
    Abstract [en]

    A pitch enhancement filter is designed with the objective to approach the optimal rate-distortion trade-off. The filter shows significant perceptual benefits, restating that information-theoretical and perceptual criteria are usually consistent. The filter is easy to implement and can be used as a complement to existing audio codecs. Our experiments show that it can improve the reconstruction quality of the AMR-WB standard.

  • 2.
    Alhaj Moussa, Obada
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Li, Minyue
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Predictive Audio Coding Using Rate-Distortion-Optimal Pre-and-Post-Filtering, 2011. In: Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011 IEEE Workshop on, IEEE conference proceedings, 2011, p. 213-216. Conference paper (Refereed)
    Abstract [en]

    A natural approach to audio coding is to use a rate-distortion optimal design combined with a perceptual model. While this approach is common in transform coding, existing predictive-coding based audio coders are generally not optimal and they benefit from heuristically motivated post-filtering. As delay requirements often force the use of predictive coding, we consider audio coding with a pre- and post-filtered predictive structure that was recently proven to be asymptotically optimal in the rate-distortion sense [1]. We show that this audio coding is efficient in achieving state-of-the-art performance. We also show that the pre-filter plays a relatively minor role. This leads to an analytic approach for optimizing the post-filter and the predictor at each rate, eliminating the need for manual re-tuning whenever a different rate is called for. In a subjective test, the theoretically optimized post-filter provided a better performance than a conventional post-filter.

  • 3. Backstrom, T
    et al.
    Alku, P
    Paatero, T
    Kleijn, Bastiaan
    KTH, Superseded Departments, Signals, Sensors and Systems.
    A time-domain interpretation for the LSP decomposition, 2004. In: IEEE transactions on speech and audio processing, ISSN 1063-6676, E-ISSN 1558-2353, Vol. 12, no 6, p. 554-560. Article in journal (Refereed)
    Abstract [en]

    The line spectrum pair (LSP) decomposition is a widely used method in speech coding. In this article, we will show that the LSP polynomials, whose trivial zeros have been removed, are equivalent to two optimal (in the mean square sense) predictors in which a sample is predicted from linear combinations of its previous averaged and differentiated values.
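The following Python sketch is an editorial illustration, not part of the record above: it forms the LSP polynomials from a linear-prediction polynomial and reads off the line spectrum frequencies after discarding the trivial zeros. The example predictor coefficients are arbitrary toy values.

```python
import numpy as np

def lsp_polynomials(a):
    """Form the line spectrum pair (LSP) polynomials from LP coefficients.

    `a` is [1, a_1, ..., a_p] for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    Returns (P, Q) with
        P(z) = A(z) + z^-(p+1) A(1/z),
        Q(z) = A(z) - z^-(p+1) A(1/z),
    whose zeros lie on the unit circle and interleave when A(z) is minimum phase.
    """
    a = np.asarray(a, dtype=float)
    a_flip = np.concatenate(([0.0], a[::-1]))   # coefficients of z^-(p+1) A(1/z)
    a_ext = np.concatenate((a, [0.0]))
    return a_ext + a_flip, a_ext - a_flip

def line_spectrum_frequencies(a):
    """Angles (rad) of the non-trivial unit-circle zeros of P and Q, sorted."""
    freqs = []
    for poly in lsp_polynomials(a):
        ang = np.angle(np.roots(poly))
        # keep one zero per conjugate pair; drop the trivial zeros at 0 and pi
        freqs.extend(ang[(ang > 1e-6) & (ang < np.pi - 1e-6)])
    return np.sort(np.array(freqs))

if __name__ == "__main__":
    a = [1.0, -1.2, 0.9, -0.3, 0.1]   # toy 4th-order minimum-phase predictor
    print(line_spectrum_frequencies(a))
```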

  • 4.
    Chatterjee, Saikat
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    AUDITORY MODEL BASED MODIFIED MFCC FEATURES, 2010. In: 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, p. 4590-4593. Conference paper (Refereed)
    Abstract [en]

    Using spectral and spectro-temporal auditory models, we develop a computationally simple feature vector based on the design architecture of existing mel frequency cepstral coefficients (MFCCs). Along with the use of an optimized static function to compress a set of filter bank energies, we propose to use a memory-based adaptive compression function to incorporate the behavior of human auditory response across time and frequency. We show that a significant improvement in automatic speech recognition (ASR) performance is obtained for any environmental condition, clean as well as noisy.
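For orientation, an editorial sketch (not taken from the paper) of the conventional static MFCC pipeline for one frame follows; the fixed log compression marks the stage that the paper replaces with an optimized, memory-based adaptive compression function. Frame length, filter count, and all other parameter values are illustrative.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular mel-spaced filters applied to the one-sided power spectrum."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc_frame(frame, fs, n_filters=26, n_ceps=13):
    """Conventional static MFCCs for one windowed frame."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    energies = mel_filterbank(n_filters, len(frame), fs) @ spec
    # Fixed log compression; the paper replaces this stage with an optimized,
    # memory-based adaptive compression function.
    log_e = np.log(energies + 1e-10)
    # DCT-II decorrelates the log filter-bank energies.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_e

if __name__ == "__main__":
    fs = 16000
    t = np.arange(400) / fs
    frame = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.default_rng(0).standard_normal(400)
    print(mfcc_frame(frame, fs).round(2))
```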

  • 5.
    Chatterjee, Saikat
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Auditory Model-Based Design and Optimization of Feature Vectors for Automatic Speech Recognition, 2011. In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924, Vol. 19, no 6, p. 1813-1825. Article in journal (Refereed)
    Abstract [en]

    Using spectral and spectro-temporal auditory models along with perturbation-based analysis, we develop a new framework to optimize a feature vector such that it emulates the behavior of the human auditory system. The optimization is carried out in an offline manner based on the conjecture that the local geometries of the feature vector domain and the perceptual auditory domain should be similar. Using this principle along with a static spectral auditory model, we modify and optimize the static spectral mel frequency cepstral coefficients (MFCCs) without considering any feedback from the speech recognition system. We then extend the work to include spectro-temporal auditory properties into designing a new dynamic spectro-temporal feature vector. Using a spectro-temporal auditory model, we design and optimize the dynamic feature vector to incorporate the behavior of human auditory response across time and frequency. We show that a significant improvement in automatic speech recognition (ASR) performance is obtained for any environmental condition, clean as well as noisy.

  • 6.
    Chatterjee, Saikat
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Koniaris, Christos
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Auditory model based optimization of MFCCs improves automatic speech recognition performance, 2009. In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, 2009, p. 2943-2946. Conference paper (Refereed)
    Abstract [en]

    Using a spectral auditory model along with perturbation based analysis, we develop a new framework to optimize a set of features such that it emulates the behavior of the human auditory system. The optimization is carried out in an off-line manner based on the conjecture that the local geometries of the feature domain and the perceptual auditory domain should be similar. Using this principle, we modify and optimize the static mel frequency cepstral coefficients (MFCCs) without considering any feedback from the speech recognition system. We show that improved recognition performance is obtained for any environmental condition, clean as well as noisy.

  • 7. Driesen, J.
    et al.
    Van Hamme, H.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Learning from images and speech with non-negative matrix factorization enhanced by input space scaling, 2010. In: 2010 IEEE Workshop on Spoken Language Technology, SLT 2010 - Proceedings, IEEE, 2010, p. 1-6. Conference paper (Refereed)
    Abstract [en]

    Computational learning from multimodal data is often done with matrix factorization techniques such as NMF (Non-negative Matrix Factorization), pLSA (Probabilistic Latent Semantic Analysis) or LDA (Latent Dirichlet Allocation). The different modalities of the input are to this end converted into features that are easily placed in a vectorized format. An inherent weakness of such a data representation is that only a subset of these data features actually aids the learning. In this paper, we first describe a simple NMF-based recognition framework operating on speech and image data. We then propose and demonstrate a novel algorithm that scales the inputs of this framework in order to optimize its recognition performance.
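As an editorial aside, the sketch below shows plain NMF with the standard Lee-Seung multiplicative updates, the factorization underlying the recognition framework summarized above; the paper's input-space scaling is not reproduced, and the synthetic data and rank are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Basic NMF with Lee-Seung multiplicative updates for the Frobenius cost.

    Factorizes a non-negative data matrix V (features x samples) as V ~ W @ H.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis vectors
    return W, H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic non-negative data with an underlying rank-5 structure.
    V = rng.random((40, 5)) @ rng.random((5, 100))
    W, H = nmf(V, rank=5)
    print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```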

  • 8.
    Ekman, Anders
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Grancharov, Volodya
    Multimedia Technologies, Ericsson Research.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Double-Ended Quality Assessment System for Super-Wideband Speech, 2011. In: IEEE TRANS AUDIO SPEECH LANG, ISSN 1558-7916, Vol. 19, no 3, p. 558-569. Article in journal (Refereed)
    Abstract [en]

    This paper describes a double-ended quality assessment system for speech with a bandwidth of up to 14 kHz (so-called super-wideband speech). The quality assessment system is based on a combination of local and global features, where the local features are dependent on a time alignment procedure and the global features are not. The system is evaluated over a large set of subjectively scored narrowband, wideband and super-wideband speech databases. The system performs similarly to PESQ for narrowband speech and significantly better for wideband speech.

  • 9. Ekman, L. A.
    et al.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Murthi, M. N.
    Regularized linear prediction of speech, 2008. Article in journal (Refereed)
    Abstract [en]

    All-pole spectral envelope estimates based on linear prediction (LP) for speech signals often exhibit unnaturally sharp peaks, especially for high-pitch speakers. In this paper, regularization is used to penalize rapid changes in the spectral envelope, which improves the spectral envelope estimate. Based on extensive experimental evidence, we conclude that regularized linear prediction outperforms bandwidth-expanded linear prediction. The regularization approach gives lower spectral distortion on average, and fewer outliers, while maintaining a very low computational complexity.
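A hedged editorial sketch of the idea: linear prediction with a regularized normal-equation solve, next to conventional bandwidth expansion. The identity (Tikhonov-style) regularizer used here is only a stand-in for the paper's penalty on rapid spectral-envelope variation, and the synthetic high-pitch frame is illustrative.

```python
import numpy as np
from scipy.linalg import toeplitz, solve

def lp_autocorrelation(x, order, lam=0.0):
    """Autocorrelation-method linear prediction with an optional regularizer.

    Returns [1, a_1, ..., a_p] for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    The identity regularizer (lam * I added to the normal equations) is only an
    illustrative stand-in for the paper's penalty on rapid envelope variation.
    """
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = toeplitz(r[:order]) + lam * np.eye(order)
    return np.concatenate(([1.0], solve(R, -r[1:order + 1])))

def bandwidth_expand(a, gamma=0.99):
    """The conventional alternative: damp the poles by scaling a_k by gamma**k."""
    return a * gamma ** np.arange(len(a))

if __name__ == "__main__":
    fs, order = 8000, 10
    n = np.arange(320)
    # Synthetic high-pitch voiced frame: harmonics of 300 Hz plus a little noise.
    x = sum(np.sin(2 * np.pi * 300 * h * n / fs) / h for h in range(1, 11))
    x = x + 0.01 * np.random.default_rng(0).standard_normal(len(n))
    print("plain:      ", lp_autocorrelation(x, order).round(3))
    print("regularized:", lp_autocorrelation(x, order, lam=1.0).round(3))
    print("bw-expanded:", bandwidth_expand(lp_autocorrelation(x, order)).round(3))
```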

  • 10.
    Ekman, L. Anders
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Murthi, Manohar N.
    Spectral envelope estimation and regularization, 2006. In: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006, p. 245-248. Conference paper (Refereed)
    Abstract [en]

    A well-known problem with linear prediction is that its estimate of the spectral envelope often has sharp peaks for high-pitch speakers. These peaks are anomalies resulting from contamination of the spectral envelope by the spectral fine structure. We investigate the method of regularized linear prediction to find a better estimate of the spectral envelope and compare the method to the commonly used approach of bandwidth expansion. We present simulations over voiced frames of female speakers from the TIMIT database, where the envelope modeling accuracy is measured using a log spectral distortion measure. We also investigate the coding properties of the methods. The results indicate that the new regularized LP method is superior to bandwidth expansion, with an insignificant increase in computational complexity.

  • 11. Falk, Tiago H.
    et al.
    Stadler, Svante
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Chan, Wai-Yip
    Noise Suppression Based on Extending a Speech-Dominated Modulation Band, 2007. In: INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, p. 1469-1472. Conference paper (Refereed)
    Abstract [en]

    Previous work on bandpass modulation filtering for noise suppression has resulted in unwanted perceptual artifacts and decreased speech clarity. Artifacts are introduced mainly due to half-wave rectification, which is employed to correct for negative power spectral values resultant from the filtering process. In this paper, modulation frequency estimation (i.e., bandwidth extension) is used to improve perceptual quality. Experiments demonstrate that speech-component lowpass modulation content can be reliably estimated from bandpass modulation content of speech-plus-noise components. Subjective listening tests corroborate that improved quality is attained when the removed speech lowpass modulation content is compensated for by the estimate.

  • 12. Faundez-Zanuy, M
    et al.
    Hagmuller, M
    Kubin, G
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    The COST-277 speech database, 2005. In: NONLINEAR ANALYSES AND ALGORITHMS FOR SPEECH PROCESSING, 2005, Vol. 3817, p. 100-107. Conference paper (Refereed)
    Abstract [en]

    Databases are fundamental for research investigations. This paper presents the speech database generated in the framework of the COST-277 "Nonlinear speech processing" European project, as a result of European collaboration. This database makes it possible to address two main problems: the relevance of bandwidth extension, and the usefulness of watermarking with perceptual shaping at different watermark-to-signal ratios. It will be publicly available after the end of the COST-277 action, in January 2006.

  • 13. Faundez-Zanuy, M
    et al.
    Laine, U
    Kubin, G
    McLaughlin, S
    Kleijn, W. Bastiaan
    Chollet, G
    Petek, B
    Hussain, A
    The COST-277 European action: An overview, 2005. In: NONLINEAR ANALYSES AND ALGORITHMS FOR SPEECH PROCESSING, 2005, Vol. 3817, p. 1-9. Conference paper (Refereed)
    Abstract [en]

    This paper summarizes the rationale for proposing the COST-277 "nonlinear speech processing" action, and the work done during these last four years. In addition, future perspectives are described.

  • 14. Faundez-Zanuy, M.
    et al.
    Nilsson, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Kleijn, Bastiaan
    KTH, School of Electrical Engineering (EES), Communication Theory.
    On the relevance of bandwidth extension for speaker identification, 2015. In: European Signal Processing Conference, ISSN 2219-5491, article id 7072183. Article in journal (Refereed)
    Abstract [en]

    In this paper we discuss the relevance of bandwidth extension for speaker identification tasks. Mainly, we want to study whether it is possible to recognize voices that have been bandwidth-extended. For this purpose, we created two different databases (microphonic and ISDN) of speech signals that were bandwidth extended from telephone bandwidth ([300, 3400] Hz) to full bandwidth ([100, 8000] Hz). We have evaluated different parameterizations, and we have found that the MELCEPST parameterization can take advantage of the bandwidth extension algorithms in several situations.

  • 15. Feldbauer, C.
    et al.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Scalable Coding with Side Information for Packet Loss Recovery, 2009. In: IEEE Transactions on Communications, ISSN 0090-6778, E-ISSN 1558-0857, Vol. 57, no 8, p. 2309-2319. Article in journal (Refereed)
    Abstract [en]

    This paper presents a packet loss recovery method that uses an incomplete secondary encoding based on scalar quantization as redundancy. The method is redundancy bit rate scalable and allows an adaptation to varying loss scenarios and a varying packetization strategy. The recovery is performed by minimum mean squared error estimation incorporating a statistical model for the quantizers to facilitate real-time adaptation. A bit allocation algorithm is proposed that extends 'reverse water filling' to the problem of scalar encoding dependent variables for a decoder with a final estimation stage and available side information. We apply the method to the encoding of line-spectral frequencies (LSFs), which are commonly used in speech coding, illustrating the good performance of the method.
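For reference, an editorial sketch of the textbook setting: the classical reverse water-filling rule for independent Gaussian components, which the abstract's bit-allocation algorithm extends. The extension to scalar encoding with a decoder-side estimation stage is not reproduced; the variances and target rate are illustrative.

```python
import numpy as np

def reverse_water_filling(variances, total_rate_bits, tol=1e-9):
    """Classical reverse water-filling for independent Gaussian components.

    Finds the water level theta such that each component gets
        R_i = max(0, 0.5 * log2(var_i / theta)),  D_i = min(theta, var_i),
    and the rates sum to the target.
    """
    variances = np.asarray(variances, dtype=float)

    def total_rate(theta):
        return np.sum(np.maximum(0.0, 0.5 * np.log2(variances / theta)))

    lo, hi = tol, variances.max()
    while hi - lo > tol:                 # bisection on the water level
        mid = 0.5 * (lo + hi)
        if total_rate(mid) > total_rate_bits:
            lo = mid                     # rate too high: raise the water level
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    rates = np.maximum(0.0, 0.5 * np.log2(variances / theta))
    distortions = np.minimum(theta, variances)
    return rates, distortions

if __name__ == "__main__":
    var = np.array([4.0, 2.0, 1.0, 0.25])
    R, D = reverse_water_filling(var, total_rate_bits=3.0)
    print("rates:", R.round(3), "sum:", R.sum().round(3))
    print("distortions:", D.round(3))
```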

  • 16. Feldbauer, C.
    et al.
    Kubin, G.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Anthropomorphic coding of speech and audio: A model inversion approach, 2005. In: EURASIP Journal on Applied Signal Processing, ISSN 1110-8657, E-ISSN 1687-0433, Vol. 2005, no 9, p. 1334-1349. Article in journal (Refereed)
    Abstract [en]

    Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel) coding.

  • 17.
    Feldbauer, Christian
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    An adaptive, scalable packet loss recovery method, 2007. In: 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2007, p. 1117-1120. Conference paper (Other academic)
    Abstract [en]

    We propose a packet loss recovery method that uses an incomplete secondary encoding as redundancy. The recovery is performed by minimum mean squared error estimation. The method adapts to the loss scenario and is rate scalable. It incorporates a statistical model for the quantizers to facilitate real-time adaptation. We apply the method to the encoding of line-spectral frequencies, which are commonly used in speech coding, illustrating the good performance of the method.

  • 18.
    Grancharov, Volodya
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Georgiev, Alexander
    Sub-pixel registration of noisy images, 2006. In: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing: image and multidimensional signal processing, signal processing education, bioimaging and signal processing, NEW YORK, NY: IEEE, 2006, p. 273-. Conference paper (Refereed)
    Abstract [en]

    The accurate registration of images observed in additive noise is a challenging task. The noise increases the number of misregistered regions, and decreases the accuracy of subpixel registration. To address this problem, we propose an intensity-based algorithm that performs registration based only on regions that are least affected by noise. We select these regions with a signal-to-noise ratio estimate that is obtained from an initial, less-accurate registration. Our simulations demonstrate that the proposed noise-adaptive scheme significantly outperforms the conventional registration approach.

  • 19.
    Grancharov, Volodya
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Plasberg, Jan H.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Samuelsson, Jonas
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Generalized postfilter for speech quality enhancement, 2008. In: IEEE Transactions on Audio, Speech and Language Processing, ISSN 1558-7916, Vol. 16, no 1, p. 57-64. Article in journal (Refereed)
    Abstract [en]

    Postfilters are commonly used in speech coding for the attenuation of quantization noise. In the presence of acoustic background noise or distortion due to tandeming operations, the postfilter parameters are not adjusted and the performance is, therefore, not optimal. We propose a modification that consists of replacing the nonadaptive postfilter parameters with parameters that adapt to variations in spectral flatness, obtained from the noisy speech. This generalization of the postfiltering concept can handle a larger range of noise conditions, but has the same computational complexity and memory requirements as the conventional postfilter. Test results indicate that the presented algorithm improves on the standard postfilter, as well as on the combination of a noise attenuation preprocessor and the conventional postfilter.
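An editorial sketch, assuming the conventional short-term postfilter H(z) = A(z/g1)/A(z/g2): the flatness-to-parameter mapping below is hypothetical and merely stands in for the mapping derived in the paper; the LP polynomial, frame, and parameter ranges are toy values.

```python
import numpy as np
from scipy.signal import lfilter

def spectral_flatness(x, n_fft=512):
    """Spectral flatness: geometric mean / arithmetic mean of the power spectrum."""
    p = np.abs(np.fft.rfft(x, n_fft)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(p))) / np.mean(p)

def short_term_postfilter(signal, a, gamma_num, gamma_den):
    """Conventional short-term postfilter H(z) = A(z/g1) / A(z/g2).

    `a` is the LP polynomial [1, a_1, ..., a_p] of the decoded frame.
    """
    num = a * gamma_num ** np.arange(len(a))
    den = a * gamma_den ** np.arange(len(a))
    return lfilter(num, den, signal)

def adaptive_gammas(flatness, g_min=0.5, g_max=0.8):
    """Hypothetical mapping from spectral flatness to the emphasis parameters:
    flatter (noisier) spectra get stronger emphasis.  Not the paper's mapping."""
    gamma_den = g_min + (g_max - g_min) * flatness
    gamma_num = 0.7 * gamma_den
    return gamma_num, gamma_den

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = np.array([1.0, -1.2, 0.9, -0.3, 0.1])              # toy decoded LP polynomial
    decoded = lfilter([1.0], a, rng.standard_normal(320))  # synthetic decoded frame
    g1, g2 = adaptive_gammas(spectral_flatness(decoded))
    out = short_term_postfilter(decoded, a, g1, g2)
    print("gammas:", round(float(g1), 3), round(float(g2), 3),
          "output RMS:", round(float(np.std(out)), 3))
```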

  • 20.
    Grancharov, Volodya
    et al.
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Samuelsson, Jonas
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Bastiaan Kleijn, W.
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Distortion measures for vector quantization of noisy spectrum, 2005. In: Eur. Conf. Speech Commun. Technol., 2005, p. 3173-3176. Conference paper (Refereed)
    Abstract [en]

    In this paper we address the problem of vector quantization of speech in a noisy environment. We show that the performance of a vector quantization system can be improved by adapting the distortion measure to the changing environmental conditions. The proposed method emphasizes the distortion in spectral regions where the speech signal dominates. The method functions well even when conventional pre-processor methods fail because the noise statistics cannot be estimated reliably from speech pauses (as, e.g., in tandeming operations). Objective tests confirm that the use of environmentally adaptive measures significantly improves estimation accuracy in noisy speech, while preserving the quality in the case of clean input.

  • 21.
    Grancharov, Volodya
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Samuelsson, Jonas
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    On causal algorithms for speech enhancement, 2006. In: IEEE Transactions on Speech and Audio Processing, ISSN 1558-7916, Vol. 14, p. 764-773. Article in journal (Refereed)
    Abstract [en]

    Kalman filtering is a powerful technique for the estimation of a signal observed in noise, and it can be used to enhance speech observed in the presence of acoustic background noise. In a speech communication system, the speech signal is typically buffered for a period of 10-40 ms and, therefore, the use of either a causal or a noncausal filter is possible. We show that the causal Kalman algorithm is in conflict with the basic properties of human perception and address the problem of improving its perceptual quality. We discuss two approaches to improve perceptual performance. The first is based on a new method that combines the causal Kalman algorithm with pre- and postfiltering to introduce perceptual shaping of the residual noise. The second is based on the conventional Kalman smoother. We show that a short lag removes the conflict resulting from the causality constraint and we quantify the minimum lag required for this purpose. The results of our objective and subjective evaluations confirm that both approaches significantly outperform the conventional causal implementation. Of the two approaches, the Kalman smoother performs better if the signal statistics are precisely known; if this is not the case, the perceptually weighted Kalman filter performs better.
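An editorial sketch of the underlying contrast: a causal Kalman filter versus a non-causal smoothing pass for a scalar AR(1) signal in white noise. A full RTS backward pass stands in for the short-lag smoother discussed in the paper, and the perceptual weighting is not reproduced; all model parameters are illustrative.

```python
import numpy as np

def kalman_ar1(y, phi, q, r):
    """Causal Kalman filter and RTS smoother for a scalar AR(1) signal in white noise.

    Model: x_t = phi * x_{t-1} + w_t (var q), observation y_t = x_t + v_t (var r).
    """
    n = len(y)
    xf, Pf = np.zeros(n), np.zeros(n)          # filtered mean / variance
    xp, Pp = np.zeros(n), np.zeros(n)          # one-step predictions
    x, P = 0.0, q / (1 - phi ** 2)             # stationary prior
    for t in range(n):
        xp[t], Pp[t] = phi * x, phi ** 2 * P + q
        K = Pp[t] / (Pp[t] + r)
        x = xp[t] + K * (y[t] - xp[t])
        P = (1 - K) * Pp[t]
        xf[t], Pf[t] = x, P
    xs = xf.copy()                             # backward (RTS) smoothing pass
    for t in range(n - 2, -1, -1):
        C = Pf[t] * phi / Pp[t + 1]
        xs[t] = xf[t] + C * (xs[t + 1] - xp[t + 1])
    return xf, xs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi, q, r, n = 0.95, 1.0, 4.0, 2000
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(scale=np.sqrt(q))
    y = x + rng.normal(scale=np.sqrt(r), size=n)
    xf, xs = kalman_ar1(y, phi, q, r)
    print("noisy MSE:   ", round(float(np.mean((y - x) ** 2)), 3))
    print("filtered MSE:", round(float(np.mean((xf - x) ** 2)), 3))
    print("smoothed MSE:", round(float(np.mean((xs - x) ** 2)), 3))
```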

  • 22.
    Grancharov, Volodya
    et al.
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Samuelsson, Jonas
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Kleijn, W. Bastiaan
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Noise-dependent postfiltering, 2004. In: 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, NEW YORK: IEEE, 2004, p. 457-460. Conference paper (Refereed)
    Abstract [en]

    This paper introduces a modification of the commonly used postfilter that improves performance when acoustic background noise is present. The modification consists of replacing the nonadaptive postfilter parameters that govern the degree of spectral emphasis (commonly denoted as gamma(1) and gamma(2)) with parameters that adapt to the noise statistics. We describe an effective mapping from the noise statistics to the emphasis parameters and provide a low complexity noise estimation algorithm that is sufficient for this application. The resulting noise-adaptive postfilter successfully attenuates the background noise and naturally converges to the conventional postfilter at high SNR conditions. Thus, the speech enhancement problem is solved with minimal modification of legacy codecs, since the existing structure of the speech codec is used. Test results indicate that the presented algorithm significantly outperforms the standard postfilter with non-adaptive parameters.

  • 23.
    Grancharov, Volodya
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Samuelsson, Jonas
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Improved Kalman filtering for speech enhancement, 2005. In: 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, p. 1109-1112. Conference paper (Refereed)
    Abstract [en]

    The Kalman recursion is a powerful technique for reconstruction of the speech signal observed in additive background noise. In contrast to Wiener filtering and spectral subtraction schemes, the Kalman algorithm can be easily implemented in both causal and noncausal form. After studying the perceptual differences between these two implementations we propose a novel algorithm that combines the low complexity and the robustness of the Kalman filter and the proper noise shaping of the Kalman smoother.

  • 24. Grancharov, Volodya
    et al.
    Srinivasan, S.
    Samuelsson, Jonas
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Signal Processing.
    Robust spectrum quantization for LP parameter enhancement, 2015. In: European Signal Processing Conference, EUSIPCO, 2015, p. 1951-1954. Conference paper (Refereed)
    Abstract [en]

    In this paper, we investigate the denoising properties of robust vector quantization of the speech spectrum parameters in combination with a Kalman filter. The underlying assumption is that the high-energy speech regions can be used to reconstruct the low-energy regions destroyed by noise. This can be achieved through vector quantization with a properly weighted distortion measure. The performance of the proposed system, Kalman filtering with prior vector quantization, is compared with existing schemes for parameter estimation used in Kalman filtering. The results indicate significant improvement over the reference systems in both objective and subjective tests.

  • 25. Grancharov, Volodya
    et al.
    Zhao, David
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Lindblom, Jonas
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, Bastiaan
    KTH, School of Electrical Engineering (EES).
    Low-complexity, non-intrusive speech quality assessment, 2006. In: IEEE Transactions on Speech and Audio Processing, ISSN 1558-7916, Vol. 14, no 6, p. 1948-1956. Article in journal (Refereed)
    Abstract [en]

    Monitoring of speech quality in emerging heterogeneous networks is of great interest to network operators. The most efficient way to satisfy such a need is through nonintrusive, objective speech quality assessment. In this paper, we describe a low-complexity algorithm for monitoring the speech quality over a network. The features used in the proposed algorithm can be computed from commonly used speech-coding parameters. Reconstruction and perceptual transformation of the signal is not performed. The critical advantage of the approach lies in generating quality assessment ratings without explicit distortion modeling. The results from the performed experiments indicate that the proposed nonintrusive objective quality measure performs better than the ITU-T P.563 standard.

  • 26.
    Grancharov, Volodya
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Zhao, David Yuheng
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Lindblom, Jonas
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Non-Intrusive Speech Quality Assessment with Low Computational Complexity, 2006. In: INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006, p. 189-192. Conference paper (Refereed)
    Abstract [en]

    We describe an algorithm for monitoring subjective speech quality without access to the original signal that has very low computational and memory requirements. The features used in the proposed algorithm can be computed from commonly used speech-coding parameters. Reconstruction and perceptual transformation of the signal are not performed. The algorithm generates quality assessment ratings without explicit distortion modeling. The simulation results indicate that the proposed non-intrusive objective quality measure performs better than the ITU-T P.563 standard despite its very low computational complexity.

  • 27.
    Guoqiang, Zhang
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Dán, György
    KTH, School of Electrical Engineering (EES), Communication Networks.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Lundin, Henrik
    Adaptive Playout Scheduling for Voice over IP: Event-Triggered Control Policy. In: IEEE Multimedia, ISSN 1070-986X, E-ISSN 1941-0166. Article in journal (Other academic)
    Abstract [en]

    We study adaptive-playout scheduling for Voice over IP using the framework of stochastic impulse control theory. We use the Wiener process to model the fluctuation of the buffer length in the absence of control. In this context, the control signal consists of length units that correspond to inserting or dropping a pitch cycle. We define an optimality criterion that has an adjustable trade-off between average buffering delay and average control signal (the length of the pitch cycles added plus the length of the pitch cycles dropped), and show that a band control policy is optimal for this criterion. The band control policy maintains the buffer length within a band region by imposing impulse control (inserted or dropped pitch cycles) whenever the bounds of the band are reached. One important property of the band control policy is that it incurs no packet-loss through buffering if there are no out-of-order packet-arrivals. Experiments performed on both synthetic and real network-delay traces show that the proposed playout scheduling algorithm outperforms two recent algorithms in most cases.
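A toy editorial simulation of a band control policy acting on a random-walk buffer length, illustrating the insert/drop impulse control described above; the band edges, pitch-cycle length, and jitter statistics are made up and are not the optimized values from the paper.

```python
import numpy as np

def band_control(buffer_increments, lower, upper, pitch_cycle):
    """Illustrative band control policy for a playout-buffer length.

    `buffer_increments` holds the uncontrolled changes of the buffer length (ms).
    Whenever the controlled length leaves the band [lower, upper], a pitch cycle
    is inserted or dropped to push it back inside.
    """
    length = 0.5 * (lower + upper)
    lengths, control = [], 0.0
    for inc in buffer_increments:
        length += inc
        if length > upper:                  # too much buffered audio: drop a cycle
            length -= pitch_cycle
            control += pitch_cycle
        elif length < lower:                # about to underflow: insert a cycle
            length += pitch_cycle
            control += pitch_cycle
        lengths.append(length)
    return np.array(lengths), control

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    increments = rng.normal(0.0, 2.0, size=5000)      # ms of jitter per packet
    lengths, total_control = band_control(increments, lower=20.0, upper=60.0,
                                          pitch_cycle=8.0)
    print("mean buffering delay (ms):", round(float(lengths.mean()), 1))
    print("total inserted/dropped (ms):", round(total_control, 1))
```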

  • 28.
    Guoqiang, Zhang
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    W. Bastiaan, Kleijn
    KTH, School of Electrical Engineering (EES), Sound and Image Processing. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Autoregressive Model-based Speech Packet-Loss Concealment, 2008. In: 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 2008, p. 4797-4800. Conference paper (Refereed)
    Abstract [en]

    We study packet-loss concealment for speech based on autoregressive modelling using a rigorous minimum mean square error (MMSE) approach. The effect of the model estimation error on predicting the missing segment is studied and an upper bound on the mean square error is derived. Our experiments show that the upper bound is tight when the estimation error is less than the signal variance. We also consider the usage of perceptual weighting on prediction to improve speech quality. A rigorous argument is presented to show that perceptual weighting is not useful in this context. We create simple and practical MMSE-based systems using two signal models: a basic model capturing the short-term correlation and a more sophisticated model that also captures the long-term correlation. Subjective quality comparison tests show that the proposed MMSE-based system provides state-of-the-art performance.
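An editorial sketch of the basic short-term model only: under an AR model with white innovation, the MMSE estimate of a lost segment given the past is the zero-excitation extrapolation below. The long-term (pitch) model and the error-bound analysis from the paper are omitted; the AR(2) example is synthetic.

```python
import numpy as np

def ar_conceal(past, a, n_missing):
    """MMSE extrapolation of a lost segment under an AR model.

    For A(z) = 1 + a_1 z^-1 + ... + a_p z^-p with white innovation, the MMSE
    estimate of the missing samples given the past is obtained by running the
    recursion with zero excitation.
    """
    p = len(a) - 1
    buf = list(past[-p:])
    out = []
    for _ in range(n_missing):
        nxt = -np.dot(a[1:], buf[::-1])     # x_hat[n] = -sum_k a_k * x_hat[n-k]
        out.append(nxt)
        buf = buf[1:] + [nxt]
    return np.array(out)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = np.array([1.0, -1.8 * np.cos(2 * np.pi * 0.05), 0.81])  # resonant AR(2)
    x = np.zeros(600)
    for n in range(2, 600):
        x[n] = -a[1] * x[n - 1] - a[2] * x[n - 2] + rng.normal(scale=0.1)
    lost = slice(400, 480)                  # pretend one 80-sample packet is lost
    concealed = ar_conceal(x[:400], a, 80)
    err = np.mean((concealed - x[lost]) ** 2) / np.mean(x[lost] ** 2)
    print("relative concealment error:", round(float(err), 3))
```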

  • 29.
    Henter, Gustav Eje
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Frean, Marcus R.
    School of Engineering and Computer Science, Victoria University of Wellington, New Zealand.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Gaussian process dynamical models for nonparametric speech representation and synthesis, 2012. In: Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, IEEE, 2012, p. 4505-4508. Conference paper (Refereed)
    Abstract [en]

    We propose Gaussian process dynamical models (GPDMs) as a new, nonparametric paradigm in acoustic models of speech. These use multidimensional, continuous state-spaces to overcome familiar issues with discrete-state, HMM-based speech models. The added dimensions allow the state to represent and describe more than just temporal structure as systematic differences in mean, rather than as mere correlations in a residual (which dynamic features or AR-HMMs do). Being based on Gaussian processes, the models avoid restrictive parametric or linearity assumptions on signal structure. We outline GPDM theory, and describe model setup and initialization schemes relevant to speech applications. Experiments demonstrate subjectively better quality of synthesized speech than from comparable HMMs. In addition, there is evidence for unsupervised discovery of salient speech structure.

  • 30.
    Henter, Gustav Eje
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Intermediate-State HMMs to Capture Continuously-Changing Signal Features, 2011. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2011, p. 1828-1831. Conference paper (Refereed)
    Abstract [en]

    Traditional discrete-state HMMs are not well suited for describing steadily evolving, path-following natural processes like motion capture data or speech. HMMs cannot represent incremental progress between behaviors, and sequences sampled from the models have unnatural segment durations, unsmooth transitions, and excessive rapid variation. We propose to address these problems by permitting the state variable to occupy positions between the discrete states, and present a concrete left-right model incorporating this idea. We call this intermediate-state HMMs. The state evolution remains Markovian. We describe training using the generalized EM-algorithm and present associated update formulas. An experiment shows that the intermediate-state model is capable of gradual transitions, with more natural durations and less noise in sampled sequences compared to a conventional HMM.

  • 31.
    Henter, Gustav Eje
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory. The University of Edinburgh, United Kingdom.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Communication Theory. Victoria University of Wellington, New Zealand.
    Minimum Entropy Rate Simplification of Stochastic Processes. Manuscript (preprint) (Other academic)
    Abstract [en]

    We propose minimum entropy rate simplification (MERS), an information-theoretic, representation-independent framework for simplifying generative models of stochastic processes. Applications include improving model quality for sampling tasks by concentrating the probability mass on the most characteristic and accurately described behaviors while de-emphasizing the tails, and obtaining clean models from corrupted data (nonparametric denoising). This is the opposite of the smoothing step commonly applied to classification models. Drawing on rate-distortion theory, MERS seeks the minimum entropy-rate process under a constraint on the dissimilarity between the original and simplified processes. We particularly investigate the Kullback-Leibler divergence rate as a dissimilarity measure, where, compatible with our assumption that the starting model is disturbed or inaccurate, the simplification rather than the starting model is used for the reference distribution of the divergence. This leads to analytic solutions for stationary and ergodic Gaussian processes and Markov chains. The same formulas are also valid for maximum entropy smoothing under the same divergence constraint. In experiments, MERS successfully simplifies and denoises Markov models from text, speech, and meteorology.

  • 32.
    Henter, Gustav Eje
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Picking up the pieces: Causal states in noisy data, and how to recover them, 2013. In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 34, no 5, p. 587-594. Article in journal (Refereed)
    Abstract [en]

    Automatic structure discovery is desirable in many Markov model applications where a good topology (states and transitions) is not known a priori. CSSR is an established pattern discovery algorithm for stationary and ergodic stochastic symbol sequences that learns a predictively optimal Markov representation consisting of so-called causal states. By means of a novel algebraic criterion, we prove that the causal states of a simple process disturbed by random errors frequently are too complex to be learned fully, making CSSR diverge. In fact, the causal state representation of many hidden Markov models, representing simple but noise-disturbed data, has infinite cardinality. We also report that these problems can be solved by endowing CSSR with the ability to make approximations. The resulting algorithm, robust causal states (RCS), is able to recover the underlying causal structure from data corrupted by random substitutions, as is demonstrated both theoretically and in an experiment. The algorithm has potential applications in areas such as error correction and learning stochastic grammars.

  • 33.
    Henter, Gustav Eje
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Simplified Probability Models for Generative Tasks: a Rate-Distortion Approach, 2010. In: Proceedings of the European Signal Processing Conference, EUROPEAN ASSOC SIGNAL SPEECH & IMAGE PROCESSING-EURASIP, 2010, Vol. 18, p. 1159-1163. Conference paper (Refereed)
    Abstract [en]

    We consider using sparse simplifications to denoise probabilistic sequence models for generative tasks such as speech synthesis. Our proposal is to find the least random model that remains close to the original one according to a KL-divergence constraint, a technique we call minimum entropy rate simplification (MERS). This produces a representation-independent framework for trading off simplicity and divergence, similar to rate-distortion theory. Importantly, MERS uses the cleaned model rather than the original one for the underlying probabilities in the KL-divergence, effectively reversing the conventional argument order. This promotes rather than penalizes sparsity, suppressing uncommon outcomes likely to be errors. We write down the MERS equations for Markov chains, and present an iterative solution procedure based on the Blahut-Arimoto algorithm and a bigram matrix Markov chain representation. We apply the procedure to a music-based Markov grammar, and compare the results to a simplistic thresholding scheme.

  • 34.
    Henter, Gustav Eje
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory. The University of Edinburgh, United Kingdom.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Communication Theory. Victoria University of Wellington, New Zealand.
    Kernel Density Estimation-Based Markov Models with Hidden State. Manuscript (preprint) (Other academic)
    Abstract [en]

    We consider Markov models of stochastic processes where the next-step conditional distribution is defined by a kernel density estimator (KDE), similar to certain time-series bootstrap schemes from the economic forecasting literature. The KDE Markov models (KDE-MMs) we discuss are nonlinear, nonparametric, fully probabilistic representations of stationary processes with strong asymptotic convergence properties. The models generate new data simply by concatenating points from the training data sequences in a context-sensitive manner, with some added noise. We present novel EM-type maximum-likelihood algorithms for data-driven bandwidth selection in KDE-MMs. Additionally, we augment the KDE-MMs with a hidden state, yielding a new model class, KDE-HMMs. The added state-variable enables long-range memory and signal structure representation, complementing the short-range correlations captured by the Markov process. This is compelling for modelling complex real-world processes such as speech and language data. The paper presents guaranteed-ascent EM-update equations for model parameters in the case of Gaussian kernels, as well as relaxed update formulas that greatly accelerate training in practice. Experiments demonstrate increased held-out set probability for KDE-HMMs on several challenging natural and synthetic data series, compared to traditional techniques such as autoregressive models, HMMs, and their combinations.
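An editorial sketch of a first-order KDE Markov model for a scalar series, illustrating the "concatenate training points with added kernel noise" sampler described above. The bandwidth is fixed by hand here (the paper selects it with an EM-type procedure), the hidden-state extension is not shown, and all names and data are illustrative.

```python
import numpy as np

def kde_mm_sample(train, n_samples, h, seed=0):
    """Sample from a first-order KDE Markov model (KDE-MM) of a scalar series.

    The next-step conditional p(x_t | x_{t-1}) is a kernel density estimate
    built from consecutive training pairs: contexts close to the current value
    get high weight, and the sampler emits the matching next value plus
    Gaussian kernel noise.
    """
    rng = np.random.default_rng(seed)
    ctx, nxt = train[:-1], train[1:]
    x = train[-1]
    out = []
    for _ in range(n_samples):
        w = np.exp(-0.5 * ((x - ctx) / h) ** 2)   # Gaussian kernel weights
        w /= w.sum()
        j = rng.choice(len(ctx), p=w)
        x = nxt[j] + rng.normal(scale=h)          # concatenate + kernel noise
        out.append(x)
    return np.array(out)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Training data: a noisy, sine-shaped oscillation.
    t = np.arange(2000)
    train = np.sin(0.07 * t) + 0.05 * rng.standard_normal(len(t))
    synth = kde_mm_sample(train, 500, h=0.1)
    print("train std:", round(float(train.std()), 3),
          "synth std:", round(float(synth.std()), 3))
```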

  • 35. Heusdens, R.
    et al.
    Jensen, J.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kot, V.
    Niamut, O. A.
    Van De Par, S.
    Van Schijndel, N. H.
    Vafin, R.
    Bit-rate scalable intraframe sinusoidal audio coding based on rate-distortion optimization, 2006. In: Journal of the Audio Engineering Society, ISSN 1549-4950, Vol. 54, no 3, p. 167-188. Article in journal (Refereed)
    Abstract [en]

    A coding methodology that aims at rate-distortion optimal sinusoid + noise coding of audio and speech signals is presented. The coder divides the input signal into variable-length time segments and distributes sinusoidal components over the segments such that the resulting distortion (as measured by a perceptual distortion measure) is minimized subject to a prespecified rate constraint. The coder is bit-rate scalable. For a given target bit budget it automatically adapts the segmentation and distribution of sinusoids in a rate-distortion optimal manner. The coder uses frequency-differential coding techniques in order to exploit intrasegment correlations for efficient quantization and encoding of the sinusoidal model parameters. This technique makes the coder more robust toward packet losses when used in a lossy-packet channel environment as compared to time-differential coding techniques, which are commonly used in audio or speech coders. In a subjective listening experiment the present coder showed similar or better performance than a set of four MPEG-4 coders operating at bit rates of 16, 24, 32, and 48 kbit/s, each of which was state of the art for the given target bit rate.

  • 36. Heusdens, R.
    et al.
    Vafin, R.
    Kleijn, W. Bastiaan
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Sinusoidal modeling using psychoacoustic-adaptive matching pursuits, 2002. In: IEEE Signal Processing Letters, ISSN 1070-9908, E-ISSN 1558-2361, Vol. 9, no 8, p. 262-265. Article in journal (Refereed)
    Abstract [en]

    In this letter, we propose a segment-based matching-pursuit algorithm where the psychoacoustical properties of the human auditory system are taken into account. Rather than scaling the dictionary elements according to auditory perception, we define a psychoacoustic-adaptive norm on the signal space that can be used for assigning the dictionary elements to the individual segments in a rate-distortion optimal way. The new algorithm is asymptotically equal to signal-to-mask-ratio-based algorithms in the limit of infinite-analysis window length. However, the new algorithm provides a significantly improved selection of the dictionary elements for finite window length.
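For context, an editorial sketch of plain matching pursuit with a unit-norm dictionary, the baseline the letter builds on; the psychoacoustic-adaptive norm that is the letter's contribution is not reproduced, and the random dictionary is illustrative.

```python
import numpy as np

def matching_pursuit(x, dictionary, n_atoms):
    """Plain matching pursuit with a unit-norm dictionary (columns = atoms).

    Greedily picks the atom with the largest inner product with the residual
    and subtracts its contribution.
    """
    residual = x.astype(float).copy()
    atoms, coeffs = [], []
    for _ in range(n_atoms):
        corr = dictionary.T @ residual
        k = int(np.argmax(np.abs(corr)))
        atoms.append(k)
        coeffs.append(corr[k])
        residual -= corr[k] * dictionary[:, k]
    return atoms, coeffs, residual

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, n_dict = 256, 512
    D = rng.standard_normal((n, n_dict))
    D /= np.linalg.norm(D, axis=0)                          # unit-norm atoms
    x = 3.0 * D[:, 10] - 2.0 * D[:, 200] + 0.01 * rng.standard_normal(n)
    idx, c, res = matching_pursuit(x, D, n_atoms=5)
    print("selected atoms:", idx)
    print("residual energy:", round(float(np.sum(res ** 2)), 4))
```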

  • 37. Heusdens, Richard
    et al.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Ozerov, Alexey
    KTH, School of Electrical Engineering (EES), Sound and Image Processing. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre.
    Entropy-constrained high-resolution lattice vector quantization using a perceptually relevant distortion measure, 2007. In: CONFERENCE RECORD OF THE FORTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1-5, NEW YORK: IEEE, 2007, p. 2075-2079. Conference paper (Other academic)
    Abstract [en]

    In this paper we study high-resolution entropy-constrained coding using multidimensional companding. To account for auditory perception, we introduce a perceptually relevant distortion measure. We derive a multidimensional companding function which is asymptotically optimal in the sense that the rate loss introduced by the compander will vanish with increasing vector dimension. We compare the companding scheme to a scheme which is based on a perceptual weighting of the source, thereby transforming the perceptual distortion measure into a mean-squared error distortion measure. Experimental results show that even at low vector dimension, the rate loss introduced by the compander is low (less than 0.05 bit per dimension in case of two-dimensional vectors).

  • 38. Hofbauer, Konrad
    et al.
    Kubin, Gernot
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Speech Watermarking for Analog Flat-Fading Bandpass Channels, 2009. In: IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, ISSN 1558-7916, Vol. 17, no 8, p. 1624-1637. Article in journal (Refereed)
    Abstract [en]

    We present a blind speech watermarking algorithm that embeds the watermark data in the phase of non-voiced speech by replacing the excitation signal of an autoregressive speech signal representation. The watermark signal is embedded in a frequency subband, which facilitates robustness against bandpass filtering channels. We derive several sets of pulse shapes that prevent inter-symbol interference and that allow the passband watermark signal to be created by simple filtering. A marker-based synchronization scheme robustly detects the location of the embedded watermark data without the occurrence of insertions or deletions. In light of a potential application to analog aeronautical voice radio communication, we present experimental results for embedding a watermark in narrowband speech at a bit-rate of 450 bit/s. The recursive least-squares (RLS) equalization-based watermark detector not only compensates for the vocal tract filtering, but also recovers the watermark data in the presence of nonlinear phase and bandpass filtering, amplitude modulation, and additive white Gaussian noise (AWGN), making the watermarking scheme highly robust.

  • 39. Huang, F.
    et al.
    Lee, T.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    A method of speech periodicity enhancement based on transform-domain signal decomposition, 2010. In: 18th European Signal Processing Conference (EUSIPCO-2010), EUROPEAN ASSOC SIGNAL SPEECH & IMAGE PROCESSING-EURASIP, 2010, p. 984-988. Conference paper (Refereed)
    Abstract [en]

    Periodicity is an important property of speech signals. It plays a critical role in speech communication, especially when strong background noise is present. This paper presents a novel framework of periodicity enhancement for noisy speech. The enhancement operates on the linear prediction error (residual) signal. The residual signal goes through a constant-pitch time warping process and two sequential lapped frequency transforms, by which the periodic component is concentrated in the first modulation band. By emphasizing the respective transform coefficients, periodicity enhancement of noisy residual signal is achieved. The enhanced residual signal and estimated linear prediction filter parameters are used to synthesize the speech output. The effectiveness of the proposed method is confirmed consistently by various objective measures and subjective listening tests. It is observed that the enhanced speech can restore the harmonic structure of the original speech.

  • 40. Huang, F.
    et al.
    Lee, T.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing (Closed 130101).
    Transform-domain speech periodicity enhancement with adaptive coefficient weighting, 2011. In: 2011 International Symposium on Intelligent Signal Processing and Communications Systems: "The Decade of Intelligent and Green Signal Processing and Communications", ISPACS 2011, 2011. Conference paper (Refereed)
    Abstract [en]

    In our previous study of speech periodicity enhancement, the linear prediction residual signal was decomposed into periodic and aperiodic components using two-stage transforms. In the transform domain, the periodic component of the signal is concentrated and represented by a small portion of the coefficients. The respective coefficients were weighted and emphasized to enhance periodicity of the signal against noise. Fixed weights were used for different sets of the coefficients. With the fixed weights, it is observed that unvoiced and voiced-unvoiced transition signals are excessively attenuated and perceptible artificial periodicity is generated in these speech segments. In this study, we propose an adaptive weighting method. For voiced speech, the periodic component is strong and the respective transform coefficients show a high energy level. In contrast, for unvoiced speech, periodicity is weak and the corresponding coefficients are small. The weights for the coefficients are adaptively adjusted according to the energy level of the periodic component. With the adaptive weights, the periodic component of voiced speech can be effectively emphasized and restored while the aperiodic parts in unvoiced speech are retained.

  • 41. Huang, F.
    et al.
    Lee, T.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Transform-domain Wiener filter for speech periodicity enhancement, 2012. In: Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, IEEE, 2012, p. 4577-4580. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a transform-domain Wiener filtering approach for enhancing speech periodicity. The enhancement is performed on the linear prediction residual signal. Two sequential lapped frequency transforms are applied to the residual in a pitch-synchronous manner. The residual signal is effectively represented by two separate sets of transform coefficients that correspond to the periodic and aperiodic components, respectively. A Wiener filter operating on the transform coefficients is developed to restore periodicity and reduce noise. Different filter parameters are designed for the transform coefficients of the periodic and aperiodic components. A template-driven method is used to estimate the filter parameters for the periodic component. For the aperiodic components, the filter parameters are computed based on a local SNR for effective noise reduction. Experimental results confirm that the harmonic structure of the signal can be effectively restored with the proposed approach.

  • 42. Karam, Lina J.
    et al.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Communication Theory.
    MacLean, Karon
    Perception-Based Media Processing, 2013. In: Proceedings of the IEEE, ISSN 0018-9219, E-ISSN 1558-2256, Vol. 101, no 9, p. 1900-1904. Article in journal (Other academic)
    Abstract [en]

    The articles in this special issue provide a timely review of the state of the art in the areas of perception-based audio, visual, and haptic processing.

  • 43.
    Kim, Moo Young
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Comparative rate-distortion performance of multiple description coding for real-time audiovisual communication over the Internet, 2006. In: IEEE Transactions on Communications, ISSN 0090-6778, E-ISSN 1558-0857, Vol. 54, no 4, p. 625-636. Article in journal (Refereed)
    Abstract [en]

    To facilitate real-time audiovisual communication through the Internet, forward error correction (FEC) and multiple description coding (MDC) can be used as low-delay packet-loss recovery techniques. We use both a Gilbert channel model and data obtained from real IP connections to compare the rate-distortion performance of different variants of FEC and MDC. Using identical overall rates with stringent delay constraints, we find that side-distortion optimized MDC generally performs better than Reed-Solomon-based FEC. If the channel condition is known from feedback, then channel-optimized MDC can be used to exploit this information, resulting in significantly improved performance. Our results confirm that two-independent-channel transmission is preferred to single-channel transmission, both for FEC and MDC.
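An editorial sketch of the Gilbert channel model mentioned above: a two-state Markov process producing bursty packet losses. The transition probabilities are illustrative, and the FEC and MDC schemes compared in the paper are not implemented here.

```python
import numpy as np

def gilbert_losses(n_packets, p_gb, p_bg, seed=0):
    """Simulate bursty packet loss with a two-state Gilbert channel model.

    State G (good): packet received.  State B (bad): packet lost.
    p_gb is the G->B transition probability, p_bg is B->G; the stationary
    loss rate is p_gb / (p_gb + p_bg) and the mean burst length is 1 / p_bg.
    """
    rng = np.random.default_rng(seed)
    lost = np.zeros(n_packets, dtype=bool)
    bad = False
    for i in range(n_packets):
        bad = rng.random() < (1 - p_bg if bad else p_gb)
        lost[i] = bad
    return lost

if __name__ == "__main__":
    lost = gilbert_losses(100000, p_gb=0.02, p_bg=0.4)
    # Count loss bursts as 0 -> 1 transitions (pad with a leading 0).
    n_bursts = np.count_nonzero(np.diff(np.concatenate(([0], lost.astype(int)))) == 1)
    mean_burst = lost.sum() / max(n_bursts, 1)
    print("loss rate:", round(float(lost.mean()), 4))        # ~0.02/0.42 ~ 0.048
    print("mean burst length:", round(float(mean_burst), 2)) # ~1/0.4 = 2.5
```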

  • 44.
    Kim, Moo young
    et al.
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Kleijn, W. Bastiaan
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Comparison of transmitter-based packet-loss recovery techniques for voice transmission2004In: 8th International Conference on Spoken Language Processing, ICSLP 2004, International Speech Communication Association, 2004, p. 641-644Conference paper (Refereed)
    Abstract [en]

    To facilitate real-time voice communication through the Internet, forward error correction (FEC) and multiple description coding (MDC) can be used as low-delay packet-loss recovery techniques. We use both a Gilbert channel model and data obtained from real IP connections to compare the rate-distortion performance of different variants of FEC and MDC. Using identical overall rates with stringent delay constraints, we find that side-distortion optimized MDC generally performs better than Reed-Solomon-based FEC. If the channel condition is known from feedback through the Real-Time Control Protocol (RTCP), then channel-optimized MDC can be used to exploit this information, resulting in significantly improved performance.

  • 45.
    Kim, Moo Young
    et al.
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Kleijn, W. Bastiaan
    KTH, Superseded Departments, Signals, Sensors and Systems.
    KLT-based adaptive classified VQ of the speech signal2004In: IEEE transactions on speech and audio processing, ISSN 1063-6676, E-ISSN 1558-2353, Vol. 12, no 3, p. 277-289Article in journal (Refereed)
    Abstract [en]

    Compared to scalar quantization (SQ), vector quantization (VQ) has memory, space-filling, and shape advantages. If the signal statistics are known, direct vector quantization (DVQ) according to these statistics provides the highest coding efficiency, but it has unmanageable storage requirements if the statistics are time-varying. In code-excited linear predictive (CELP) coding, a single compromise codebook is trained in the excitation domain, and the space-filling and shape advantages of VQ are utilized only in a nonoptimal, average sense. In this paper, we propose Karhunen-Loeve transform (KLT)-based adaptive classified VQ (CVQ), where the space-filling advantage can be utilized since the Voronoi-region shape is not affected by the KLT. The memory and shape advantages can also be used, since each codebook is designed for a narrow class of KLT-domain statistics. We further improve basic KLT-CVQ with companding, which exploits the shape advantage of VQ more efficiently. Our experiments show that KLT-CVQ provides a higher SNR than basic CELP coding, with a computational complexity similar to that of DVQ and much lower than that of CELP. With companding, even single-class KLT-CVQ outperforms CELP, both in terms of SNR and codebook search complexity.
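    A compact sketch of the classify-then-KLT-then-VQ idea is given below, assuming per-class covariance matrices and KLT-domain codebooks trained offline; the likelihood-based classification rule is an illustrative stand-in, and companding is omitted.

```python
# Minimal sketch of classify -> KLT -> per-class VQ. Class covariance
# matrices and codebooks are assumed to be trained offline; the Gaussian
# classification rule here is an illustrative stand-in.
import numpy as np

def klt_cvq_encode(x, class_covs, class_codebooks):
    """x: input vector; class_covs: list of covariance matrices (one per class);
    class_codebooks: list of KLT-domain codebooks, each of shape (N_i, dim)."""
    # Pick the class whose zero-mean Gaussian model scores highest
    # (log-determinant plus Mahalanobis term).
    scores = []
    for C in class_covs:
        _, logdet = np.linalg.slogdet(C)
        scores.append(-logdet - x @ np.linalg.inv(C) @ x)
    k = int(np.argmax(scores))

    # KLT for that class: project onto the covariance eigenvectors.
    _, V = np.linalg.eigh(class_covs[k])
    y = V.T @ x

    # Nearest-neighbour search in the class codebook; the orthonormal KLT
    # leaves the Voronoi-region shape unchanged, which is the space-filling
    # point made in the abstract.
    cb = class_codebooks[k]
    idx = int(np.argmin(np.sum((cb - y) ** 2, axis=1)))
    return k, idx, V @ cb[idx]   # class index, codeword index, reconstruction
```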

  • 46. Kim, Moo Young
    et al.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Reduction of the Impact of Distortion Outliers and Source Mismatch in Resolution-Constrained Quantization2010In: IEEE TRANS AUDIO SPEECH LANG, ISSN 1558-7916, Vol. 18, no 6, p. 1218-1227Article in journal (Refereed)
    Abstract [en]

    The rate-distortion performance of conventional resolution-constrained quantization (RCQ) based on the mean-squared error criterion (MSE-RCQ) is generally compromised by the impact of distortion outliers and source mismatch. Not only the mean distortion, but also the number of distortion outliers should be considered in quantizer design. Thus, we propose the use of a design criterion that gives more importance to the tail of the source distribution, which leads to RCQ based on the second moment of distortion (SMD-RCQ). A continuous range of alternatives between MSE-RCQ and SMD-RCQ is also defined and implemented based on the weighted arithmetic-mean measure (WAM-RCQ). It can be used to control the centroid density in the tail of the source distribution. Experimental results with a Gaussian source and line spectral frequencies (LSFs) show that the proposed WAM-RCQ not only produces a similar mean distortion as conventional MSE-RCQ, but has a lower percentage of distortion outliers and a significantly reduced sensitivity to source mismatch.
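    As a hedged illustration of the design idea, the following criterion blends the first and second moments of the per-vector squared error with a weight alpha; the exact form of the paper's weighted arithmetic-mean measure is not reproduced here, and this combination is only an assumed reading of it.

```python
# Hedged sketch of a training criterion blending the mean and the second
# moment of the squared error. The combination rule and alpha are
# illustrative assumptions, not the paper's exact WAM definition.
import numpy as np

def wam_criterion(x, codebook, alpha=0.5):
    """Evaluate a candidate codebook on data x (shape (N, dim)): nearest-
    codeword squared error per vector, then a weighted arithmetic mean of
    its first and second moments. alpha = 0 reduces to plain MSE; larger
    alpha penalizes distortion outliers (the distribution tail) more."""
    d = np.min(np.sum((x[:, None, :] - codebook[None, :, :]) ** 2, axis=2),
               axis=1)                       # per-vector squared error
    return (1.0 - alpha) * np.mean(d) + alpha * np.mean(d ** 2)
```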

  • 47. Kim, Moo Young
    et al.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Resolution-constrained quantization with JND-based perceptual-distortion measures2006In: IEEE Signal Processing Letters, ISSN 1070-9908, E-ISSN 1558-2361, Vol. 13, no 11, p. 703-706Article in journal (Refereed)
    Abstract [en]

    When the squared error of observable signal parameters is below the just noticeable difference (JND), it is not registered by human perception. We modify commonly used distortion criteria to account for this phenomenon and study the implications for quantizer design and performance. Taking the JND into account in the design of the quantizer generally leads to improved performance in terms of mean distortion and the number of outliers. Moreover, the resulting quantizer exhibits better robustness against source mismatch.
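    A minimal sketch of a JND-modified squared-error measure is shown below; the thresholding form and the scale of the JND parameter are illustrative assumptions, as the letter's exact modification is not reproduced here.

```python
# Sketch of a JND-modified squared-error measure: error below the just
# noticeable difference contributes nothing. The JND is assumed here to be
# expressed on the squared-error scale; the letter's exact form may differ.
import numpy as np

def jnd_distortion(x, x_hat, jnd):
    """Squared error, counted only to the extent that it exceeds the JND."""
    se = (x - x_hat) ** 2
    return np.maximum(se - jnd, 0.0)
```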

  • 48.
    Kleijn, W. Bastiaan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Representing speech2015In: European Signal Processing Conference, European Signal Processing Conference, EUSIPCO , 2015, no MarchConference paper (Refereed)
    Abstract [en]

    The properties of the speech production process and the auditory periphery have led to the usage of similar speech signal representations for various processing tasks such as speech and speaker recognition, speech synthesis, and speech coding. The representation is generally divided into a description of the vocal-tract transfer function and the excitation source. For recognition purposes, the biased characterization of the vocal-tract transfer function by a time sequence of low-dimension cepstral vectors performs well. For coding and synthesis, we argue that for the vocal-tract transfer function autoregressive (AR) models are more effective than filter banks, while for the excitation source pitch-synchronous filter banks and modulation-domain filters are most effective. A clear trend exists towards the exploitation of the time variation of both the vocal-tract transfer function and the excitation source.
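    As a small aside on the AR modelling advocated above, the sketch below estimates vocal-tract AR coefficients with the autocorrelation method of linear prediction; the window, the model order, and the use of scipy's Toeplitz solver are illustrative choices rather than recommendations from the paper.

```python
# Minimal sketch of autoregressive (AR) vocal-tract modelling via the
# autocorrelation method of linear prediction.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order=16):
    """Return AR coefficients a[1..order] such that s[n] ~ sum_k a[k]*s[n-k];
    the vocal-tract transfer function is then modelled as
    1 / (1 - sum_k a[k] z^{-k})."""
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1:]   # autocorrelation r[0..]
    a = solve_toeplitz(r[:order], r[1:order + 1])      # Yule-Walker equations
    return a
```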

  • 49.
    Kleijn, W. Bastiaan
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Signal processing representations of speech2003In: IEICE transactions on information and systems, ISSN 0916-8532, E-ISSN 1745-1361, Vol. E86D, no 3, p. 359-376Article, review/survey (Refereed)
    Abstract [en]

    Synergies in processing requirements and knowledge of human speech production and perception have led to a similarity of the speech signal representations used for the tasks of recognition, coding, and modification. The representations are generally composed of a description of the vocal-tract transfer function and, in the case of coding and modification, a description of the excitation signal. This paper provides an overview of commonly used representations. For coding and modification, autoregressive models represented by line spectral frequencies perform well for the vocal tract, and pitch-synchronous filter banks and modulation-domain filters perform well for the excitation. For recognition, good representations are based on a smoothed magnitude response of the vocal tract.

  • 50.
    Kleijn, W. Bastiaan
    et al.
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Backstrom, T.
    Alku, P.
    On line spectral frequencies2003In: IEEE Signal Processing Letters, ISSN 1070-9908, E-ISSN 1558-2361, Vol. 10, no 3, p. 75-77Article in journal (Refereed)
    Abstract [en]

    The commonly used line spectral frequencies form the roots of symmetric and antisymmetric polynomials constructed from a linear predictor. In this letter, we provide a new, simpler proof that the symmetric and antisymmetric polynomials can be regarded as optimal constrained predictors that correspond to predicting from the low-pass and high-pass filtered signal, respectively.
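    The construction described above translates directly into code. The sketch below forms the symmetric and antisymmetric polynomials from an LPC coefficient vector and takes the angles of their unit-circle roots as the line spectral frequencies; root-finding via numpy is an implementation convenience, not the letter's method.

```python
# Sketch of line spectral frequencies from an LPC polynomial A(z) via
#   P(z) = A(z) + z^{-(m+1)} A(z^{-1}),   Q(z) = A(z) - z^{-(m+1)} A(z^{-1}).
# The input a is assumed to be the coefficient vector [1, a_1, ..., a_m].
import numpy as np

def lsf_from_lpc(a):
    a = np.asarray(a, dtype=float)
    # Coefficients of z^{-(m+1)} A(z^{-1}) are A's coefficients reversed,
    # shifted by one delay.
    p = np.concatenate([a, [0.0]]) + np.concatenate([[0.0], a[::-1]])
    q = np.concatenate([a, [0.0]]) - np.concatenate([[0.0], a[::-1]])
    # For a minimum-phase predictor the roots of P and Q lie on the unit
    # circle and interlace; keep the angles in (0, pi) and sort them.
    angles = []
    for poly in (p, q):
        angles.extend(np.angle(np.roots(poly)))
    return np.sort([w for w in angles if 0.0 < w < np.pi])
```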
