Postfilters are commonly used in speech coding for the attenuation of quantization noise. In the presence of acoustic background noise or distortion due to tandeming operations, the postfilter parameters are not adjusted and the performance is, therefore, not optimal. We propose a modification that consists of replacing the nonadaptive postfilter parameters with parameters that adapt to variations in spectral flatness, obtained from the noisy speech. This generalization of the postfiltering concept can handle a larger range of noise conditions, but has the same computational complexity and memory requirements as the conventional postfilter. Test results indicate that the presented algorithm improves on the standard postfilter, as well as on the combination of a noise attenuation preprocessor and the conventional postfilter.
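As a rough illustration of the adaptation idea, the sketch below computes the spectral flatness of a noisy frame and maps it to a pair of postfilter weighting factors. The abstract does not give the mapping, so the function `adapt_gammas`, its linear interpolation, the endpoint values, and the direction of the adaptation are all hypothetical placeholders. Only the flatness measure itself (the ratio of the geometric to the arithmetic mean of the power spectrum) is standard.

```python
import numpy as np

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum, in (0, 1]."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2 + eps  # eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    return float(geometric_mean / np.mean(power))

def adapt_gammas(flatness, peaky=(0.70, 0.75), flat=(0.50, 0.80)):
    """Hypothetical mapping: interpolate (gamma_n, gamma_d) between two
    assumed settings as the flatness goes from 0 (peaky) to 1 (flat)."""
    t = min(max(flatness, 0.0), 1.0)  # clamp to [0, 1]
    gamma_n = (1.0 - t) * peaky[0] + t * flat[0]
    gamma_d = (1.0 - t) * peaky[1] + t * flat[1]
    return gamma_n, gamma_d

# Toy frames: white noise is spectrally flat, a pure tone is not.
rng = np.random.default_rng(0)
noise_frame = rng.standard_normal(256)
tone_frame = np.sin(2 * np.pi * 0.1 * np.arange(256))
```

Because the flatness is a single scalar per frame, a scheme of this kind adds little computation beyond the spectrum itself, which is in line with the complexity claim in the abstract.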
A framework for flexible and efficient coding of general stereo audio signals is proposed. Methods based on the framework can be used together with an arbitrary single-channel (mono) coder to achieve a seamless transition from purely parametric stereo coding to waveform-approximating coding as the bitrate increases. The idea, based on sum-difference encoding of time-aligned signal components, is presented as a general framework. An example implementation is demonstrated to have the desired convergence properties towards transparent quality.
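The sum-difference step on its own can be sketched as follows. This is a minimal illustration that assumes the two channels are already time-aligned. The alignment stage and the mono coder from the abstract are omitted, and the function names are illustrative.

```python
import numpy as np

def sum_difference_encode(left, right):
    """Return the sum (mid) and difference (side) signals of two channels."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    return 0.5 * (left + right), 0.5 * (left - right)

def sum_difference_decode(mid, side):
    """Invert the transform: left = mid + side, right = mid - side."""
    return mid + side, mid - side

left = np.array([1.0, 0.5, -0.25, 0.0])
right = np.array([0.8, 0.5, -0.5, 0.1])
mid, side = sum_difference_encode(left, right)
left_hat, right_hat = sum_difference_decode(mid, side)
```

When the channels are identical the difference signal is zero, so all bits can go to the sum signal. Spending more bits on the difference signal moves the scheme towards waveform-approximating coding.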
Continuous monitoring of audio-visual context on mobile devices requires algorithms with gentle demands on computational resources. Existing feature selection strategies for classification do not account for the complexity associated with feature extraction. We present a complexity-constrained feature selection algorithm that is independent of the classifier architecture and demonstrate that it leads to superior feature sets if the allowed computational complexity is limited.
Classification on mobile devices often runs continuously, which requires algorithms with modest computational complexity. The performance of a classifier depends heavily on the set of features used as input variables. Existing feature selection strategies for classification aim to find a "best" set of features in terms of classification accuracy, but are not designed to handle constraints on computational complexity. We demonstrate that extending the performance measures used in state-of-the-art feature selection algorithms with a penalty on feature extraction complexity leads to superior feature sets when the allowed computational complexity is limited. Our solution is independent of the particular classification algorithm.
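One way to picture a complexity-penalized selection criterion is a greedy forward search whose gain is reduced in proportion to each feature's extraction cost. The sketch below is an assumption-laden illustration, not the algorithm from the paper: the greedy search, the hard `budget`, the linear `penalty` weight, and the toy feature names and scores are all hypothetical. The `score` callable stands in for any classifier-independent performance measure.

```python
from typing import Callable, Dict, FrozenSet, List

def select_features(
    costs: Dict[str, float],
    score: Callable[[FrozenSet[str]], float],
    budget: float,
    penalty: float,
) -> List[str]:
    """Greedy forward selection with a complexity-penalized gain."""
    selected: List[str] = []
    total_cost = 0.0
    current = score(frozenset())
    while True:
        best, best_gain = None, 0.0
        for name, cost in costs.items():
            if name in selected or total_cost + cost > budget:
                continue  # already chosen, or would exceed the budget
            gain = score(frozenset(selected + [name])) - current - penalty * cost
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining feature is worth its cost
            return selected
        selected.append(best)
        total_cost += costs[best]
        current = score(frozenset(selected))

# Toy example: additive scores, with one expensive but informative feature.
toy_costs = {"mfcc": 5.0, "zcr": 1.0, "energy": 1.0}
toy_value = {"mfcc": 0.30, "zcr": 0.15, "energy": 0.10}
toy_score = lambda subset: sum(toy_value[f] for f in subset)

tight = select_features(toy_costs, toy_score, budget=2.0, penalty=0.01)
loose = select_features(toy_costs, toy_score, budget=10.0, penalty=0.01)
```

With the tight budget the expensive feature is never considered. With the loose budget it is picked first because its penalized gain is the largest.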
Perceptually optimal processing of speech and audio signals demands distortion measures that are based on sophisticated auditory models. High-rate theory can simplify these models by means of a sensitivity matrix. We present a method to derive the sensitivity matrix for distortion measures based on spectro-temporal auditory models under the assumption of small errors. The method is applied to an example auditory model, and we discuss both the region of validity of the approximation and a way to analyze the characteristics of the model with subspace methods.
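The small-error approximation can be written out as follows. This is a generic sketch, not the paper's derivation. It assumes a twice-differentiable distortion measure $D(x,\hat{x})$ that is zero and minimal at $\hat{x}=x$. The symbols $f$, $J$ and $M$, and the choice of a squared error in the model output domain, are assumptions introduced here for illustration.

```latex
% Second-order Taylor expansion around \hat{x} = x; the zeroth- and
% first-order terms vanish because D(x, x) = 0 is a minimum.
D(x,\hat{x}) \approx \tfrac{1}{2}\,(\hat{x}-x)^{T} M(x)\,(\hat{x}-x),
\qquad
[M(x)]_{ij} = \left.\frac{\partial^{2} D(x,\hat{x})}
                         {\partial \hat{x}_{i}\,\partial \hat{x}_{j}}
              \right|_{\hat{x}=x}

% Special case: squared error between model outputs,
% D(x,\hat{x}) = \lVert f(\hat{x}) - f(x) \rVert^{2}, with Jacobian
% J(x) = \partial f / \partial x. The residual is zero at \hat{x} = x, so
M(x) = 2\,J(x)^{T} J(x),
\qquad
D(x,\hat{x}) \approx (\hat{x}-x)^{T} J(x)^{T} J(x)\,(\hat{x}-x)
```

Some authors absorb the factor $\tfrac{1}{2}$ into the definition of the sensitivity matrix. In the special case above, the eigenvectors of $J(x)^{T}J(x)$ give the directions in which the model is most and least sensitive to errors.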
An algorithm for multiple description coding (MDC) based on Gaussian mixture models (GMMs) is presented. Based on the parameters of the GMM, the algorithm combines MDC scalar quantizers, yielding a source-optimized vector MDC system. The performance is evaluated on a speech spectrum source in terms of mean-squared error and log spectral distortion. It is demonstrated experimentally that the proposed system outperforms single description coding and repetition coding over a wide range of channel failure probabilities. The proposed algorithm has a complexity that is linear in rate and dimension while retaining a near optimal vector quantizer point density.
Traditionally, sound codecs have been developed with a particular application in mind, their performance being optimized for specific types of input signals, such as speech or audio (music), and application constraints, such as low bit rate, high quality, or low delay. There is, however, an increasing need for more generic sound codecs, created by the emergence of heterogeneous networks and the convergence of communication and entertainment devices. To obtain such versatility, this study employs hybrid sound coding based on operational rate-distortion (RD) optimization principles. Applying this concept, a prototype coder has been implemented with emphasis on (dynamic) adaptation to the input and to application constraints. With this prototype, listening tests have been performed for different application scenarios. The results demonstrate the versatility of the concept while keeping competitive sound quality compared to dedicated state-of-the-art codecs.
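Operational rate-distortion optimization is often implemented as a Lagrangian choice among candidate encodings, picking the one that minimizes D + λR. The sketch below shows only that selection step, under stated assumptions: the `Candidate` type, the mode names, and the rate and distortion numbers are hypothetical and are not taken from the prototype coder.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    mode: str          # label of the coding mode (hypothetical)
    rate: float        # bits spent on this frame
    distortion: float  # measured distortion for this frame

def choose_mode(candidates: List[Candidate], lam: float) -> Candidate:
    """Pick the candidate with the lowest Lagrangian cost D + lam * R."""
    return min(candidates, key=lambda c: c.distortion + lam * c.rate)

frame_candidates = [
    Candidate("parametric", rate=8.0, distortion=10.0),
    Candidate("waveform", rate=32.0, distortion=1.0),
]

cheap_bits = choose_mode(frame_candidates, lam=0.1)  # rate is cheap
costly_bits = choose_mode(frame_candidates, lam=1.0)  # rate is expensive
```

Sweeping λ trades rate against distortion, which is one way a single coder can adapt to different bit-rate constraints.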