kth.se Publications
  • 1. Berninger, Erik
    et al.
    Olofsson, Åke
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Analysis of Click-Evoked Auditory Brainstem Responses Using Time Domain Cross-Correlations Between Interleaved Responses. 2014. In: Ear and Hearing, ISSN 0196-0202, E-ISSN 1538-4667, Vol. 35, no 3, p. 318-329. Article in journal (Refereed)
    Abstract [en]

    Objectives: The rapidly evolving field of early diagnostics after the introduction of newborn hearing screening requires rapid, valid, and objective methods, which have to be thoroughly evaluated in adults before use in infants. The aim was to study cross-correlation analysis of interleaved auditory brainstem responses (ABRs) in a wide dynamic range in normal-hearing adults. Off-line analysis allowed for comparison with psychoacoustical click threshold (PCT), pure-tone threshold, and determination of ABR input/output function. Specifically, nonfiltered and band-pass filtered ABRs were studied in various time segments along with time elapsed for ensemble of sweeps reaching a specific detection criterion. Design: Fourteen healthy normal-hearing subjects (18 to 35 years of age, 50% females) without any history of noise exposure participated. They all had pure-tone thresholds better than 20 dB HL (125 to 8000 Hz). ABRs were recorded in both ears using 100 µs clicks, from 71.5 dB nHL down to -18.5 dB nHL, in 10 dB steps (repetition rate, 39 Hz; time window, 15 msec; filter, 30 to 8000 Hz). The number of sweeps increased from 2000 at 71.5 dB nHL, up to 30000 at -18.5 dB nHL. Each sweep was stored in a database for off-line analysis. Cross-correlation analysis between two subaverages of interleaved responses was performed in the time domain for nonfiltered and digitally band-pass filtered (300 to 1500 Hz) entire and time-windowed (1 to 11 and 5 to 11 msec) responses. PCTs were measured using a Bekesy technique with the same insert phone and stimulus as used for the ABR (repetition rate, 20 Hz). Time elapsed (approximately the number of accepted sweeps/repetition rate) for the ensemble of sweeps needed to reach a cross-correlation coefficient (ρ) of 0.70 (=3.7 dB signal-to-noise ratio [SNR]) was analyzed. Results: Mean cross-correlation coefficients exceeded 0.90 in both ears at stimulus levels ≥ 11.5 dB nHL for the entire nonfiltered ABR. At 1.5 dB nHL, mean(SD) ρ was 0.53(0.32) and 0.44(0.40) for left and right ears, respectively (n = 14) (=0 dB SNR). In comparison, mean(SD) PCT was -1.9(2.9) and -2.5(3.2) dB nHL for left and right ears, respectively (n = 14), while mean pure-tone average (500 to 2000 Hz) was 2.5 dB HL (n = 28). Almost no effect of band-pass filtering or reduced analysis time window existed. Average time elapsed needed to reach ρ = 0.70 was approximately 20 seconds or less at stimulus levels ≥ 41.5 dB nHL, and approximately 30 seconds at 31.5 dB nHL. The average (interpolated) stimulus level corresponding to ρ = 0.70 for the entire nonfiltered ABR was 6.5 dB nHL (n = 28), which coincided with the estimated psychoacoustical threshold for single clicks. Conclusions: ABR could be identified in a short period of time using cross-correlation analysis between interleaved responses. The average stimulus level corresponding to 0 dB SNR in the entire nonfiltered ABR occurred at 1.5 dB nHL, 4 dB above the average PCT. The mean input/output function for the ensemble of sweeps required to reach ρ = 0.70 increased monotonically with increasing stimulus level, in parallel with the ABR based on all sweeps (1.5 dB nHL). Time domain cross-correlation analysis of ABR might form the basis for automatic response identification and future threshold-seeking procedures.
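
The detection statistic described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic sine-wave "response", and the noise level are all assumptions made for illustration. The sketch splits the stored sweeps into two interleaved halves, averages each half, and correlates the two subaverages. The conversion from correlation to SNR assumes that both subaverages share the same signal power S and carry independent noise of power N, so that the correlation is S / (S + N); under that assumption the abstract's pairs (0.70, 3.7 dB) and (about 0.5, 0 dB) are reproduced.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interleaved_correlation(sweeps):
    """Correlate the average of even-numbered sweeps with that of odd-numbered sweeps."""
    even, odd = sweeps[0::2], sweeps[1::2]
    avg_even = [sum(col) / len(even) for col in zip(*even)]
    avg_odd = [sum(col) / len(odd) for col in zip(*odd)]
    return pearson(avg_even, avg_odd)


def rho_to_snr_db(rho):
    """SNR (dB) of one subaverage, ASSUMING rho = S / (S + N)."""
    return 10.0 * math.log10(rho / (1.0 - rho))


def simulate_sweeps(n_sweeps, n_samples, noise_sd, seed=0):
    """Hypothetical test data: a fixed sine 'response' plus Gaussian noise per sweep."""
    rng = random.Random(seed)
    signal = [math.sin(2 * math.pi * 3 * k / n_samples) for k in range(n_samples)]
    return [[s + rng.gauss(0.0, noise_sd) for s in signal] for _ in range(n_sweeps)]


if __name__ == "__main__":
    sweeps = simulate_sweeps(n_sweeps=400, n_samples=64, noise_sd=1.0)
    print("rho:", round(interleaved_correlation(sweeps), 3))
    print("SNR at rho = 0.70 (dB):", round(rho_to_snr_db(0.70), 2))
```

A threshold-seeking procedure of the kind the abstract anticipates could recompute the correlation as sweeps accumulate and stop once it exceeds a criterion such as 0.70.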

  • 2. Dahlquist, M.
    et al.
    Lutman, M. E.
    Wood, S.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Methodology for quantifying perceptual effects from noise suppression systems. 2005. In: International Journal of Audiology, ISSN 1499-2027, E-ISSN 1708-8186, Vol. 44, no 12, p. 721-732. Article in journal (Refereed)
    Abstract [en]

    Methodology is proposed for perceptual assessment of both subjective sound quality and speech recognition in such a way that results can be compared between these two aspects. Validation is performed with a noise suppression system applied to hearing instruments. A method termed Interpolated Paired Comparison Rating (IPCR) was developed for time-efficient assessment of subjective impression of different aspects of sound quality for a variety of noise conditions. The method is based on paired comparisons between processed and unprocessed stimuli, and the results are expressed as the difference in signal-to-noise ratio (dB) between these that gives equal subjective impression. For tests of speech recognition in noise, validated adaptive test methods can be used that give results in terms of speech-to-noise ratio. The methodology was shown to be sensitive enough to detect significant mean differences between processed and unprocessed speech in noise, both regarding subjective sound quality and speech recognition ability in groups consisting of 30 subjects. An effect on sound quality from the noise suppression equivalent to about 3-4 dB is required to be statistically significant for a single subject. A corresponding effect of 3-6 dB is required for speech recognition (one-sided test). The magnitude of difference that occurred in the present study for sound quality was sufficient to show significant differences for sound quality within individuals, but this was not the case for speech recognition.

  • 3. Eneman, K.
    et al.
    Luts, H.
    Wouters, J.
    Büchler, M.
    Dillier, N.
    Dreschler, W.
    Froehlich, M.
    Grimm, G.
    Hohmann, V.
    Houben, Rolph
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Lombard, A.
    Mauler, D.
    Moonen, M.
    Puder, H.
    Schulte, M.
    Spriet, A.
    Vormann, M.
    Evaluation of signal enhancement algorithms for hearing instruments. 2008. Conference paper (Refereed)
    Abstract [en]

    In the frame of the HearCom project five promising signal enhancement algorithms are validated for future use in hearing instrument devices. To assess the algorithm performance solely based on simulation experiments, a number of physical evaluation measures have been proposed that incorporate basic aspects of normal and impaired human hearing. Additionally, each of the algorithms has been implemented on a common real-time hardware/software platform, which facilitates a profound subjective validation of the algorithm performance. Recently, a multicenter study has been set up across five different test centers in Belgium, the Netherlands, Germany and Switzerland to perceptually evaluate the selected signal enhancement approaches with normally hearing and hearing impaired listeners.

  • 4.
    Eneman, Koen
    et al.
    Katholieke Univ Leuven.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Doclo, Simon
    Univ. Oldenburg.
    Spriet, Ann
    Katholieke Univ Leuven.
    Moonen, Marc
    Katholieke Univ Leuven.
    Wouters, Jan
    Katholieke Univ Leuven.
    Auditory-Profile-Based Physical Evaluation of Multi-Microphone Noise Reduction Techniques in Hearing Instruments. 2008. In: Advances in Digital Speech Transmission / [ed] Rainer Martin, Ulrich Heute, Christiane Antweiler, John Wiley & Sons, 2008, p. 431-458. Chapter in book (Other academic)
  • 5.
    Henter, Gustav Eje
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory. The University of Edinburgh, United Kingdom.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Communication Theory. Victoria University of Wellington, New Zealand.
    Kernel Density Estimation-Based Markov Models with Hidden State. Manuscript (preprint) (Other academic)
    Abstract [en]

    We consider Markov models of stochastic processes where the next-step conditional distribution is defined by a kernel density estimator (KDE), similar to certain time-series bootstrap schemes from the economic forecasting literature. The KDE Markov models (KDE-MMs) we discuss are nonlinear, nonparametric, fully probabilistic representations of stationary processes with strong asymptotic convergence properties. The models generate new data simply by concatenating points from the training data sequences in a context-sensitive manner, with some added noise. We present novel EM-type maximum-likelihood algorithms for data-driven bandwidth selection in KDE-MMs. Additionally, we augment the KDE-MMs with a hidden state, yielding a new model class, KDE-HMMs. The added state-variable enables long-range memory and signal structure representation, complementing the short-range correlations captured by the Markov process. This is compelling for modelling complex real-world processes such as speech and language data. The paper presents guaranteed-ascent EM-update equations for model parameters in the case of Gaussian kernels, as well as relaxed update formulas that greatly accelerate training in practice. Experiments demonstrate increased held-out set probability for KDE-HMMs on several challenging natural and synthetic data series, compared to traditional techniques such as autoregressive models, HMMs, and their combinations.
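
The generative mechanism mentioned in this abstract (concatenating training points in a context-sensitive manner, with some added noise) can be sketched for the simplest first-order, scalar case. This is only an illustration under stated assumptions, not the manuscript's algorithm: the function names are hypothetical, a Gaussian kernel is assumed, one shared bandwidth is used for both the context match and the added noise, and the bandwidth is fixed by hand rather than learned by the EM-type procedures the abstract describes. No hidden state is included.

```python
import math
import random


def kde_mm_step(x_prev, train, bandwidth, rng):
    """Draw the next value given x_prev from a first-order KDE Markov model."""
    # Gaussian-kernel log-weights: how closely each training context matches x_prev.
    log_w = [-0.5 * ((x_prev - y) / bandwidth) ** 2 for y in train[:-1]]
    top = max(log_w)  # subtract the maximum so that at least one weight equals 1
    weights = [math.exp(lw - top) for lw in log_w]
    # Pick a training index in proportion to its context weight ...
    n = rng.choices(range(len(weights)), weights=weights)[0]
    # ... and emit the training point that followed it, plus kernel noise.
    return train[n + 1] + rng.gauss(0.0, bandwidth)


def kde_mm_generate(train, length, bandwidth, seed=0):
    """Generate a sequence of the given length, starting from the first training point."""
    rng = random.Random(seed)
    x = train[0]
    out = [x]
    for _ in range(length - 1):
        x = kde_mm_step(x, train, bandwidth, rng)
        out.append(x)
    return out


if __name__ == "__main__":
    training = [0.0, 1.0] * 20  # a toy alternating series
    print([round(v, 2) for v in kde_mm_generate(training, 8, bandwidth=0.01)])
```

With a small bandwidth the generated series closely follows the alternating pattern of the toy training data; a larger bandwidth blurs the context match and adds more noise.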

  • 6.
    Hu, Hongmei
    et al.
    ISVR, University of Southampton.
    Mohammadiha, Nasser
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Taghia, Jalil
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Lutman, Mark E
    ISVR, University of Southampton.
    Wang, Shouyan
    ISVR, University of Southampton.
    Sparsity level in a non-negative matrix factorization based speech strategy in cochlear implants. 2012. In: 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO), IEEE Computer Society, 2012, p. 2432-2436. Conference paper (Refereed)
    Abstract [en]

    Non-negative matrix factorization (NMF) has increasingly been used as a tool in signal processing in recent years, but it has not been used in cochlear implants (CIs). To improve the performance of CIs in noisy environments, a novel sparse strategy is proposed by applying NMF on envelopes of 22 channels. In the new algorithm, the noisy speech is first transferred to the time-frequency domain via a 22-channel filter bank and the envelope in each frequency channel is extracted; secondly, NMF is applied to the envelope matrix (envelopegram); finally, the sparsity condition is applied to the coefficient matrix to get a sparser representation. A speech reception threshold (SRT) subjective experiment was performed in combination with five objective measurements in order to choose the proper parameters for the sparse NMF model.

  • 7.
    Taghia, Jalil
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Variational Inference for Watson Mixture Model. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539. Article in journal (Other academic)
    Abstract [en]

    This paper addresses modelling data using the multivariate Watson distributions. The Watson distribution is one of the simplest distributions for analyzing axially symmetric data. This distribution has gained some attention in recent years due to its modeling capability. However, its Bayesian inference is fairly understudied due to difficulty in handling the normalization factor. Recent development of Monte-Carlo Markov chain (MCMC) sampling methods can be applied for this purpose. However, these methods can be prohibitively slow for practical applications. A deterministic alternative is provided by variational methods that convert inference problems into optimization problems. In this paper, we present a variational inference for Watson mixture model. First, the variational framework is used to side-step the intractability arising from the coupling of latent states and parameters. Second, the variational free energy is further lower bounded in order to avoid intractable moment computation. The proposed approach provides a lower bound on the log marginal likelihood and retains distributional information over all parameters. Moreover, we show that it can regulate its own complexity by pruning unnecessary mixture components while avoiding over-fitting. We discuss potential applications of the modeling with Watson distributions in the problem of blind source separation, and clustering gene expression data sets.

  • 8.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Articulation index and Shannon mutual information. 2007. In: Hearing: From Sensory Processing to Perception / [ed] Kollmeier, B; Hohmann, V; Mauermann, M; Verhey, J; Klump, G; Langemann, U; Uppenkamp, S, Berlin: Springer-Verlag Berlin, 2007, p. 525-532. Conference paper (Refereed)
    Abstract [en]

    The Articulation Index (AI), later revised and standardized as the Speech Intelligibility Index (SII), and the Speech Transmission Index (STI) have been successful in predicting speech intelligibility from acoustic measurements. Both approaches calculate the index as sum of additive audibility contributions from different frequency bands. Allen (2003) noted that a similar additivity property also holds for Shannon’s information-theoretic concept of Channel Capacity. Allen showed that the contributions to channel capacity are (approximately) linearly related to the signal-to-noise ratio (in dB), just like the audibility contributions to the AI, and suggested that the AI is actually a kind of channel-capacity measure. This would be a fundamental information-theoretical basis for the empirical success of AI theory.
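
The parallel drawn in this abstract can be made concrete with a small numerical sketch. The capacity expression is the standard Shannon formula for a band-limited Gaussian channel, log2(1 + SNR) bits per second per hertz, which at high SNR grows by about one bit for every 3 dB. The audibility function is an assumption for illustration only: it uses the SII-style mapping (SNR in dB plus 15, divided by 30, clipped to the range 0 to 1), and the function names and default parameters are hypothetical rather than taken from the paper.

```python
import math


def capacity_bits_per_hz(snr_db):
    """Shannon capacity (bits/s/Hz) of a Gaussian channel at the given SNR in dB."""
    snr = 10.0 ** (snr_db / 10.0)
    return math.log2(1.0 + snr)


def band_audibility(snr_db, offset_db=15.0, range_db=30.0):
    """SII-style band audibility (ASSUMED mapping): linear in dB SNR, clipped to [0, 1]."""
    return min(1.0, max(0.0, (snr_db + offset_db) / range_db))


if __name__ == "__main__":
    for snr_db in (-15, 0, 15, 30):
        print(snr_db,
              round(capacity_bits_per_hz(snr_db), 2),
              round(band_audibility(snr_db), 2))
```

Both quantities rise roughly linearly with SNR in dB over a limited range, which is the similarity the abstract refers to; the capacity curve flattens at low SNR while the audibility function is clipped at both ends.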

  • 9.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Bayesian learning of Gaussian mixtures: Variational "over-pruning" revisited. 2013. Report (Other academic)
    Abstract [en]

    This study reconsiders two simple toy data examples proposed by MacKay (2001) to illustrate what he called “symmetry-breaking” and inappropriate “over-pruning” by the variational inference (VI) approximation in Bayesian learning of probabilistic mixture models.

    The exact Bayesian solution is derived formally, including the effects of parameter values in the prior distribution of mixture weights. The exact solution is then compared to the results of VI approximation.

    In both toy examples both the exact solution and the VI approximation normally assigned each data cluster entirely to its own mixture component. In both methods the number of active mixture components is normally the same as the number of data clusters. In this sense, the VI approach causes no “over-pruning”. In one extreme example with two clusters with only 1 and 3 samples, and very small parameter values in the prior Dirichlet distribution of mixture weights, the exact Bayesian solution assigned all samples to the same component, i.e., with “over-pruning”, whereas the VI approximation still converged to a solution using both mixture components, i.e., with no “over-pruning”. Thus, if inappropriate over-pruning occurs, it is probably caused by inappropriate selection of prior model parameters, and not by the VI approach.

    The VI approximation shows “symmetry-breaking” because it converges to one of the arbitrary and equivalent permutations of the indices of mixture components. The “symmetric” exact solution formally includes all these permutations, but this is precisely what makes the exact Bayesian solution computationally impractical. Thus, in these toy examples, we must conclude that “symmetry-breaking” is not the same thing as “over-pruning”. The VI approximation shows “symmetry-breaking” but no “over-pruning”.

  • 10.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES).
    Comment on Ohlenforst et al. (2016) Exploring the Relationship Between Working Memory, Compressor Speed, and Background Noise Characteristics, Ear Hear 37, 137-143. 2017. In: Ear and Hearing, ISSN 0196-0202, E-ISSN 1538-4667, Vol. 38, no 5, p. 643-644. Article in journal (Refereed)
  • 11.
    Leijon, Arne
    KTH, Superseded Departments (pre-2005), Signals, Sensors and Systems.
    Estimation of sensory information transmission using a hidden Markov model of speech stimuli. 2002. In: Acta Acustica United with Acustica, ISSN 1436-7947, Vol. 88, no 3, p. 423-432. Article in journal (Refereed)
    Abstract [en]

    A method is presented which gives good approximate estimates of the rate of information (in bits/s) successfully transmitted from a speech source to the modelled neural output of the peripheral sensory system. This information rate sets definite upper limits on the listener's speech-recognition performance. The performance limits depend on the entropy and vocabulary size of the speech material. The estimates of sensory information rate can be used to evaluate to what extent a listener's performance is limited by peripheral loss of information or by suboptimal central processing. Calculations for a Swedish sentence test material, with an excitation-pattern auditory model, were consistent with human speech recognition results in speech-shaped masking noise. This suggests that the scarcity of sensory information may be the primary limiting factor in this test condition. Similar calculations for low-pass- and high-pass-filtered clean speech indicated a higher sensory information rate than required for the listeners' actual performance. These results suggest that the speech recognition performance under masking and filtering may be limited by different mechanisms. The analysis also showed that the information in adjacent frequency bands is not additive.

  • 12.
    Leijon, Arne
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Dahlquist, Martin
    Widex AS, ORCA Europe, Bjorns Tradgardsgrand 1, SE-11621 Stockholm, Sweden.
    Smeds, Karolina
    Widex AS, ORCA Europe, Bjorns Tradgardsgrand 1, SE-11621 Stockholm, Sweden.
    Bayesian analysis of paired-comparison sound quality ratings. 2019. In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 146, no 5, p. 3174-3183. Article in journal (Refereed)
    Abstract [en]

    This paper presents a method to analyze paired-comparison data including either binary or graded ordinal responses, with or without ties. The proposed method can use either of two classical choice models: (1) Thurstone case V, which assumes a Gaussian distribution of the sensory variables underlying listener decisions, or (2) the Bradley-Terry-Luce (BTL) model, which assumes a logistic distribution. The analysis method was validated using simulated paired-comparison experiments with known distributions of the sound-quality parameters in the simulated population from which "participants" were generated at random. The validation indicated that the Thurstone and BTL models give similar results close to the true values. The estimated credibility of a quality difference was slightly higher with the BTL model. The analysis results showed dramatically better precision when the response data included graded ordinal judgments instead of binary responses. Allowing tied responses also tended to improve precision. The method was also applied to data from a real evaluation of hearing-aid programs. The analysis revealed clinically interesting results with high statistical credibility, although the amount of test data was limited.
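
The two choice models named in this abstract differ only in the distribution assumed for the latent decision variable, which a short sketch can show. This is a simplified illustration, not the paper's Bayesian analysis: the function names, the threshold values, and the unit scaling of the latent variable are assumptions. A response falls in an ordered category according to where the latent quality difference, plus Gaussian (Thurstone case V) or logistic (BTL) noise, lies relative to a set of thresholds. A single threshold at zero gives binary responses, more thresholds give graded responses, and a category straddling zero represents a tie.

```python
import math


def thurstone_cdf(z):
    """Standard normal CDF (Thurstone case V, unit scale ASSUMED)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Standard logistic CDF (Bradley-Terry-Luce model)."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(diff, thresholds, cdf):
    """Probability of each ordered response category for a latent quality difference.

    diff > 0 means the first sound is better; thresholds must be increasing.
    """
    cum = [0.0] + [cdf(t - diff) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cum[:-1], cum[1:])]


if __name__ == "__main__":
    diff = 1.0
    binary = [0.0]                      # second preferred / first preferred
    graded_ties = [-1.5, -0.5, 0.5, 1.5]  # hypothetical 5-step scale, middle = tie
    for name, cdf in (("Thurstone", thurstone_cdf), ("BTL", logistic_cdf)):
        print(name, [round(p, 3) for p in ordinal_probs(diff, binary, cdf)])
        print(name, [round(p, 3) for p in ordinal_probs(diff, graded_ties, cdf)])
```

Because a graded response narrows down where the latent variable fell, it carries more information per trial than a binary choice, which is consistent with the precision gain the abstract reports for graded ordinal judgments.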

  • 13.
    Leijon, Arne
    et al.
    KTH, School of Computer Science and Communication (CSC).
    Dillon, H.
    Hickson, L.
    Kinkel, M.
    Kramer, S. E.
    Nordqvist, Peter
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Analysis of data from the International Outcome Inventory for Hearing Aids (IOI-HA) using Bayesian Item Response Theory. 2020. In: International Journal of Audiology, ISSN 1499-2027, E-ISSN 1708-8186. Article in journal (Refereed)
    Abstract [en]

    Objective: IOI-HA response data are conventionally analysed assuming that the ordinal responses have interval-scale properties. This study critically considers this assumption and compares the conventional approach with a method using Item Response Theory (IRT). Design: A Bayesian IRT analysis model was implemented and applied to several IOI-HA data sets. Study sample: Anonymised IOI-HA responses from 13273 adult users of one or two hearing aids in 11 data sets using the Australian English, Dutch, German and Swedish versions of the IOI-HA. Results: The raw ordinal responses to IOI-HA items do not represent values on interval scales. Using the conventional rating sum as an overall score introduces a scale error corresponding to about 10-15% of the true standard deviation in the population. Some interesting and statistically credible differences were demonstrated among the included data sets. Conclusions: It is questionable to apply conventional statistical measures like mean, variance, t-tests, etc., on the raw IOI-HA ratings. It is recommended to apply only nonparametric statistical test methods for comparisons of IOI-HA results between groups. The scale error can sometimes cause incorrect conclusions when individual results are compared. The IRT approach is recommended for analysis of individual results.

  • 14.
    Leijon, Arne
    et al.
    KTH, School of Electrical Engineering (EES). ORCA Europe Widex, Sweden.
    Henter, Gustav Eje
    Dahlquist, Martin
    Bayesian Analysis of Phoneme Confusion Matrices. 2016. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, ISSN 2329-9290, Vol. 24, no 3. Article in journal (Refereed)
    Abstract [en]

    This paper presents a parametric Bayesian approach to the statistical analysis of phoneme confusion matrices measured for groups of individual listeners in one or more test conditions. Two different bias problems in conventional estimation of mutual information are analyzed and explained theoretically. Evaluations with synthetic datasets indicate that the proposed Bayesian method can give satisfactory estimates of mutual information and response probabilities, even for phoneme confusion tests using a very small number of test items for each phoneme category. The proposed method can reveal overall differences in performance between two test conditions with better power than conventional Wilcoxon significance tests or conventional confidence intervals. The method can also identify sets of confusion-matrix cells that are credibly different between two test conditions, with better power than a similar approximate frequentist method.

  • 15.
    Leijon, Arne
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Stadler, Svante
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Fast amplitude compression in hearing aids improves audibility but degrades speech information transmission. 2008. Conference paper (Refereed)
    Abstract [en]

    Common types of hearing impairment are caused mainly by a loss of nearly instantaneous compressive amplification in the inner ear. Therefore, it seems plausible that the loss might be compensated by fast frequency-dependent compression in the hearing aid. We simulated impaired listeners' auditory analysis of hearing-aid processed speech in noise using a functional auditory model. Using hidden Markov signal models, we estimated the mutual information between the phonetic structure of clean speech and the neural output from the auditory model, with fast and slow versions of hearing-aid compression. The long-term speech spectrum of amplified sound was identical in both systems, as specified individually by the widely accepted NAL prescription for the gain frequency response. The calculation showed clearly better speech-to-auditory information transmission with slow quasi-linear amplification than with fast hearing-aid compression, for speech in speech-shaped noise at signal-to-noise ratios ranging from -10 to +20 dB.

  • 16.
    Leijon, Arne
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    von Gablenz, Petra
    Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg, Germany.
    Holube, Inga
    Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg, Germany.
    Taghia, Jalil
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Smeds, Karolina
    ORCA Europe, WS Audiology, Stockholm, Sweden.
    Bayesian analysis of Ecological Momentary Assessment (EMA) data collected in adults before and after hearing rehabilitation. 2023. In: Frontiers in Digital Health, E-ISSN 2673-253X, Vol. 5, article id 1100705. Article in journal (Refereed)
    Abstract [en]

    This paper presents a new Bayesian method for analyzing Ecological Momentary Assessment (EMA) data and applies this method in a re-analysis of data from a previous EMA study. The analysis method has been implemented as a freely available Python package EmaCalc, RRID:SCR_022943. The analysis model can use EMA input data including nominal categories in one or more situation dimensions, and ordinal ratings of several perceptual attributes. The analysis uses a variant of ordinal regression to estimate the statistical relation between these variables. The Bayesian method has no requirements related to the number of participants or the number of assessments by each participant. Instead, the method automatically includes measures of the statistical credibility of all analysis results, for the given amount of data. For the previously collected EMA data, the analysis results demonstrate how the new tool can handle heavily skewed, scarce, and clustered data that were collected on ordinal scales, and present results on interval scales. The new method revealed results for the population mean that were similar to those obtained in the previous analysis by an advanced regression model. The Bayesian approach automatically estimated the inter-individual variability in the population, based on the study sample, and could show some statistically credible intervention results also for an unseen random individual in the population. Such results may be interesting, for example, if the EMA methodology is used by a hearing-aid manufacturer in a study to predict the success of a new signal-processing method among future potential customers.

  • 17. Luts, Heleen
    et al.
    Eneman, Koen
    Wouters, Jan
    Schulte, Michael
    Vormann, Matthias
    Buechler, Michael
    Dillier, Norbert
    Houben, Rolph
    Dreschler, Wouter A.
    Froehlich, Matthias
    Puder, Henning
    Grimm, Giso
    Hohmann, Volker
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Lombard, Anthony
    Mauler, Dirk
    Spriet, Ann
    Multicenter evaluation of signal enhancement algorithms for hearing aids. 2010. In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 127, no 3, p. 1491-1505. Article in journal (Refereed)
    Abstract [en]

    In the framework of the European HearCom project, promising signal enhancement algorithms were developed and evaluated for future use in hearing instruments. To assess the algorithms' performance, five of the algorithms were selected and implemented on a common real-time hardware/software platform. Four test centers in Belgium, The Netherlands, Germany, and Switzerland perceptually evaluated the algorithms. Listening tests were performed with large numbers of normal-hearing and hearing-impaired subjects. Three perceptual measures were used: speech reception threshold (SRT), listening effort scaling, and preference rating. Tests were carried out in two types of rooms. Speech was presented in multitalker babble arriving from one or three loudspeakers. In a pseudo-diffuse noise scenario, only one algorithm, the spatially preprocessed speech-distortion-weighted multi-channel Wiener filtering, provided a SRT improvement relative to the unprocessed condition. Despite the general lack of improvement in SRT, some algorithms were preferred over the unprocessed condition at all tested signal-to-noise ratios (SNRs). These effects were found across different subject groups and test sites. The listening effort scores were less consistent over test sites. For the algorithms that did not affect speech intelligibility, a reduction in listening effort was observed at 0 dB SNR. (C) 2010 Acoustical Society of America. [DOI: 10.1121/1.3299168]

  • 18.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    A model-based collaborative filtering method for bounded support data. 2012. In: Proceedings - 2012 3rd IEEE International Conference on Network Infrastructure and Digital Content, IC-NIDC 2012, IEEE, 2012, p. 545-548. Conference paper (Refereed)
    Abstract [en]

    Collaborative filtering (CF) is an important technique used in some recommendation systems. The task of CF is to estimate persons' preferences (e.g., ratings) or to predict their future preferences, based on some already known preferences. In general, model-based CF performs better than memory-based CF, especially for highly sparse data. In this paper, we present a new model-based CF method for bounded support data, which takes into account the fact that the ratings are usually in a limited interval. A nonnegative matrix factorization (NMF) model is applied to investigate and learn the patterns hidden in the observed data matrix. Each rating value is assumed to be beta distributed and we assign a gamma prior to the parameters of the beta distribution for the purpose of Bayesian estimation. With a variational inference framework and some lower bound approximations, an analytically tractable solution can be obtained for the proposed NMF model. By comparing with several existing low-rank matrix approximation methods, the good performance of the proposed method is demonstrated.

  • 19.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    A Probabilistic Principal Component Analysis Based Hidden Markov Model For Audio-Visual Speech Recognition. 2008. In: CONF REC ASILOMAR CONF SIGNAL, 2008, p. 2170-2173. Conference paper (Refereed)
    Abstract [en]

    Lipreading is an efficient method among those proposed to improve the performance of speech recognition systems, especially in acoustically noisy environments. This paper proposes a simple audio-visual speech recognition (AVSR) system, which could improve the robustness and accuracy of audio speech recognition by integrating the synchronous audio and visual information. We propose a hidden Markov model (HMM) based on probabilistic principal component analysis (PCA) for visual-only speech recognition and for the visual modality of audio-visual speech recognition. The probabilistic PCA based HMM directly uses images that contain only the speaker's mouth region, without pre-processing (mouth corner detection, contour marking, etc.), and takes probabilistic PCA as the observation probability density function (PDF). We then integrate the information from the two modalities (audio and visual) and obtain a multi-stream hidden Markov model (MSHMM). We found that, without extracting specialized features before processing, probabilistic PCA could capture the principal components during training and describe the visual part of the materials. The experiments also verify that integrating the audio and visual information could help to improve the recognition accuracy even at a low acoustic signal-to-noise ratio (SNR).

  • 20.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Approximating the predictive distribution of the beta distribution with the local variational method. 2011. In: IEEE Intl. Workshop on Machine Learning for Signal Processing, 2011. Conference paper (Refereed)
    Abstract [en]

    In the Bayesian framework, the predictive distribution is obtained by averaging over the posterior parameter distribution. When there is a small amount of data, the uncertainty of the parameters is high. Thus, with the predictive distribution, a more reliable result can be obtained in applications such as classification and recognition. In previous work, we utilized the variational inference framework to approximate the posterior distribution of the parameters in the beta distribution by minimizing the Kullback-Leibler divergence of the true posterior distribution from the approximating one. However, the predictive distribution of the beta distribution was approximated by a plug-in approximation with the posterior mean, regardless of the parameter uncertainty. In this paper, we carry on the factorized approximation introduced in the previous work and approximate the beta function by its first-order Taylor expansion. Then an upper bound of the predictive distribution is derived by exploiting the local variational method. By minimizing the upper bound of the predictive distribution and after normalization, we approximate the predictive distribution by a probability density function in closed form. Experimental results show the accuracy and efficiency of the proposed approximation method.

  • 21.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Bayesian Estimation of Beta Mixture Models with Variational Inference. 2011. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 33, no 11, p. 2160-2173. Article in journal (Refereed)
    Abstract [en]

    Bayesian estimation of the parameters in beta mixture models (BMM) is analytically intractable. The numerical solutions to simulate the posterior distribution are available, but incur high computational cost. In this paper, we introduce an approximation to the prior/posterior distribution of the parameters in the beta distribution and propose an analytically tractable (closed-form) Bayesian approach to the parameter estimation. The approach is based on the variational inference (VI) framework. Following the principles of the VI framework and utilizing the relative convexity bound, the extended factorized approximation method is applied to approximate the distribution of the parameters in BMM. In a fully Bayesian model where all the parameters of the BMM are considered as variables and assigned proper distributions, our approach can asymptotically find the optimal estimate of the parameters' posterior distribution. Also, the model complexity can be determined based on the data. The closed-form solution is proposed so that no iterative numerical calculation is required. Meanwhile, our approach avoids the drawback of overfitting in the conventional expectation maximization algorithm. The good performance of this approach is verified by experiments with both synthetic and real data.

  • 22.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Beta Mixture Models And The Application To Image Classification. 2009. In: 2009 16th IEEE International Conference On Image Processing, vols 1-6, 2009, p. 2021-2024. Conference paper (Refereed)
    Abstract [en]

    Statistical pattern recognition is one of the most studied and applied approaches in the area of pattern recognition. Mixture modelling of densities is an efficient statistical pattern recognition method for continuous data. We propose a classifier based on beta mixture models for strictly bounded and asymmetrically distributed data. Due to the properties of mixture modelling, the statistical dependence in a multi-dimensional variable is captured, even with the conditional independence assumption in each mixture component. A synthetic example and the USPS handwritten digit data were used to verify the effectiveness of this approach. Compared to conventional Gaussian mixture models (GMM), the beta mixture models perform better on data with strictly bounded values and an asymmetric distribution. The performance of beta mixture models is about equivalent to that of GMM applied to data transformed via a strictly increasing link function.
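
    As an illustrative sketch only (not the authors' implementation), the density of a beta mixture with conditionally independent dimensions inside each component could be evaluated as follows. The function names `beta_log_pdf` and `bmm_pdf` and the parameter layout are assumptions made for this example.

```python
import math


def beta_log_pdf(x, a, b):
    """Log-density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def bmm_pdf(x, weights, params):
    """Density of a beta mixture at the vector x (hypothetical layout).

    weights: mixing weights, one per component (assumed to sum to 1).
    params:  per component, a list of (a, b) pairs, one per dimension.
    Within a component the dimensions are treated as independent, so
    the component density is a product of one-dimensional beta densities.
    """
    total = 0.0
    for w, comp in zip(weights, params):
        log_p = sum(beta_log_pdf(xd, a, b) for xd, (a, b) in zip(x, comp))
        total += w * math.exp(log_p)
    return total
```

    Although each component factorizes over dimensions, a mixture of several such components is generally not a product of its marginals, which is how dependence between dimensions can still be represented.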

  • 23.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    BG-NMF: a variational Bayesian NMF model for bounded support data. 2011. Article in journal (Other academic)
    Abstract [en]

    In this paper, we present a new Bayesian nonnegative matrix factorization (NMF) method for bounded support data. The distribution of the bounded support data is modelled with the beta distribution. The parameters of the beta density function are considered as latent variables and factorized into two matrices (the basis matrix and the excitation matrix). Furthermore, each entry in the factorized matrices is assigned a gamma prior. Thus, we name this method beta-gamma NMF (BG-NMF). Usually, the estimation of the posterior distribution does not have a closed-form solution. With the variational inference framework and by taking the relative convexity property of the log-inverse-beta function, we derive a closed-form solution to approximate the posterior distribution of the entries in the basis and the excitation matrices. Also, a sparse BG-NMF can be carried out by adding the sparseness constraint to the gamma prior. Evaluations with synthetic data and real life data demonstrate that the proposed method is efficient for source separation, missing data prediction, and collaborative filtering problems.

  • 24.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Coding bounded support data with beta distribution. 2010. In: Proceedings - 2010 2nd IEEE International Conference on Network Infrastructure and Digital Content, IC-NIDC 2010, 2010, p. 246-250. Conference paper (Refereed)
    Abstract [en]

    Probability density function (PDF) optimized quantization has been shown to be more efficient than conventional quantization methods. In practical applications, data with bounded support can be modelled better with a bounded support distribution (e.g. beta distribution, Dirichlet distribution), and a better quantization performance could be achieved by a more reasonable modelling. In this paper, we study the distortion rate (D-R) performance and the high rate quantization performance of the beta distribution. To implement a quantizer efficiently, a practical quantization scheme is proposed. The proposed scheme combines the advantages of the conventional compander and exhaustive training. The advantage of the proposed scheme is verified with both a theoretical experiment and a practical application.

  • 25.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Expectation propagation for estimating the parameters of the beta distribution. 2010. In: 2010 IEEE International Conference On Acoustics, Speech, And Signal Processing, 2010, p. 2082-2085. Conference paper (Refereed)
    Abstract [en]

    Parameter estimation for the beta distribution is analytically intractable due to the integration expression in the normalization constant. For maximum likelihood estimation, numerical methods can be used to calculate the parameters. For Bayesian estimation, we can utilize different approximations to the posterior parameter distribution. A method based on the variational inference (VI) framework reported the posterior mean of the parameters analytically but the approximating distribution violated the correlation between the parameters. We now propose a method via the expectation propagation (EP) framework to approximate the posterior distribution analytically and capture the correlation between the parameters. Compared to the method based on VI, the EP based algorithm performs better with small amounts of data and is more stable.

  • 26.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Human Audio-Visual Consonant Recognition Analyzed with Three Bimodal Integration Models. 2009. In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009, p. 812-815. Conference paper (Refereed)
    Abstract [en]

    With A-V recordings, ten normal-hearing people took recognition tests at different signal-to-noise ratios (SNR). The AV recognition results are predicted by the fuzzy logical model of perception (FLMP) and the post-labelling integration model (POSTL). We also applied hidden Markov models (HMMs) and multi-stream HMMs (MSHMMs) for the recognition. As expected, all the models agree qualitatively with the results that the benefit gained from the visual signal is larger at lower acoustic SNRs. However, the FLMP severely overestimates the AV integration result, while the POSTL model underestimates it. Our automatic speech recognizers integrated the audio and visual stream efficiently. The visual automatic speech recognizer could be adjusted to correspond to human visual performance. The MSHMMs combine the audio and visual streams efficiently, but the audio automatic speech recognizer must be further improved to allow precise quantitative comparisons with human audio-visual performance.

  • 27.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Human skin color detection in RGB space with Bayesian estimation of beta mixture models. 2010. In: EUSIPCO 2010, EUROPEAN ASSOC SIGNAL SPEECH & IMAGE PROCESSING-EURASIP, 2010, p. 1204-1208. Conference paper (Refereed)
    Abstract [en]

    Human skin color detection plays an important role in the applications of skin segmentation, face recognition, and tracking. Building a robust human skin color classifier is an essential step. This paper presents a classifier based on beta mixture models (BMM), which uses the pixel values in RGB space as the features. We propose a Bayesian estimation method based on the variational inference framework to approximate the posterior distribution of the parameters in the BMM and take the posterior mean as a point estimate of the parameters. The well-known Compaq image database is used to evaluate the performance of our BMM based classifier. Compared to some other skin color detection methods, our BMM based classifier shows a better recognition performance.

  • 28.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Modelling Speech Line Spectral Frequencies with Dirichlet Mixture Models. 2010. In: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010, 2010, p. 2370-2373. Conference paper (Refereed)
    Abstract [en]

    In this paper, we model the underlying probability density function (PDF) of the speech line spectral frequencies (LSF) parameters with a Dirichlet mixture model (DMM). The LSF parameters have two special features: 1) the LSF parameters have a bounded range; 2) the LSF parameters are in an increasing order. By transforming the LSF parameters to the ΔLSF parameters, the DMM can be used to model the ΔLSF parameters and take advantage of the features mentioned above. The distortion-rate (D-R) relation is derived for the Dirichlet distribution with the high rate assumption. A bit allocation strategy for DMM is also proposed. In modelling the LSF parameters extracted from the TIMIT database, the DMM shows a better performance compared to the Gaussian mixture model, in terms of D-R relation, likelihood and model complexity. Since modelling is the essential and prerequisite step in the PDF-optimized vector quantizer design, better modelling results indicate a superior quantization performance.
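
    A rough sketch of the LSF-to-ΔLSF idea mentioned in this abstract, assuming LSFs expressed in radians in the open interval (0, π). The function name `lsf_to_delta_lsf` and the normalization by π are assumptions of this example, not details taken from the paper.

```python
import math


def lsf_to_delta_lsf(lsf):
    """Map ordered LSFs in (0, pi) to normalized differences.

    Returns len(lsf) + 1 positive values that sum to 1 (the last one is
    the gap between the largest LSF and pi), so the result lies on the
    probability simplex, the support of a Dirichlet distribution.
    """
    if not lsf:
        raise ValueError("at least one LSF is required")
    if any(b <= a for a, b in zip(lsf, lsf[1:])):
        raise ValueError("LSFs must be strictly increasing")
    if lsf[0] <= 0.0 or lsf[-1] >= math.pi:
        raise ValueError("LSFs must lie in (0, pi)")
    edges = [0.0] + list(lsf) + [math.pi]
    return [(hi - lo) / math.pi for lo, hi in zip(edges, edges[1:])]
```

    The bounded range and the increasing order of the LSFs both become implicit in the output: every difference is positive and the differences sum to one.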

  • 29.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    PDF-optimized LSF vector quantization based on beta mixture models. 2010. In: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, 2010, p. 2374-2377. Conference paper (Refereed)
    Abstract [en]

    The line spectral frequencies (LSF) are known to be the most efficient representation of the linear predictive coding (LPC) parameters from both the distortion and perceptual point of view. By considering the bounded property of the LSF parameters, we apply beta mixture models (BMM) to model the distribution of the LSF parameters. Meanwhile, by following the principles of probability density function (PDF) optimized vector quantization (VQ), we derive the bit allocation strategy for the BMM. The LSF parameters are obtained from the TIMIT database and a practical VQ is designed. By taking the Bayesian information criterion (BIC), the square error (SE) and the spectral distortion (SD) as the criteria, the BMM based VQ outperforms the Gaussian mixture model based VQ with uncorrelated Gaussian component (UGMVQ) by about 1-2 bits/vector.

  • 30.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Super-Dirichlet Mixture Models using Differential Line Spectral Frequences for Text-Independent Speaker Identification. 2011. In: INTERSPEECH 2011, 2011. Conference paper (Refereed)
    Abstract [en]

    A new text-independent speaker identification (SI) system is proposed. This system utilizes the line spectral frequencies (LSFs) as an alternative feature set for capturing the speaker characteristics. The boundary and ordering properties of the LSFs are considered and the LSFs are transformed to the differential LSF (DLSF) space. Since the dynamic information is useful for speaker recognition, we represent the dynamic information of the DLSFs by considering two neighbors of the current frame, one from the past frames and the other from the following frames. The current frame and its neighbor frames are cascaded into a supervector. The statistical distribution of this supervector is modelled by the so-called super-Dirichlet mixture model, which is an extension of the Dirichlet mixture model. Compared to the conventional SI system, which uses the mel-frequency cepstral coefficients and is based on the Gaussian mixture model, the proposed SI system shows a promising improvement.

  • 31.
    Ma, Zhanyu
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Super-Dirichlet Mixture Models using Differential Line Spectral Frequencies for Text-Independent Speaker Identification. 2011. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2011, p. 2360-2363. Conference paper (Refereed)
    Abstract [en]

    A new text-independent speaker identification (SI) system is proposed. This system utilizes the line spectral frequencies (LSFs) as an alternative feature set for capturing the speaker characteristics. The boundary and ordering properties of the LSFs are considered and the LSFs are transformed to the differential LSF (DLSF) space. Since the dynamic information is useful for speaker recognition, we represent the dynamic information of the DLSFs by considering two neighbors of the current frame, one from the past frames and the other from the following frames. The current frame and its neighbor frames are cascaded into a supervector. The statistical distribution of this supervector is modelled by the so-called super-Dirichlet mixture model, which is an extension of the Dirichlet mixture model. Compared to the conventional SI system, which uses the mel-frequency cepstral coefficients and is based on the Gaussian mixture model, the proposed SI system shows a promising improvement.

  • 32.
    Ma, Zhanyu
    et al.
    Beijing University of Posts and Telecommunications, China.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Kleijn, W. Bastiaan
    School of Engineering and Computer Science, Victoria University of Wellington, New Zealand.
    Vector Quantization of LSF Parameters With a Mixture of Dirichlet Distributions. 2013. In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924, Vol. 21, no 9, p. 1777-1790. Article in journal (Refereed)
    Abstract [en]

    Quantization of the linear predictive coding parameters is an important part in speech coding. Probability density function (PDF)-optimized vector quantization (VQ) has been previously shown to be more efficient than VQ based only on training data. For data with bounded support, some well-defined bounded-support distributions (e.g., the Dirichlet distribution) have been proven to outperform the conventional Gaussian mixture model (GMM), with the same number of free parameters required to describe the model. When exploiting both the boundary and the order properties of the line spectral frequency (LSF) parameters, the distribution of LSF differences (Delta LSF) can be modelled with a Dirichlet mixture model (DMM). We propose a corresponding DMM based VQ. The elements in a Dirichlet vector variable are highly mutually correlated. Motivated by the Dirichlet vector variable's neutrality property, a practical non-linear transformation scheme for the Dirichlet vector variable can be obtained. Similar to the Karhunen-Loeve transform for Gaussian variables, this non-linear transformation decomposes the Dirichlet vector variable into a set of independent beta-distributed variables. Using high rate quantization theory and by the entropy constraint, the optimal inter-and intra-component bit allocation strategies are proposed. In the implementation of scalar quantizers, we use the constrained-resolution coding to approximate the derived constrained-entropy coding. A practical coding scheme for DVQ is designed for the purpose of reducing the quantization error accumulation. The theoretical and practical quantization performance of DVQ is evaluated. Compared to the state-of-the-art GMM-based VQ and recently proposed beta mixture model (BMM) based VQ, DVQ performs better, with even fewer free parameters and lower computational cost.

  • 33. Ma, Zhanyu
    et al.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Tan, Zheng-Hua
    Gao, Sheng
    Predictive Distribution of the Dirichlet Mixture Model by Local Variational Inference. 2014. In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 74, no 3, p. 359-374. Article in journal (Refereed)
    Abstract [en]

    In Bayesian analysis of a statistical model, the predictive distribution is obtained by marginalizing over the parameters with their posterior distributions. Compared to the frequently used point estimate plug-in method, the predictive distribution leads to a more reliable result in calculating the predictive likelihood of the new upcoming data, especially when the amount of training data is small. The Bayesian estimation of a Dirichlet mixture model (DMM) is, in general, not analytically tractable. In our previous work, we have proposed a global variational inference-based method for approximately calculating the posterior distributions of the parameters in the DMM analytically. In this paper, we extend our previous study for the DMM and propose an algorithm to calculate the predictive distribution of the DMM with the local variational inference (LVI) method. The true predictive distribution of the DMM is analytically intractable. By considering the concave property of the multivariate inverse beta function, we introduce an upper-bound to the true predictive distribution. As the global minimum of this upper-bound exists, the problem is reduced to seek an approximation to the true predictive distribution. The approximated predictive distribution obtained by minimizing the upper-bound is analytically tractable, facilitating the computation of the predictive likelihood. With synthesized data and real data evaluations, the good performance of the proposed LVI based method is demonstrated by comparing with some conventionally used methods.

  • 34. Ma, Zhanyu
    et al.
    Rana, Pravin Kumar
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Taghia, Jalil
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Flierl, Markus
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Bayesian estimation of Dirichlet mixture model with variational inference. 2014. In: Pattern Recognition, ISSN 0031-3203, E-ISSN 1873-5142, Vol. 47, no 9, p. 3143-3157. Article in journal (Refereed)
    Abstract [en]

    In statistical modeling, parameter estimation is an essential and challenging task. Estimation of the parameters in the Dirichlet mixture model (DMM) is analytically intractable, due to the integral expressions of the gamma function and its corresponding derivatives. We introduce a Bayesian estimation strategy to estimate the posterior distribution of the parameters in DMM. By assuming the gamma distribution as the prior to each parameter, we approximate both the prior and the posterior distribution of the parameters with a product of several mutually independent gamma distributions. The extended factorized approximation method is applied to introduce a single lower-bound to the variational objective function and an analytically tractable estimation solution is derived. Moreover, there is only one function that is maximized during iterations and, therefore, the convergence of the proposed algorithm is theoretically guaranteed. With synthesized data, the proposed method shows advantages over the EM-based method and the previously proposed Bayesian estimation method. With two important multimedia signal processing applications, the good performance of the proposed Bayesian estimation method is demonstrated.

  • 35. Ma, Zhanyu
    et al.
    Taghia, Jalil
    Kleijn, W. Bastiaan
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Guo, Jun
    Line spectral frequencies modeling by a mixture of von Mises-Fisher distributions. 2015. In: Signal Processing, ISSN 0165-1684, E-ISSN 1872-7557, Vol. 114, p. 219-224. Article in journal (Refereed)
    Abstract [en]

    Efficient quantization of the linear predictive coding (LPC) parameters plays a key role in parametric speech coding. The line spectral frequency (LSF) representation of the LPC parameters has found applications in speech model quantization. In practical implementations of vector quantization (VQ), probability density function optimized VQ has been shown to be more efficient than VQ based on training data. In this paper, we represent the LSF parameters in a unit vector form, which has directional characteristics. The underlying distribution of this unit vector variable is modeled by a von Mises-Fisher mixture model (VMM). An optimal inter-component bit allocation strategy is proposed based on high rate theory and a distortion-rate (D-R) relation is derived for the VMM-based VQ (VVQ). Experimental results show that the VVQ outperforms the recently introduced Dirichlet mixture model-based VQ and the conventional Gaussian mixture model-based VQ in terms of modeling performance and D-R relation.

  • 36. Ma, Zhanyu
    et al.
    Teschendorff, Andrew E.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Qiao, Yuanyuan
    Zhang, Honggang
    Guo, Jun
    Variational Bayesian Matrix Factorization for Bounded Support Data. 2015. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 37, no 4, p. 876-889. Article in journal (Refereed)
    Abstract [en]

    A novel Bayesian matrix factorization method for bounded support data is presented. Each entry in the observation matrix is assumed to be beta distributed. As the beta distribution has two parameters, two parameter matrices can be obtained, both of which contain only nonnegative values. In order to provide low-rank matrix factorization, the nonnegative matrix factorization (NMF) technique is applied. Furthermore, each entry in the factorized matrices, i.e., the basis and excitation matrices, is assigned a gamma prior. Therefore, we name this method beta-gamma NMF (BG-NMF). Due to the integral expression of the gamma function, estimation of the posterior distribution in the BG-NMF model cannot be presented by an analytically tractable solution. With the variational inference framework and the relative convexity property of the log-inverse-beta function, we propose a new lower-bound to approximate the objective function. With this new lower-bound, we derive an analytically tractable solution to approximately calculate the posterior distributions. Each of the approximated posterior distributions is also gamma distributed, which retains the conjugacy of the Bayesian estimation. In addition, a sparse BG-NMF can be obtained by including a sparseness constraint to the gamma prior. Evaluations with synthetic data and real life data demonstrate the good performance of the proposed method.

  • 37. Ma, Zhanyu
    et al.
    Xue, Jing-Hao
    Leijon, Arne
    KTH, School of Electrical Engineering (EES).
    Tan, Zheng-Hua
    Yang, Zhen
    Guo, Jun
    Decorrelation of Neutral Vector Variables: Theory and Applications. 2018. In: IEEE Transactions on Neural Networks and Learning Systems, ISSN 2162-237X, E-ISSN 2162-2388, Vol. 29, no 1, p. 129-143. Article in journal (Refereed)
    Abstract [en]

    In this paper, we propose novel strategies for neutral vector variable decorrelation. Two fundamental invertible transformations, namely, serial nonlinear transformation and parallel nonlinear transformation, are proposed to carry out the decorrelation. For a neutral vector variable, which is not multivariate-Gaussian distributed, the conventional principal component analysis cannot yield mutually independent scalar variables. With the two proposed transformations, a highly negatively correlated neutral vector can be transformed to a set of mutually independent scalar variables with the same degrees of freedom. We also evaluate the decorrelation performances for the vectors generated from a single Dirichlet distribution and a mixture of Dirichlet distributions. The mutual independence is verified with the distance correlation measurement. The advantages of the proposed decorrelation strategies are intensively studied and demonstrated with synthesized data and practical application evaluations.
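
    A minimal sketch of one well-known invertible mapping of this kind, the stick-breaking ratios u_k = x_k / (1 - x_1 - ... - x_{k-1}). For a Dirichlet-distributed vector these ratios are mutually independent beta variables. Whether this matches the paper's serial nonlinear transformation in every detail is an assumption, and the function names are hypothetical.

```python
def serial_transform(x):
    """Map a simplex vector x (K entries summing to 1) to K - 1 ratios.

    Each ratio is the current entry divided by the mass not yet used by
    the earlier entries.
    """
    u = []
    remaining = 1.0
    for xk in x[:-1]:
        u.append(xk / remaining)
        remaining -= xk
    return u


def serial_inverse(u):
    """Recover the simplex vector from the ratios (inverse mapping)."""
    x = []
    remaining = 1.0
    for uk in u:
        xk = uk * remaining
        x.append(xk)
        remaining -= xk
    x.append(remaining)
    return x
```

    The output has K - 1 entries, matching the K - 1 degrees of freedom of a K-dimensional vector constrained to sum to one.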

  • 38. Magnusson, L.
    et al.
    Karlsson, M.
    Leijon, Arne
    KTH, Superseded Departments (pre-2005), Signals, Sensors and Systems.
    Predicted and measured speech recognition performance in noise with linear amplification. 2001. In: Ear and Hearing, ISSN 0196-0202, E-ISSN 1538-4667, Vol. 22, no 1, p. 46-57. Article in journal (Refereed)
    Abstract [en]

    Objective: The purpose of this study was to investigate the applicability of the Speech Intelligibility Index (SII) in hearing aid fitting. It was hypothesized that estimated speech intelligibility, based on the SII, could be a more reliable measure than real speech recognition results for comparing hearing aid characteristics. Design: The test subjects were 29 elderly persons (66 to 80 yr) with mild-to-moderate hearing loss, who were using monaurally fitted linear hearing aids. They were selected from the files at Sahlgrenska hearing clinic. Speech recognition scores were obtained at fixed speech-to-noise ratios with Phonemically Balanced (PB) words in speech-weighted noise and in low-frequency noise. A Just-Follow-Conversation (JFC) test was performed with connected speech presented in the same background noises. The subjects were tested without hearing aid and with their hearing aids set at three different frequency responses. Predicted speech recognition scores were calculated for each condition based on the SII, complemented with a correction for sensorineural hearing impairment. The calculations involved speech and noise spectra, pure tone thresholds and insertion gain responses. Results: For each condition, the measured speech recognition scores were, on average, well predicted by the calculated scores. The intra-individual standard deviation of the predicted scores was estimated to be about one percentage point. The group results of the JFC test were in agreement with the word recognition results for the aided conditions, but a floor effect was observed for the unaided conditions. Conclusions: Speech intelligibility prediction based on the modified SII is a valid estimate of speech recognition performance of hearing-impaired persons with mild-to-moderate hearing loss. Estimated intelligibility based on the SII is more reliable than actually measured speech recognition performance, for comparing amplification conditions within subjects.

  • 39.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Nonnegative HMM for Babble Noise Derived from Speech HMM: Application to Speech Enhancement2013In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924, Vol. 21, no 5, p. 998-1011Article in journal (Refereed)
    Abstract [en]

    Deriving a good model for multitalker babble noise can facilitate different speech processing algorithms, e.g. noise reduction, to reduce the so-called cocktail party difficulty. In the available systems, the fact that the babble waveform is generated as a sum of N different speech waveforms is not exploited explicitly. In this paper, first we develop a gamma hidden Markov model for power spectra of the speech signal, and then formulate it as a sparse nonnegative matrix factorization (NMF). Second, the sparse NMF is extended by relaxing the sparsity constraint, and a novel model for babble noise (gamma nonnegative HMM) is proposed in which the babble basis matrix is the same as the speech basis matrix, and only the activation factors (weights) of the basis vectors are different for the two signals over time. Finally, a noise reduction algorithm is proposed using the derived speech and babble models. All of the stationary model parameters are estimated using the expectation-maximization (EM) algorithm, whereas the time-varying parameters, i.e. the gain parameters of speech and babble signals, are estimated using a recursive EM algorithm. The objective and subjective listening evaluations show that the proposed babble model and the final noise reduction algorithm significantly outperform the conventional methods.

  • 40.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Gerkmann, Timo
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    A New Approach for Speech Enhancement Based on a Constrained Nonnegative Matrix Factorization2011In: IEEE International Symposium on Intelligent Signal Processing and Communication Systems, ISPACS 2011, IEEE , 2011Conference paper (Refereed)
    Abstract [en]

    In this paper, a new approach is presented for single-channel speech enhancement which is based on Nonnegative Matrix Factorization (NMF). The proposed scheme combines the noise Power Spectral Density (PSD) estimation based on a constrained NMF and Wiener filtering to enhance the noisy speech. The imposed constraint is motivated by the time correlation of the underlying observations and enforces the NMF to give smoother estimates of the nonnegative factors. Compared to the standard NMF approach and Wiener filtering based on a recently developed noise PSD estimator, Source to Distortion Ratio (SDR) is

  • 41.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Gerkmann, Timo
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    A New Linear MMSE Filter for Single Channel Speech Enhancement Based on Nonnegative Matrix Factorization2011In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2011, IEEE , 2011Conference paper (Refereed)
    Abstract [en]

    In this paper, a linear MMSE filter is derived for single-channel speech enhancement which is based on Nonnegative Matrix Factorization (NMF). Assuming an additive model for the noisy observation, an estimator is obtained by minimizing the mean square error between the clean speech and the estimated speech components in the frequency domain. In addition, the noise power spectral density (PSD) is estimated using NMF and the obtained noise PSD is used in a Wiener filtering framework to enhance the noisy speech. The results of both algorithms are compared to the result of the same Wiener filtering framework in which the noise PSD is estimated using a recently developed MMSE-based method. NMF-based approaches outperform the Wiener filter with the MMSE-based noise PSD tracker for different measures. Compared to the NMF-based Wiener filtering approach, Source to Distortion Ratio (SDR) is improved for the evaluated noise types for different input SNRs using the proposed linear MMSE filter.

  • 42.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Kleijn, W. Bastiaan
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Gamma Hidden Markov Model as a Probabilistic Nonnegative Matrix Factorization2013In: 2013 Proceedings of the 21st European Signal Processing Conference (EUSIPCO), European Signal Processing Conference , 2013, p. 6811626-Conference paper (Refereed)
    Abstract [en]

    Among different Nonnegative Matrix Factorization (NMF) approaches, probabilistic NMFs are particularly valuable when dealing with stochastic signals, like speech. In the current literature, little attention has been paid to developing NMF methods that take advantage of the temporal dependencies of data. In this paper, we develop a hidden Markov model (HMM) with a gamma distribution as output density function. Then, we reformulate the gamma HMM as a probabilistic NMF. This shows the analogy of the proposed HMM and NMF, and will lead to a new probabilistic NMF approach in which the temporal dependencies are also captured inherently by the model. Furthermore, we propose an expectation maximization (EM) algorithm to estimate all the model parameters. Compared to the available probabilistic NMFs that model data with Poisson, multinomial, or exponential distributions, the proposed NMF is more suitable to be used with continuous-valued data. Our experiments using speech signals show that the proposed approach leads to a better compromise between sparsity, goodness of fit, and temporal modeling compared to the state of the art.

  • 43.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    MODEL ORDER SELECTION FOR NON-NEGATIVE MATRIX FACTORIZATION WITH APPLICATION TO SPEECH ENHANCEMENT2011Report (Other academic)
    Abstract [en]

    This report deals with the application of non-negative matrix factorization (NMF) in speech processing. A Bayesian NMF is used to find the optimal number of basis vectors for the speech signal. The result is validated by performing a speech enhancement task for a set of different number of basis vectors. The algorithm performance is measured with the Source to Distortion Ratio (SDR) that represents the overall quality of speech. The results show that for medium input SNRs, 60 basis vectors for each speaker are sufficient to model the speech spectrogram. NMF produced better SDR results than a recently developed version of Spectral Subtraction algorithm. The window length was found to have a great effect on the results, but zero padding did not influence the results.

  • 44.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Nonnegative Matrix Factorization Using Projected Gradient Algorithms with Sparseness Constraints2009In: 2009 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT 2009), NEW YORK: IEEE conference proceedings, 2009, p. 418-423Conference paper (Refereed)
    Abstract [en]

    Recently, projected gradient (PG) approaches have found many applications in solving the minimization problems underlying nonnegative matrix factorization (NMF). NMF is a linear representation of data that can lead to sparse results for natural images. To improve the parts-based representation of data, some sparseness constraints have been proposed. In this paper, the efficiency and execution time of five different PG algorithms and the basic multiplicative algorithm for NMF are compared. The factorization is done for an existing and a proposed sparse NMF, and the results are compared for all these PG methods. To compare the algorithms, the resulting factorizations are used for a hand-written digit classifier.

  • 45.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Martin, Rainer
    Ruhr-University Bochum.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Spectral Domain Speech Enhancement Using HMM State-Dependent Super-Gaussian Priors2013In: IEEE Signal Processing Letters, ISSN 1070-9908, E-ISSN 1558-2361, Vol. 20, no 3, p. 253-256Article in journal (Refereed)
    Abstract [en]

    The derivation of MMSE estimators for the DFT coefficients of speech signals, given an observed noisy signal and super-Gaussian prior distributions, has received a lot of interest recently. In this letter, we look at the distribution of the periodogram coefficients of different phonemes, and show that they have a gamma distribution with shape parameters less than one. This verifies that the DFT coefficients for not only the whole speech signal but also for individual phonemes have super-Gaussian distributions. We develop a spectral domain speech enhancement algorithm, and derive hidden Markov model (HMM) based MMSE estimators for speech periodogram coefficients under this gamma assumption in both a high uniform resolution and a reduced-resolution Mel domain. The simulations show that the performance is improved using a gamma distribution compared to the exponential case. Moreover, we show that, even though beneficial in some aspects, the Mel-domain processing does not lead to better results than the algorithms in the high-resolution domain.

  • 46.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Smaragdis, Paris
    University of Illinois at Urbana-Champaign.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Supervised and unsupervised speech enhancement using nonnegative matrix factorization2013In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924, Vol. 21, no 10, p. 2140-2151Article in journal (Refereed)
    Abstract [en]

    Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially.

  • 47.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Smaragdis, Paris
    University of Illinois at Urbana-Champaign.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Low-artifact Source Separation Using Probabilistic Latent Component Analysis2013In: 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE Signal Processing Society, 2013, p. 6701837-Conference paper (Refereed)
    Abstract [en]

    We propose a method based on the probabilistic latent component analysis (PLCA) in which we use exponential distributions as priors to decrease the activity level of a given basis vector. A straightforward application of this method is when we try to extract a desired source from a mixture with low artifacts. For this purpose, we propose a maximum a posteriori (MAP) approach to identify the common basis vectors between two sources. A low-artifact estimate can now be obtained by using a constraint such that the common basis vectors in the interfering signal’s dictionary tend to remain inactive. We discuss applications of this method in source separation with similar-gender speakers and in enhancing a speech signal that is contaminated with babble noise. Our simulations show that the proposed method not only reduces the artifacts but also increases the overall quality of the estimated signal.

  • 48.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Smaragdis, Paris
    University of Illinois at Urbana-Champaign.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Prediction Based Filtering and Smoothing to Exploit Temporal Dependencies in NMF2013In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE Signal Processing Society, 2013, p. 873-877Conference paper (Refereed)
    Abstract [en]

    Nonnegative matrix factorization is an appealing technique for many audio applications. However, in its basic form it does not use temporal structure, which is an important source of information in speech processing. In this paper, we propose NMF-based filtering and smoothing algorithms that are related to Kalman filtering and smoothing. While our prediction step is similar to that of Kalman filtering, we develop a multiplicative update step which is more convenient for nonnegative data analysis and in line with existing NMF literature. The proposed smoothing approach introduces an unavoidable processing delay, but the filtering algorithm does not and can be readily used for on-line applications. Our experiments using the proposed algorithms show a significant improvement over the baseline NMF approaches. In the case of speech denoising with factory noise at 0 dB input SNR, the smoothing algorithm outperforms NMF by 3.2 dB in SDR and around 0.5 MOS in PESQ. Likewise, source separation experiments result in improved performance due to taking advantage of the temporal regularities in speech.

  • 49.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Smaragdis, Paris
    University of Illinois at Urbana-Champaign.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Communication Theory.
    Simultaneous Noise Classification and Reduction Using a Priori Learned Models2013In: 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), IEEE Signal Processing Society, 2013, p. 6661951-Conference paper (Refereed)
    Abstract [en]

    Classifying the acoustic environment is an essential part of a practical supervised source separation algorithm where a model is trained for each source offline. In this paper, we present a classification scheme that is combined with a probabilistic nonnegative matrix factorization (NMF) based speech denoising algorithm. We model the acoustic environment with a hidden Markov model (HMM) whose emission distributions are assumed to be of NMF type. We derive a minimum mean square error (MMSE) estimator of the clean speech signal in which the state-dependent speech estimators are weighted according to the state posterior probabilities (or probabilities of different noise environments) and are summed. Our experiments show that the proposed method substantially outperforms the state of the art and that its performance is very close to an oracle case where the noise type is known in advance.

  • 50.
    Mohammadiha, Nasser
    et al.
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Taghia, Jalil
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Leijon, Arne
    KTH, School of Electrical Engineering (EES), Sound and Image Processing.
    Single channel speech enhancement using Bayesian NMF with recursive temporal updates of prior distributions2012In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, IEEE conference proceedings, 2012, p. 4561-4564Conference paper (Refereed)
    Abstract [en]

    We present a speech enhancement algorithm which is based on a Bayesian Nonnegative Matrix Factorization (NMF). Both Minimum Mean Square Error (MMSE) and Maximum a-Posteriori (MAP) estimates of the magnitude of the clean speech DFT coefficients are derived. To exploit the temporal continuity of the speech and noise signals, a proper prior distribution is introduced by widening the posterior distribution of the NMF coefficients at the previous time frames. To do so, a recursive temporal update scheme is proposed to obtain the mean value of the prior distribution; also, the uncertainty of the prior information is governed by the shape parameter of the distribution which is learnt automatically based on the nonstationarity of the signals. Simulations show a considerable improvement compared to the maximum likelihood NMF based speech enhancement algorithm for different input SNRs.
