Entropy and Speech
KTH, School of Electrical Engineering (EES).
2006 (English). Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

In this thesis, we study the representation of speech signals and the estimation of information-theoretical measures from observations containing features of the speech signal. The main body of the thesis consists of four research papers.

Paper A presents a compact representation of the speech signal that allows perfect reconstruction. The representation consists of models, model parameters, and signal coefficients. Unlike existing speech representations, it achieves compactness by adapting the models so that they maximally concentrate the energy of the signal coefficients according to a selected energy concentration criterion. The individual parts of the representation are closely related to speech signal properties such as spectral envelope, pitch, and voiced/unvoiced signal coefficients, which is beneficial for both speech coding and modification.
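As a rough illustration of choosing a representation by an energy concentration criterion, the sketch below compares two fixed orthonormal bases (an orthonormal DCT-II and the identity) and keeps the one whose k largest coefficients capture the most energy. The criterion, the candidate bases, and the names `dct_matrix`, `energy_concentration`, and `select_basis` are assumptions for illustration only; they are not the adaptive models of Paper A. Because each candidate basis is orthonormal, the frame is perfectly reconstructed as `B.T @ coeffs`.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # scale the DC row so every row has unit norm
    return c * np.sqrt(2.0 / n)


def energy_concentration(coeffs, k):
    """Fraction of total energy held by the k largest-magnitude coefficients."""
    energy = np.sort(coeffs ** 2)[::-1]
    total = energy.sum()
    return 0.0 if total == 0 else float(energy[:k].sum() / total)


def select_basis(x, bases, k):
    """Hypothetical criterion: keep the orthonormal basis that puts the most
    energy into k coefficients. Returns (name, coefficients, all scores)."""
    scores = {name: energy_concentration(b @ x, k) for name, b in bases.items()}
    best = max(scores, key=scores.get)
    return best, bases[best] @ x, scores
```

With `k = 4`, a frame equal to a single DCT basis vector selects `"dct"`, while a unit impulse selects `"identity"`. In both cases `bases[name].T @ coeffs` returns the original frame.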

From the information-theoretical measure of entropy, performance limits in coding and classification can be derived. Papers B and C discuss the estimation of differential entropy. Paper B describes a method for estimating differential entropies when the set of vector observations (from the representation) lies on a lower-dimensional surface (manifold) in the embedding space. In contrast, Paper C introduces a method in which the manifold structures are destroyed by constraining the resolution of the observation space. This makes it possible to estimate bounds on classification error rates even when the manifolds are of varying dimensionality within the embedding space.
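The abstract does not state which classification bound Paper C uses. As one standard example of an entropy-derived limit, Fano's inequality, H(Y|X) <= h_b(P_e) + P_e log(M - 1), bounds from below the error probability P_e of any classifier over M classes once the conditional entropy H(Y|X) is known. The sketch below inverts the inequality by bisection. The function names are hypothetical, and entropies are in nats.

```python
import math


def fano_function(p, m):
    """h_b(p) + p*log(m - 1) in nats, where h_b is the binary entropy."""
    hb = 0.0 if p in (0.0, 1.0) else -p * math.log(p) - (1 - p) * math.log(1 - p)
    return hb + p * math.log(m - 1)


def fano_lower_bound(cond_entropy, m, tol=1e-12):
    """Smallest error probability P_e allowed by Fano's inequality
    H(Y|X) <= h_b(P_e) + P_e*log(m - 1), for m >= 2 classes (entropy in nats)."""
    p_max = (m - 1) / m  # fano_function increases on [0, p_max], reaching log(m)
    if cond_entropy <= 0.0:
        return 0.0
    if cond_entropy >= math.log(m):
        return p_max
    lo, hi = 0.0, p_max
    while hi - lo > tol:  # bisection on the monotone branch
        mid = 0.5 * (lo + hi)
        if fano_function(mid, m) < cond_entropy:
            lo = mid
        else:
            hi = mid
    return hi
```

For two classes with H(Y|X) = log 2 (the observation says nothing about the class), the bound is 0.5, so no classifier can do better than chance.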

Finally, Paper D investigates the amount of shared information between spectral features of narrow-band (0.3-3.4 kHz) and high-band (3.4-8 kHz) speech. The results in Paper D indicate that the information shared between the high-band and the narrow-band is insufficient for high-quality wideband speech coding (0.3-8 kHz) without transmission of extra information describing the high-band.

Place, publisher, year, edition, pages
Stockholm: KTH, 2006. xii, 38 p.
Series
Trita-EE, ISSN 1653-5146 ; 2006:014
Keyword [en]
speech representation, energy concentration, entropy estimation, manifolds
National Category
Fluid Mechanics and Acoustics
Identifiers
URN: urn:nbn:se:kth:diva-3990
ISBN: 91-628-6861-6 (print)
OAI: oai:DiVA.org:kth-3990
DiVA: diva2:10285
Public defence
2006-06-08, D3, Lindstedtsvägen 5, Stockholm, 14:00
Note
QC 20100914
Available from: 2006-05-23 Created: 2006-05-23 Last updated: 2010-09-14
Bibliographically approved
List of papers
1. A canonical representation of speech
2007 (English). In: 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. IV, Pts 1-3, 2007, 849-852 p. Conference paper, Published paper (Refereed)
Abstract [en]

It is well known that using an appropriate representation of the speech signal improves the performance of speech coders, recognizers, and synthesizers. In this paper we present a representation of speech whose compactness is similar to that of parametric modeling, but which additionally has the completeness property of signal expansions. The resulting canonical representation of speech is suited to a wide range of speech processing applications, which we demonstrate through experiments on coding and prosodic modification.

Series
International Conference on Acoustics Speech and Signal Processing (ICASSP), ISSN 1520-6149
Keyword
speech representation, perfect reconstruction, frame theory, energy concentration, best basis selection
National Category
Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-9086 (URN)
10.1109/ICASSP.2007.367046 (DOI)
000248909200213
2-s2.0-34547517485 (Scopus ID)
Note
QC 20100914
Available from: 2006-02-10 Created: 2006-02-10 Last updated: 2010-09-14
Bibliographically approved
2. On the Estimation of Differential Entropy from Data Located on Embedded Manifolds
2007 (English). In: IEEE Transactions on Information Theory, ISSN 0018-9448, E-ISSN 1557-9654, Vol. 53, no. 7, 2330-2341 p. Article in journal (Refereed), Published
Abstract [en]

Estimation of the differential entropy from observations of a random variable is of great importance for a wide range of signal processing applications such as source coding, pattern recognition, hypothesis testing, and blind source separation. In this paper, we present a method for estimation of the Shannon differential entropy that accounts for embedded manifolds. The method is based on high-rate quantization theory and forms an extension of the classical nearest-neighbor entropy estimator. The estimator is consistent in the mean square sense and an upper bound on the rate of convergence of the estimator is given. Because of the close connection between compression and Shannon entropy, the proposed method has an advantage over methods estimating the Renyi entropy. Through experiments on uniformly distributed data on known manifolds and real-world speech data we show the accuracy and usefulness of our proposed method.
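The abstract does not name the "classical nearest-neighbor entropy estimator" that the method extends; it is assumed here to be the Kozachenko-Leonenko estimator. A minimal sketch of its k-nearest-neighbor form is h = psi(N) - psi(k) + log V_d + (d/N) * sum_i log r_i, where r_i is the Euclidean distance from sample i to its k-th nearest neighbor and V_d = pi^(d/2) / Gamma(d/2 + 1) is the volume of the d-dimensional unit ball. This is only the baseline estimator, not the embedded-manifold extension of Paper B; the brute-force O(N^2) neighbor search and the function names are illustrative choices.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(x, k=1):
    """Kozachenko-Leonenko k-NN estimate of differential entropy, in nats.

    x has shape (N, d), or (N,) for scalar data. Uses brute-force O(N^2)
    Euclidean distances and assumes there are no duplicate points.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point's distance to itself
    r = np.sort(dist, axis=1)[:, k - 1]  # distance to the k-th nearest neighbour
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return float(digamma_int(n) - digamma_int(k) + log_unit_ball + d * np.mean(np.log(r)))
```

Because the estimator uses the ambient dimension d, data lying on a lower-dimensional manifold (for example, points on a line in the plane) drive the estimate toward minus infinity as N grows. That failure mode is what an estimator that accounts for embedded manifolds has to handle.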

Keyword
convergence rate, manifolds, nearest-neighbor distance, Shannon differential entropy
National Category
Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-5787 (URN)
10.1109/TIT.2007.899533 (DOI)
000247606300002
2-s2.0-34447316139 (Scopus ID)
Note
QC 20100914
Available from: 2006-05-23 Created: 2006-05-23 Last updated: 2017-12-14
Bibliographically approved
3. Intrinsic Dimensionality and its Implication for Performance Prediction in Pattern Classification
2006 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828. Article in journal (Other academic), Submitted
Place, publisher, year, edition, pages
IEEE, 2006
National Category
Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-5788 (URN)
Note
QS 20120316
Available from: 2006-05-23 Created: 2006-05-23 Last updated: 2016-12-06
Bibliographically approved
4. Gaussian Mixture Model based Mutual Information Estimation between Frequency Bands in Speech
2002 (English). In: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols I-IV, Proceedings, 2002, 525-528 p. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we investigate the dependency between the spectral envelopes of speech in disjoint frequency bands, one covering the telephone bandwidth from 0.3 kHz to 3.4 kHz and one covering the frequencies from 3.7 kHz to 8 kHz. The spectral envelopes are jointly modeled with a Gaussian mixture model based on mel-frequency cepstral coefficients and the log-energy-ratio of the disjoint frequency bands. Using this model, we quantify the dependency between bands through their mutual information and the perceived entropy of the high frequency band. Our results indicate that the mutual information is only a small fraction of the perceived entropy of the high band. This suggests that speech bandwidth extension should not rely only on mutual information between narrow- and high-band spectra. Rather, such methods need to make use of perceptual properties to ensure that the extended signal sounds pleasant.
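The abstract does not say how the mutual information is computed from the fitted Gaussian mixture model. One common approach, assumed here, is a Monte Carlo average of log p(x, y) - log p(x) - log p(y) over samples drawn from the joint mixture, using the fact that each marginal of a GMM is itself a GMM built from the corresponding sub-blocks of the means and covariances. The sketch takes an already-fitted joint GMM (weights, means, covariances) as input. Fitting the model, extracting mel-frequency cepstral coefficients, and the function names are outside its scope or hypothetical.

```python
import numpy as np


def gauss_logpdf(x, mean, cov):
    """Log-density of a multivariate normal, evaluated at each row of x."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mean).T)  # whitened residuals, shape (d, N)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def gmm_logpdf(x, weights, means, covs):
    """log sum_m w_m N(x; mu_m, Sigma_m), computed with log-sum-exp."""
    comps = np.stack([np.log(w) + gauss_logpdf(x, mu, c)
                      for w, mu, c in zip(weights, means, covs)])
    top = comps.max(axis=0)
    return top + np.log(np.exp(comps - top).sum(axis=0))


def gmm_mutual_information(weights, means, covs, dx, n=20000, seed=0):
    """Monte Carlo estimate (nats) of I(X;Y) for a joint GMM over [x, y],
    where the first dx dimensions are X and the remaining ones are Y."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # Sample from the joint mixture: draw component labels, then Gaussians.
    labels = rng.choice(len(weights), size=n, p=weights)
    samples = np.empty((n, means[0].shape[0]))
    for m in range(len(weights)):
        idx = labels == m
        samples[idx] = rng.multivariate_normal(means[m], covs[m], size=idx.sum())
    # Marginal mixtures reuse the weights with the X or Y sub-blocks.
    mx = [mu[:dx] for mu in means]
    cx = [c[:dx, :dx] for c in covs]
    my = [mu[dx:] for mu in means]
    cy = [c[dx:, dx:] for c in covs]
    log_joint = gmm_logpdf(samples, weights, means, covs)
    log_px = gmm_logpdf(samples[:, :dx], weights, mx, cx)
    log_py = gmm_logpdf(samples[:, dx:], weights, my, cy)
    return float(np.mean(log_joint - log_px - log_py))
```

For a single bivariate Gaussian with correlation rho, the estimate should approach the closed form -0.5 * log(1 - rho^2), which is a convenient sanity check before applying the function to a fitted model.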

Series
International Conference on Acoustics Speech and Signal Processing (ICASSP), ISSN 1520-6149
Keyword
Bandwidth, Estimation, Mathematical models, Speech intelligibility, Speech transmission
National Category
Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-5789 (URN)
000177510400132
0-7803-7402-9 (ISBN)
Conference
IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, May 13-17, 2002
Note
QC 20100914
Available from: 2006-05-23 Created: 2006-05-23 Last updated: 2010-09-14
Bibliographically approved

Full text (Open Access in DiVA)
FULLTEXT01.pdf, 659 kB, application/pdf

Author
Nilsson, Mattias
