A canonical representation of speech
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
KTH, School of Electrical Engineering (EES).
KTH, School of Electrical Engineering (EES).
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
2007 (English)
In: 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol IV, Pts 1-3, 2007, 849-852 p.
Conference paper, Published paper (Refereed)
Abstract [en]

It is well known that usage of an appropriate representation of the speech signal improves the performance of speech coders, recognizers, and synthesizers. In this paper we present a representation of speech that has the efficiency, in terms of being compact, similar to that of parametric modeling, but additionally has the completeness property of signal expansions. The resulting canonical representation of speech is suited for a wide range of speech processing applications and we demonstrate this through experiments related to coding and prosodic modification.

Place, publisher, year, edition, pages
2007. 849-852 p.
Series
International Conference on Acoustics Speech and Signal Processing (ICASSP), ISSN 1520-6149
Keyword [en]
speech representation, perfect reconstruction, frame theory, energy concentration, best basis selection
National Category
Other Engineering and Technologies
Identifiers
URN: urn:nbn:se:kth:diva-9086
DOI: 10.1109/ICASSP.2007.367046
ISI: 000248909200213
Scopus ID: 2-s2.0-34547517485
OAI: oai:DiVA.org:kth-9086
DiVA: diva2:14646
Note
QC 20100914
Available from: 2006-02-10
Created: 2006-02-10
Last updated: 2010-09-14
Bibliographically approved
In thesis
1. On Prosodic Modification of Speech
2006 (English)
Licentiate thesis, comprehensive summary (Other scientific)
Abstract [en]

Prosodic modification has become of major theoretical and practical interest in the field of speech processing research over the last decades. Algorithms for time and pitch scaling are used both for speech modification and for speech synthesis. The thesis consists of an introduction providing an overview and discussion of existing techniques for time and pitch scaling and of three research papers in this area.

In paper A a system for time synchronization of speech is presented. It performs an alignment of two utterances of the same sentence, where one of the utterances is modified in time scale so as to be synchronized with the other utterance. The system is based on Dynamic Time Warping (DTW) and the Waveform Similarity Overlap and Add (WSOLA) method, a technique for time scaling of speech signals. Papers B and C complement each other and present a novel speech representation system that facilitates both time and pitch scaling of speech signals. Paper B describes a method to warp a signal with time-varying pitch to a signal with constant pitch. For this an accurate continuous pitch track is needed. The continuous pitch track is described as a B-spline expansion with coefficients that are selected to maximize a periodicity criterion. The warping to a constant pitch corresponds to the first stage of the system presented in paper C, which describes a two-stage transform that exploits long-term periodicity to obtain a sparse representation of speech. The new system facilitates a decomposition into a voiced and an unvoiced component.
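
The alignment step named in the abstract can be illustrated with a textbook Dynamic Time Warping routine. The sketch below is a generic illustration, not the system of paper A: it aligns two sequences of scalar features using an absolute-difference local cost, and the function name `dtw_path` is hypothetical. The WSOLA time-scaling stage that would follow the alignment is not shown.

```python
import math


def dtw_path(a, b):
    """Return (total cost, alignment path) for two sequences of scalar features.

    Generic textbook DTW with steps (1,1), (1,0), (0,1); an illustration only.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance when aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Backtrack from the end to recover which frames were matched.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# The second sequence repeats one value, so one frame of `a` maps to two of `b`.
total, path = dtw_path([0, 1, 2, 3], [0, 1, 1, 2, 3])
```

The returned path pairs each frame index of one utterance with frame indices of the other; a time-scaling method would then stretch or compress one signal according to the local slope of that path.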

Place, publisher, year, edition, pages
Stockholm: KTH, 2006. ix, 38 p.
Series
Trita-EE, ISSN 1653-5146 ; 2006:002
Identifiers
URN: urn:nbn:se:kth:diva-621
ISBN: 91-7178-267-2
Presentation
2006-02-20, seminarierum S3, 3 tr, Osquldas 10, Stockholm, 09:30
Note
QC 20101123
Available from: 2006-02-10
Created: 2006-02-10
Last updated: 2010-11-23
Bibliographically approved
2. Entropy and Speech
2006 (English)
Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

In this thesis, we study the representation of speech signals and the estimation of information-theoretical measures from observations containing features of the speech signal. The main body of the thesis consists of four research papers.

Paper A presents a compact representation of the speech signal that facilitates perfect reconstruction. The representation is constituted of models, model parameters, and signal coefficients. A difference compared to existing speech representations is that we seek a compact representation by adapting the models to maximally concentrate the energy of the signal coefficients according to a selected energy concentration criterion. The individual parts of the representation are closely related to speech signal properties such as spectral envelope, pitch, and voiced/unvoiced signal coefficients, beneficial for both speech coding and modification.

From the information-theoretical measure of entropy, performance limits in coding and classification can be derived. Papers B and C discuss the estimation of differential entropy. Paper B describes a method for estimation of the differential entropies in the case when the set of vector observations (from the representation) lies on a lower-dimensional surface (manifold) in the embedding space. In contrast to the method presented in Paper B, Paper C introduces a method where the manifold structures are destroyed by constraining the resolution of the observation space. This facilitates the estimation of bounds on classification error rates even when the manifolds are of varying dimensionality within the embedding space.
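
As background for the idea of constraining the resolution, the sketch below shows the standard relation between discrete and differential entropy: for a density that is smooth at the scale of a grid of width Δ, the entropy of the quantized observations is approximately h(X) − d·log Δ in d dimensions. This is a generic plug-in illustration and not the estimator of Paper B or Paper C; the function names, the grid width, and the uniform test data are assumptions chosen for the example.

```python
import math
import random
from collections import Counter


def quantized_entropy(samples, step):
    """Plug-in entropy (in nats) of vectors after rounding onto a grid of width `step`."""
    cells = Counter(tuple(math.floor(x / step) for x in v) for v in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in cells.values())


def differential_entropy_estimate(samples, step):
    """Approximate h(X) as H(quantized X) + d * log(step), d = vector dimension."""
    dim = len(samples[0])
    return quantized_entropy(samples, step) + dim * math.log(step)


# Hypothetical test data: 1-D uniform samples on [0, 1), whose true
# differential entropy is 0 nats.
random.seed(0)
data = [(random.random(),) for _ in range(10000)]
h = differential_entropy_estimate(data, 0.1)
```

The approximation degrades when the observations lie on a lower-dimensional manifold, because the differential entropy in the embedding space is then not finite; this is the situation the two papers address in different ways.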

Finally, Paper D investigates the amount of shared information between spectral features of narrow-band (0.3-3.4 kHz) and high-band (3.4-8 kHz) speech. The results in Paper D indicate that the information shared between the high-band and the narrow-band is insufficient for high-quality wideband speech coding (0.3-8 kHz) without transmission of extra information describing the high-band.

Place, publisher, year, edition, pages
Stockholm: KTH, 2006. xii, 38 p.
Series
Trita-EE, ISSN 1653-5146 ; 2006:014
Keyword
speech representation, energy concentration, entropy estimation, manifolds
National Category
Fluid Mechanics and Acoustics
Identifiers
URN: urn:nbn:se:kth:diva-3990
ISBN: 91-628-6861-6
Public defence
2006-06-08, D3, Lindstedtsvägen 5, Stockholm, 14:00
Note
QC 20100914
Available from: 2006-05-23
Created: 2006-05-23
Last updated: 2010-09-14
Bibliographically approved

Authors
Nilsson, Mattias; Resch, Barbara; Kim, Moo-Young; Kleijn, Bastiaan