On Prosodic Modification of Speech
KTH, School of Electrical Engineering (EES).
2006 (English). Licentiate thesis, comprehensive summary (Other scientific)
Abstract [en]

Prosodic modification has become of major theoretical and practical interest in the field of speech processing research over the last decades. Algorithms for time and pitch scaling are used both for speech modification and for speech synthesis. The thesis consists of an introduction providing an overview and discussion of existing techniques for time and pitch scaling and of three research papers in this area.

In paper A, a system for time synchronization of speech is presented. It performs an alignment of two utterances of the same sentence, where one of the utterances is modified in time scale so as to be synchronized with the other utterance. The system is based on Dynamic Time Warping (DTW) and the Waveform Similarity Overlap and Add (WSOLA) method, a technique for time scaling of speech signals. Papers B and C complement each other and present a novel speech representation system that facilitates both time and pitch scaling of speech signals. Paper B describes a method to warp a signal with time-varying pitch to a signal with constant pitch. For this, an accurate continuous pitch track is needed. The continuous pitch track is described as a B-spline expansion with coefficients that are selected to maximize a periodicity criterion. The warping to a constant pitch corresponds to the first stage of the system presented in paper C, which describes a two-stage transform that exploits long-term periodicity to obtain a sparse representation of speech. The new system facilitates a decomposition into a voiced and an unvoiced component.
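The warping to a constant pitch mentioned in the abstract can be illustrated with a minimal sketch. It is not the method of the thesis: it assumes a per-sample pitch track is already given (the thesis instead estimates a B-spline pitch track by maximizing a periodicity criterion), and it uses plain linear interpolation. The function name `warp_to_constant_pitch` and its parameters are hypothetical.

```python
def warp_to_constant_pitch(signal, f0, fs, samples_per_cycle):
    """Resample `signal` so that every pitch cycle spans `samples_per_cycle` samples.

    signal: list of samples (at least two)
    f0:     assumed-known pitch in Hz at each input sample (same length)
    fs:     sampling rate in Hz
    """
    # Accumulated pitch phase in cycles at each input sample
    # (trapezoidal integration of f0 over time).
    phase = [0.0]
    for k in range(1, len(signal)):
        phase.append(phase[-1] + 0.5 * (f0[k - 1] + f0[k]) / fs)

    # Output grid: uniform in phase, so the warped signal has constant pitch.
    n_out = int(phase[-1] * samples_per_cycle) + 1
    out = []
    k = 0
    for n in range(n_out):
        target = n / samples_per_cycle
        # Advance to the input interval [phase[k], phase[k + 1]] containing target.
        while k + 1 < len(phase) - 1 and phase[k + 1] < target:
            k += 1
        span = phase[k + 1] - phase[k]
        frac = (target - phase[k]) / span if span > 0 else 0.0
        # Linear interpolation between neighbouring input samples.
        out.append(signal[k] + frac * (signal[k + 1] - signal[k]))
    return out
```

Applied to a chirp whose pitch rises from 100 Hz to 200 Hz, the output is approximately a sinusoid with a fixed period of `samples_per_cycle` samples, which is the property a later long-term-periodicity transform could exploit.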

Place, publisher, year, edition, pages
Stockholm: KTH, 2006. ix, 38 p.
Series
Trita-EE, ISSN 1653-5146 ; 2006:002
Identifiers
URN: urn:nbn:se:kth:diva-621
ISBN: 91-7178-267-2 (print)
OAI: oai:DiVA.org:kth-621
DiVA: diva2:14647
Presentation
2006-02-20, seminar room S3, 3rd floor, Osquldas 10, Stockholm, 09:30
Note
QC 20101123
Available from: 2006-02-10 Created: 2006-02-10 Last updated: 2010-11-23 Bibliographically approved
List of papers
1. Time synchronization of speech
2003 (English). In: Models and Analysis of Vocal Emissions for Biomedical Applications: 3rd International Workshop / [ed] Claudia Manfredi, Firenze University Press, 2003, 215-218 p. Conference paper, Published paper (Refereed)
Abstract [en]

A time synchronization system is a helpful tool for different applications, such as language education and speech therapy. We present a system that performs temporal alignment of two utterances of the same phrase. The system consists of two parts. In the first part the time warping function is determined with Dynamic Time Warping (DTW). In the second part the time scale of one utterance is modified according to the time warping function. To obtain good performance, the dynamic time warping algorithm required significant modifications. Our listening test confirms that our time synchronization system has high precision and the resulting speech utterances are of natural quality.
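The first part of the system, determining a time warping function with DTW, can be sketched as follows. This is the textbook form of DTW on one-dimensional features, shown only as an illustration; the abstract states that the authors' algorithm required significant modifications, which are not reproduced here. The function name `dtw_path` and the absolute-difference local cost are assumptions.

```python
def dtw_path(x, y):
    """Textbook DTW between two 1-D feature sequences.

    Returns (total_cost, path), where path is a list of index pairs (i, j)
    meaning that x[i] is aligned with y[j].
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = minimal accumulated cost of aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])  # assumed local distance
            D[i][j] = local + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (D[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (D[i - 1][j], i - 1, j),          # advance in x only
            (D[i][j - 1], i, j - 1),          # advance in y only
        ]
        _, i, j = min(candidates)
    path.reverse()
    return D[n][m], path
```

The resulting path plays the role of the time warping function: in a second step, a time-scale modification method such as WSOLA would stretch or compress one utterance locally so that its frames follow the path.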

 

Place, publisher, year, edition, pages
Firenze University Press, 2003
Keyword
Time Synchronization, Time Scale Modification, DTW, WSOLA
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-9084 (URN)
88-8453-154-3 (ISBN)
Conference
3rd International Workshop MAVEBA 2003, December 10-12, 2003, Firenze, Italy
Note
QC 20101123
Available from: 2006-02-10 Created: 2006-02-10 Last updated: 2010-11-23 Bibliographically approved
2. Estimation of the instantaneous pitch in speech
2007 (English). In: IEEE Transactions on Audio, Speech and Language Processing, ISSN 1558-7916, Vol. 15, no 3, 813-822 p. Article in journal (Refereed) Published
Abstract [en]

 An accurate estimation of the pitch is essential for many speech processing applications, such as speech synthesis, speech coding, and speech enhancement. A widely used assumption in most common pitch estimation methods is that pitch is constant over a segment of short duration. This assumption does not apply in reality and leads to inaccurate pitch estimates. In this paper, we present a method for continuous pitch estimation that is able to track fast changes. In the presented framework, the pitch is modeled by a B-spline expansion and optimized in a multistage procedure for increased robustness. The performance of the continuous optimization procedure is compared to state-of-the-art pitch estimation methods and is evaluated both for artificial speech-like signals with known pitch, and for real speech signals. The results of the experiments show that our method leads to a higher accuracy of the estimate of the pitch than state-of-the-art methods.

Keyword
instantaneous pitch, pitch estimation, pitch-synchronous processing, splines
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-9085 (URN)
10.1109/TASL.2006.885242 (DOI)
000244318600007 ()
2-s2.0-37649002185 (Scopus ID)
Note
QC 20100914. Updated from Submitted to Published (20100914)
Available from: 2006-02-10 Created: 2006-02-10 Last updated: 2010-09-14 Bibliographically approved
3. A canonical representation of speech
2007 (English). In: 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol IV, Pts 1-3, 2007, 849-852 p. Conference paper, Published paper (Refereed)
Abstract [en]

It is well known that using an appropriate representation of the speech signal improves the performance of speech coders, recognizers, and synthesizers. In this paper we present a representation of speech that is about as compact as parametric modeling but additionally has the completeness property of signal expansions. The resulting canonical representation of speech is suited for a wide range of speech processing applications, and we demonstrate this through experiments related to coding and prosodic modification.

Series
International Conference on Acoustics Speech and Signal Processing (ICASSP), ISSN 1520-6149
Keyword
speech representation, perfect reconstruction, frame theory, energy concentration, best basis selection
National Category
Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-9086 (URN)
10.1109/ICASSP.2007.367046 (DOI)
000248909200213 ()
2-s2.0-34547517485 (Scopus ID)
Note
QC 20100914
Available from: 2006-02-10 Created: 2006-02-10 Last updated: 2010-09-14 Bibliographically approved

Open Access in DiVA

fulltext (531 kB), 1352 downloads
File information
File name: FULLTEXT01.pdf
File size: 531 kB
Checksum MD5:
527089b766f0dc966e56f25463d755dbae4b5690bf9f86f2fc4ff91973aee652afbdf60c
Type: fulltext
Mimetype: application/pdf

Author
Resch, Barbara
