Tracking pitch contours using minimum jerk trajectories
Neiberg, Daniel
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
Ananthakrishnan, Gopal
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
Gustafson, Joakim
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0002-0397-6442
2011 (English)
In: INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, 2011, 2056-2059 p.
Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes a fundamental frequency tracker, with the specific purpose of comparing the automatic estimates with pitch contours that are sketched by trained phoneticians. The method uses a frequency domain approach to estimate pitch tracks that form minimum jerk trajectories. In this way, the method tries to mimic the motor movements of the hand made while sketching. When the fundamental frequency contours tracked by the proposed method on the oral and laryngograph signals were compared using the MOCHA-TIMIT database, the correlation was 0.98 and the root mean squared error was 4.0 Hz, which was slightly better than a state-of-the-art pitch tracking algorithm included in the ESPS. We also demonstrate how the proposed algorithm could be applied when comparing with sketches made by phoneticians for the variations in accent II among the Swedish dialects.
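For illustration only, and not the authors' implementation: the sketch below shows the standard closed-form minimum jerk profile between two values (zero velocity and acceleration at both ends), together with the two measures the abstract reports, Pearson correlation and root mean squared error. The function names `minimum_jerk`, `rmse` and `pearson` and the example frequencies are assumptions made for this example; the paper's frequency domain tracker itself is not reproduced here.

```python
import math


def minimum_jerk(f_start, f_end, n_points):
    """Sample the textbook minimum jerk profile from f_start to f_end.

    Uses the quintic blend 10 s^3 - 15 s^4 + 6 s^5 for s in [0, 1].
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    track = []
    for i in range(n_points):
        s = i / (n_points - 1)  # normalised time in [0, 1]
        blend = 10 * s**3 - 15 * s**4 + 6 * s**5
        track.append(f_start + (f_end - f_start) * blend)
    return track


def rmse(a, b):
    """Root mean squared error between two equally long tracks (in Hz)."""
    if len(a) != len(b) or not a:
        raise ValueError("tracks must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson(a, b):
    """Pearson correlation coefficient between two equally long tracks."""
    if len(a) != len(b) or not a:
        raise ValueError("tracks must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical example: a rise from 100 Hz to 200 Hz, compared with
    # a copy of the same contour offset by a constant 4 Hz.
    reference = minimum_jerk(100.0, 200.0, 50)
    estimate = [f + 4.0 for f in reference]
    print("RMSE (Hz):", rmse(reference, estimate))
    print("Correlation:", pearson(reference, estimate))
```

A constant offset between two contours leaves the correlation unchanged while raising the RMSE, which is one reason to report both measures together.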

Place, publisher, year, edition, pages
2011. 2056-2059 p.
Keyword [en]
pitch tracking, Constant-Q, Swedish accent II
National Category
Computer Science; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-52192
ISI: 000316502201003
Scopus ID: 2-s2.0-84865794085
ISBN: 978-1-61839-270-1 (print)
OAI: oai:DiVA.org:kth-52192
DiVA: diva2:465490
Conference
INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association. Florence, Italy. 28-31 August 2011
Note

tmh_import_11_12_14. QC 20111222

Available from: 2011-12-14 Created: 2011-12-14 Last updated: 2014-01-16 Bibliographically approved
In thesis
1. Modelling Paralinguistic Conversational Interaction: Towards social awareness in spoken human-machine dialogue
2012 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Parallel with the orthographic streams of words in conversation are multiple layered epiphenomena, short in duration and with a communicative purpose. These paralinguistic events regulate the interaction flow via gaze, gestures and intonation. This thesis focuses on how to compute, model, discover and analyze prosody and its applications for spoken dialog systems. Specifically, it addresses automatic classification and analysis of conversational cues related to turn-taking, brief feedback, affective expressions, their cross-relationships as well as their cognitive and neurological basis. Techniques are proposed for instantaneous and suprasegmental parameterization of scalar and vector valued representations of fundamental frequency, but also intensity and voice quality. Examples are given for how to engineer supervised learned automata for off-line processing of conversational corpora as well as for incremental on-line processing with low-latency constraints suitable as detector modules in a responsive social interface. Specific attention is given to the communicative functions of vocal feedback like "mhm", "okay" and "yeah, that’s right" as postulated by the theories of grounding, emotion and a survey on laymen’s opinions. The potential functions and their prosodic cues are investigated via automatic decoding, data-mining, exploratory visualization and descriptive measurements.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2012. xiv, 86 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 2012:08
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-102335
ISBN: 978-91-7501-467-8
Public defence
2012-09-28, Sal F3, Lindstedtsvägen 26, KTH, Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20120914

Available from: 2012-09-14 Created: 2012-09-14 Last updated: 2012-09-14 Bibliographically approved

Open Access in DiVA

No full text

Authority records BETA

Gustafson, Joakim

Search in DiVA

By author/editor
Neiberg, Daniel; Ananthakrishnan, Gopal; Gustafson, Joakim
By organisation
Speech Communication and Technology
Computer Science; Language Technology (Computational Linguistics)
