An instantaneous vector representation of delta pitch for speaker-change prediction in conversational dialogue systems
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT. ORCID iD: 0000-0001-9327-9482
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
2008 (English). In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, New York: IEEE, 2008, p. 5041-5044. Conference paper, Published paper (Refereed)
Abstract [en]

As spoken dialogue systems become deployed in increasingly complex domains, they face rising demands on the naturalness of interaction. We focus on system responsiveness, aiming to mimic human-like dialogue flow control by predicting speaker changes as observed in real human-human conversations. We derive an instantaneous vector representation of pitch variation and show that it is amenable to standard acoustic modeling techniques. Using a small amount of automatically labeled data, we train models which significantly outperform current state-of-the-art pause-only systems, and replicate to within 1% absolute the performance of our previously published hand-crafted baseline. The new system additionally offers scope for run-time control over the precision or recall of locations at which to speak.

Place, publisher, year, edition, pages
New York: IEEE, 2008. p. 5041-5044
Series
International Conference on Acoustics Speech and Signal Processing (ICASSP), ISSN 1520-6149 ; 1-12
Keywords [en]
Frequency domain analysis, Signal representation, Speech communication, Speech processing, User interfaces
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-51994
DOI: 10.1109/ICASSP.2008.4518791
ISI: 000257456703244
Scopus ID: 2-s2.0-51449093800
ISBN: 978-1-4244-1483-3 (print)
OAI: oai:DiVA.org:kth-51994
DiVA, id: diva2:465288
Conference
2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP; Las Vegas, NV; 31 March 2008 through 4 April 2008
Note
tmh_import_11_12_14 QC 20111227
Available from: 2011-12-14 Created: 2011-12-14 Last updated: 2022-06-24 Bibliographically approved

By author/editor
Edlund, Jens; Heldner, Mattias