An instantaneous vector representation of delta pitch for speaker-change prediction in conversational dialogue systems
2008 (English). In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, New York: IEEE, 2008, p. 5041-5044. Conference paper, Published paper (Refereed)
Abstract [en]
As spoken dialogue systems become deployed in increasingly complex domains, they face rising demands on the naturalness of interaction. We focus on system responsiveness, aiming to mimic human-like dialogue flow control by predicting speaker changes as observed in real human-human conversations. We derive an instantaneous vector representation of pitch variation and show that it is amenable to standard acoustic modeling techniques. Using a small amount of automatically labeled data, we train models which significantly outperform current state-of-the-art pause-only systems, and replicate to within 1% absolute the performance of our previously published hand-crafted baseline. The new system additionally offers scope for run-time control over the precision or recall of locations at which to speak.
Place, publisher, year, edition, pages
New York: IEEE, 2008. p. 5041-5044
Series
International Conference on Acoustics Speech and Signal Processing (ICASSP), ISSN 1520-6149 ; 1-12
Keywords [en]
Frequency domain analysis, Signal representation, Speech communication, Speech processing, User interfaces
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-51994
DOI: 10.1109/ICASSP.2008.4518791
ISI: 000257456703244
Scopus ID: 2-s2.0-51449093800
ISBN: 978-1-4244-1483-3 (print)
OAI: oai:DiVA.org:kth-51994
DiVA, id: diva2:465288
Conference
2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP; Las Vegas, NV; 31 March 2008 through 4 April 2008
2011-12-14; 2011-12-14; 2022-06-24. Bibliographically approved