An instantaneous vector representation of delta pitch for speaker-change prediction in conversational dialogue systems
2008 (English). In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, New York: IEEE, 2008, 5041-5044 p. Conference paper (Refereed)
As spoken dialogue systems become deployed in increasingly complex domains, they face rising demands on the naturalness of interaction. We focus on system responsiveness, aiming to mimic human-like dialogue flow control by predicting speaker changes as observed in real human-human conversations. We derive an instantaneous vector representation of pitch variation and show that it is amenable to standard acoustic modeling techniques. Using a small amount of automatically labeled data, we train models which significantly outperform current state-of-the-art pause-only systems, and replicate to within 1% absolute the performance of our previously published hand-crafted baseline. The new system additionally offers scope for run-time control over the precision or recall of locations at which to speak.
Place, publisher, year, edition, pages
New York: IEEE, 2008. 5041-5044 p.
International Conference on Acoustics Speech and Signal Processing (ICASSP), ISSN 1520-6149; 1-12
Frequency domain analysis, Signal representation, Speech communication, Speech processing, User interfaces
Computer Science Language Technology (Computational Linguistics)
Identifiers: URN: urn:nbn:se:kth:diva-51994; DOI: 10.1109/ICASSP.2008.4518791; ISI: 000257456703244; ScopusID: 2-s2.0-51449093800; ISBN: 978-1-4244-1483-3; OAI: oai:DiVA.org:kth-51994; DiVA: diva2:465288
2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP; Las Vegas, NV; 31 March 2008 through 4 April 2008
tmh_import_11_12_14 QC 20111227. 2011-12-14; 2011-12-14; 2011-12-27. Bibliographically approved