Online Detection Of Vocal Listener Responses With Maximum Latency Constraints
2011 (English). In: Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, Prague, Czech Republic, 2011, 5836-5839 p. Conference paper (Refereed)
When human listeners utter Listener Responses (e.g. back-channels or acknowledgments) such as 'yeah' and 'mmhmm', interlocutors commonly continue to speak or resume their speech even before the listener has finished his/her response. This type of speech interactivity results in frequent speech overlap, which is common in human-human conversation. To allow this type of speech interactivity to occur between humans and spoken dialog systems, which would result in more human-like, continuous and smoother human-machine interaction, we propose an on-line classifier which can classify incoming speech as Listener Responses. We show that it is possible to detect vocal Listener Responses using maximum latency thresholds of 100-500 ms, thereby obtaining equal error rates ranging from 34% to 28% by using an energy-based voice activity detector.
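The abstract reports results as equal error rates (EER), i.e. the operating point at which the miss rate and the false alarm rate of the detector coincide. As a minimal illustrative sketch, not the authors' evaluation code, the following Python function estimates an EER from detector scores by sweeping a decision threshold. The function name `equal_error_rate`, the score convention (higher means more likely a Listener Response) and the toy scores are assumptions made for this example only.

```python
def equal_error_rate(scores, labels):
    """Estimate the equal error rate by sweeping a decision threshold.

    scores: detector outputs, higher = more likely a target (assumed convention).
    labels: 1 for target (e.g. Listener Response), 0 for non-target.
    Returns (eer, threshold) at the point where miss and false alarm
    rates are closest to each other.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")

    best = None
    for thr in sorted(set(scores)):
        # Miss rate: targets scored below the threshold.
        fnr = sum(s < thr for s in positives) / len(positives)
        # False alarm rate: non-targets scored at or above the threshold.
        fpr = sum(s >= thr for s in negatives) / len(negatives)
        gap = abs(fnr - fpr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates as the EER estimate.
            best = (gap, (fnr + fpr) / 2, thr)
    return best[1], best[2]


# Toy, made-up scores purely for illustration (not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
eer, thr = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2f} at threshold {thr}")
```

On a finite data set the two error rates rarely match exactly, so this sketch reports the midpoint of the two rates at the threshold where they are closest; other interpolation schemes are equally common.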
Place, publisher, year, edition, pages
Prague, Czech Republic, 2011. 5836-5839 p.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, ISSN 1520-6149
speech analysis, Speech processing
Computer Science Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-52177
DOI: 10.1109/ICASSP.2011.5947688
ISI: 000296062406136
ScopusID: 2-s2.0-80051612462
ISBN: 978-145770539-7
OAI: oai:DiVA.org:kth-52177
DiVA: diva2:465472
36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011; Prague; 22 May 2011 through 27 May 2011
tmh_import_11_12_14 QC 20111219
2011-12-14, 2011-12-14, 2011-12-28. Bibliographically approved