Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks
Skantze, Gabriel
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0002-8579-1790
2017 (English). In: Proceedings of SIGDIAL 2017 - 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Proceedings of the Conference, Saarbrucken, Germany, 2017. Conference paper, Published paper (Refereed)
Abstract [en]

Previous models of turn-taking have mostly been trained for specific turn-taking decisions, such as discriminating between turn shifts and turn retention in pauses. In this paper, we present a predictive, continuous model of turn-taking using Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN). The model is trained on human-human dialogue data to predict upcoming speech activity in a future time window. We show how this general model can be applied to two different tasks that it was not specifically trained for. The first task is to predict whether a turn shift will occur in a pause; here the model performs better than human observers, and better than results achieved with more traditional models. The second task is to predict, at speech onset, whether the utterance will be a short backchannel or a longer utterance. Finally, we show how the hidden layer in the network can be used as a feature vector for turn-taking decisions in a human-robot interaction scenario.
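
The idea of predicting upcoming speech activity frame by frame can be illustrated with a minimal sketch. It is not the implementation from the paper: the class name TinyLSTMPredictor, the feature and layer sizes, the untrained random weights, and the decision rule in predict_turn_shift (comparing mean predicted activity of the two speakers) are all assumptions made for illustration. The sketch only shows the shape of the computation: an LSTM reads one feature frame at a time and emits, per frame, a vector of speech-activity probabilities for a future window, while the final hidden state is returned so it could serve as a feature vector.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v, b):
    # Dense layer: W @ v + b, using plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


class TinyLSTMPredictor:
    """Hypothetical single-layer LSTM with untrained random weights."""

    def __init__(self, n_features, n_hidden, n_future, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        # Gate weights stacked in the order: input, forget, candidate, output.
        self.W = rand_matrix(4 * n_hidden, n_features + n_hidden)
        self.b = [0.0] * (4 * n_hidden)
        # Output layer: one sigmoid unit per frame of the future window.
        self.W_out = rand_matrix(n_future, n_hidden)
        self.b_out = [0.0] * n_future

    def forward(self, frames):
        n = self.n_hidden
        h = [0.0] * n  # hidden state
        c = [0.0] * n  # cell state
        outputs = []
        for x in frames:
            z = matvec(self.W, list(x) + h, self.b)
            i, f, g, o = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
            c = [sigmoid(fj) * cj + sigmoid(ij) * math.tanh(gj)
                 for ij, fj, gj, cj in zip(i, f, g, c)]
            h = [sigmoid(oj) * math.tanh(cj) for oj, cj in zip(o, c)]
            # Predicted speech-activity probability for each future frame.
            outputs.append([sigmoid(y) for y in matvec(self.W_out, h, self.b_out)])
        return outputs, h


def predict_turn_shift(pred_current, pred_other):
    # Assumed decision rule: a shift is predicted when the other speaker's
    # mean predicted activity over the window exceeds the current speaker's.
    mean_current = sum(pred_current) / len(pred_current)
    mean_other = sum(pred_other) / len(pred_other)
    return mean_other > mean_current


model = TinyLSTMPredictor(n_features=4, n_hidden=8, n_future=10)
frames = [[0.0, 0.0, 0.0, 0.0] for _ in range(5)]  # placeholder feature frames
predictions, hidden = model.forward(frames)
```

In a trained system the weights would be fitted on dialogue data; here they are random, so the numbers carry no meaning beyond showing the data flow.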

Place, publisher, year, edition, pages
Saarbrucken, Germany, 2017.
National Category
Computer Sciences; Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-214443
ISI: 000708086400027
Scopus ID: 2-s2.0-85054995406
OAI: oai:DiVA.org:kth-214443
DiVA, id: diva2:1141130
Conference
SIGDIAL 2017 - 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Proceedings of the Conference
Note

QC 20220927

Available from: 2017-09-14. Created: 2017-09-14. Last updated: 2025-02-01. Bibliographically approved.

Open Access in DiVA

fulltext (766 kB), 685 downloads
File information
File name: FULLTEXT01.pdf
File size: 766 kB
Checksum (SHA-512): eade755477580de4e270daff4f41b9cfee409365a82fc006d6a7d8f49bc8daccaaf92f6ca03d67e8e9b7688b3791fc5fb8f9219527a7750eedd84c0bea87b580
Type: fulltext
Mimetype: application/pdf


Authority records

Skantze, Gabriel
