Syllabification of conversational speech using bidirectional long-short-term memory neural networks
2011 (English). In: Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, Prague, Czech Republic, 2011, 5256-5259 p. Conference paper (Refereed)
Segmentation of speech signals is a crucial task in many types of speech analysis. We present a novel approach to segmentation at the syllable level, using a Bidirectional Long-Short-Term Memory Neural Network. The network estimates syllable nucleus positions by regressing perceptually motivated input features onto a smooth target function. Peak selection is then performed to obtain valid nucleus positions. Performance of the model is evaluated at the level of both syllables and the vowel segments that make up the syllable nuclei. The general applicability of the approach is illustrated by good results on two common databases - Switchboard and TIMIT - covering both read and spontaneous speech, and by a favourable comparison with other published results.
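As an illustration of the peak selection step mentioned in the abstract, the following is a minimal sketch and not the authors' procedure, which this record does not specify. It assumes the network output is a per-frame list of values in [0, 1]; the function name `pick_peaks` and the parameters `threshold` and `min_distance` are hypothetical choices made for this example.

```python
import math


def pick_peaks(signal, threshold=0.5, min_distance=3):
    """Return indices of local maxima that reach `threshold` and lie at
    least `min_distance` frames apart (hypothetical parameters)."""
    # A candidate is higher than its left neighbour, not lower than its
    # right neighbour, and at or above the threshold.
    candidates = [
        i for i in range(1, len(signal) - 1)
        if signal[i] >= threshold and signal[i - 1] < signal[i] >= signal[i + 1]
    ]
    # Greedy selection: visit the strongest candidates first and drop any
    # candidate that falls too close to a peak already kept.
    kept = []
    for i in sorted(candidates, key=lambda k: signal[k], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


def gaussian_bumps(length, centers, width):
    """Synthetic smooth curve: one Gaussian bump per assumed nucleus position."""
    return [
        max(math.exp(-0.5 * ((t - c) / width) ** 2) for c in centers)
        for t in range(length)
    ]


# Synthetic stand-in for a network output with nuclei at frames 10 and 30.
curve = gaussian_bumps(50, [10, 30], 2.0)
print(pick_peaks(curve))  # -> [10, 30]
```

The threshold suppresses low-amplitude ripples, and the minimum distance prevents two detections within a single nucleus; both values would need tuning on real data.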
Place, publisher, year, edition, pages
Prague, Czech Republic, 2011. 5256-5259 p.
International Conference on Acoustics Speech and Signal Processing ICASSP, ISSN 1520-6149
Computer Science Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-52175
DOI: 10.1109/ICASSP.2011.5947543
ISI: 000296062405216
ScopusID: 2-s2.0-80051628297
OAI: oai:DiVA.org:kth-52175
DiVA: diva2:465470
tmh_import_11_12_14. QC 20111228
2011-12-14, 2011-12-14, 2011-12-28. Bibliographically approved