Bollepalli, Bajibabu
Publications (8 of 8)
Bollepalli, B., Urbain, J., Raitio, T., Gustafson, J. & Cakmak, H. (2014). A COMPARATIVE EVALUATION OF VOCODING TECHNIQUES FOR HMM-BASED LAUGHTER SYNTHESIS. Paper presented at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), MAY 04-09, 2014, Florence, ITALY (pp. 255-259).
2014 (English) Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents an experimental comparison of various leading vocoders for the application of HMM-based laughter synthesis. Four vocoders, commonly used in HMM-based speech synthesis, are used in copy-synthesis and HMM-based synthesis of both male and female laughter. Subjective evaluations are conducted to assess the performance of the vocoders. The results show that all vocoders perform relatively well in copy-synthesis. In HMM-based laughter synthesis using original phonetic transcriptions, all synthesized laughter voices were significantly lower in quality than copy-synthesis, indicating a challenging task and room for improvements. Interestingly, two vocoders using rather simple and robust excitation modeling performed the best, indicating that robustness in speech parameter extraction and simple parameter representation in statistical modeling are key factors in successful laughter synthesis.
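For readers unfamiliar with the copy-synthesis condition used as the quality ceiling in evaluations like this, the sketch below analyses a recording into vocoder parameters and resynthesises it directly from them. It is a minimal illustration under stated assumptions: it uses the WORLD vocoder via the pyworld package, which is not one of the four vocoders compared in the paper, and the file name is hypothetical.

```python
# Minimal copy-synthesis sketch (assumptions: pyworld and soundfile are
# installed; WORLD stands in for the vocoders actually compared in the paper).
import numpy as np
import soundfile as sf
import pyworld as pw

x, fs = sf.read("laughter.wav")              # hypothetical input recording
x = np.ascontiguousarray(x, dtype=np.float64)

# Analysis: decompose the waveform into vocoder parameters.
f0, t = pw.dio(x, fs)                        # raw F0 track
f0 = pw.stonemask(x, f0, t, fs)              # refined F0
sp = pw.cheaptrick(x, f0, t, fs)             # smooth spectral envelope
ap = pw.d4c(x, f0, t, fs)                    # aperiodicity

# Copy-synthesis: resynthesise straight from the analysed parameters, giving
# the upper bound against which HMM-generated parameters are judged.
y = pw.synthesize(f0, sp, ap, fs)
sf.write("laughter_copy.wav", y, fs)
```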

Series
International Conference on Acoustics Speech and Signal Processing ICASSP, ISSN 1520-6149
Keywords
Laughter synthesis, vocoder, mel-cepstrum, STRAIGHT, DSM, GlottHMM, HTS, HMM
National Category
Fluid Mechanics
Identifiers
urn:nbn:se:kth:diva-158336 (URN), 10.1109/ICASSP.2014.6853597 (DOI), 000343655300052 (), 2-s2.0-84905269196 (Scopus ID), 978-1-4799-2893-4 (ISBN), 978-147992892-7 (ISBN)
Conference
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), MAY 04-09, 2014, Florence, ITALY
Note

QC 20150123

Available from: 2015-01-23 Created: 2015-01-07 Last updated: 2025-02-09. Bibliographically approved
Bollepalli, B. & Raitio, T. (2014). Effect of MPEG audio compression on vocoders used in statistical parametric speech synthesis. In: 2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO). Paper presented at 22nd European Signal Processing Conference, EUSIPCO 2014, 1 September 2014 through 5 September 2014, Lisbon, Portugal (pp. 1237-1241). European Signal Processing Conference, EUSIPCO
2014 (English) In: 2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), European Signal Processing Conference, EUSIPCO, 2014, p. 1237-1241. Conference paper, Published paper (Refereed)
Abstract [en]

This paper investigates the effect of MPEG audio compression on HMM-based speech synthesis using two state-of-the-art vocoders. Speech signals are first encoded with various compression rates and analyzed using the GlottHMM and STRAIGHT vocoders. Objective evaluation results show that the parameters of both vocoders gradually degrade with increasing compression rates, but with a clear increase in degradation at bit-rates of 32 kbit/s or less. Experiments with HMM-based synthesis with the two vocoders show that the degradation in quality is already perceptible at bit-rates of 32 kbit/s, and both vocoders show a similar trend in degradation with respect to compression ratio. The most perceptible artefacts induced by the compression are spectral distortion and reduced bandwidth, while prosody is better preserved.
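The objective degradation described above is typically quantified with frame-level distances between parameters extracted from the uncompressed and the compressed-then-decoded signal. The sketch below shows one common such measure, mel-cepstral distortion (MCD), as an assumed illustration; it is not necessarily the exact metric set used in the paper, and the variable names in the usage comment are hypothetical.

```python
import numpy as np

def mel_cepstral_distortion(mcep_ref: np.ndarray, mcep_test: np.ndarray) -> float:
    """Mean MCD in dB between two time-aligned mel-cepstral sequences.

    Both inputs are (frames, order + 1); the 0th (energy) coefficient is
    conventionally excluded from the distance.
    """
    assert mcep_ref.shape == mcep_test.shape
    diff = mcep_ref[:, 1:] - mcep_test[:, 1:]
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float((10.0 / np.log(10.0)) * np.mean(per_frame))

# Hypothetical usage: mcep_clean / mcep_mp3_32k would be mel-cepstra extracted
# from the same utterance before and after 32 kbit/s coding.
# mcd_32k = mel_cepstral_distortion(mcep_clean, mcep_mp3_32k)
```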

Place, publisher, year, edition, pages
European Signal Processing Conference, EUSIPCO, 2014
Series
European Signal Processing Conference, ISSN 2219-5491
Keywords
GlottHMM, HMM, MP3, MPEG, Statistical parametric speech synthesis, STRAIGHT
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-157960 (URN), 000393420200249 (), 2-s2.0-84911897440 (Scopus ID), 9780992862619 (ISBN)
Conference
22nd European Signal Processing Conference, EUSIPCO 2014, 1 September 2014 through 5 September 2014, Lisbon, Portugal
Note

QC 20141219

Available from: 2014-12-19 Created: 2014-12-18 Last updated: 2024-03-18. Bibliographically approved
Al Moubayed, S., Beskow, J., Bollepalli, B., Gustafson, J., Hussen-Abdelaziz, A., Johansson, M., . . . Varol, G. (2014). Human-robot Collaborative Tutoring Using Multiparty Multimodal Spoken Dialogue. Paper presented at 9th Annual ACM/IEEE International Conference on Human-Robot Interaction, Bielefeld, Germany. IEEE conference proceedings
2014 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we describe a project that explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring robot. A human-robot interaction setup is designed, and a human-human dialogue corpus is collected. The corpus targets the development of a dialogue system platform to study verbal and nonverbal tutoring strategies in multiparty spoken interactions with robots which are capable of spoken dialogue. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. Along with the participants sits a tutor (robot) that helps the participants perform the task, and organizes and balances their interaction. Different multimodal signals, captured and auto-synchronized by different audio-visual capture technologies such as a microphone array, Kinects, and video cameras, were coupled with manual annotations. These are used to build a situated model of the interaction based on the participants' personalities, their state of attention, their conversational engagement and verbal dominance, and how these are correlated with the verbal and visual feedback, turn-management, and conversation regulatory actions generated by the tutor. Driven by the analysis of the corpus, we also show the detailed design methodologies for an affective, multimodally rich dialogue system that allows the robot to incrementally measure the attention states and the dominance of each participant, allowing the robot head Furhat to maintain a well-coordinated, balanced, and engaging conversation that attempts to maximize the agreement and the contribution to solving the task. This project sets the first steps towards exploring the potential of using multimodal dialogue systems to build interactive robots that can serve in educational, team-building, and collaborative task-solving applications.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014
Keywords
Furhat robot; Human-robot collaboration; Human-robot interaction; Multiparty interaction; Spoken dialog
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-145511 (URN), 10.1145/2559636.2563681 (DOI), 000455229400029 (), 2-s2.0-84896934381 (Scopus ID)
Conference
9th Annual ACM/IEEE International Conference on Human-Robot Interaction, Bielefeld, Germany
Note

QC 20161018

Available from: 2014-05-21 Created: 2014-05-21 Last updated: 2024-03-15. Bibliographically approved
Koutsombogera, M., Al Moubayed, S., Bollepalli, B., Abdelaziz, A. H., Johansson, M., Aguas Lopes, J. D., . . . Varol, G. (2014). The Tutorbot Corpus - A Corpus for Studying Tutoring Behaviour in Multiparty Face-to-Face Spoken Dialogue. Paper presented at 9th International Conference on Language Resources and Evaluation, Reykjavik, Iceland. European Language Resources Association (ELRA)
2014 (English) Conference paper, Published paper (Refereed)
Abstract [en]

This paper describes a novel experimental setup exploiting state-of-the-art capture equipment to collect a multimodally rich, game-solving, collaborative multiparty dialogue corpus. The corpus is targeted and designed towards the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interactions. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. The participants were paired into teams based on their degree of extraversion as determined by a personality test. With the participants sits a tutor that helps them perform the task, organizes and balances their interaction, and whose behavior was assessed by the participants after each interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies, together with manual annotations of the tutor's behavior, constitute the Tutorbot corpus. This corpus is exploited to build a situated model of the interaction based on the participants' temporally-changing state of attention, their conversational engagement and verbal dominance, and their correlation with the verbal and visual feedback and conversation regulatory actions generated by the tutor.

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2014
Keywords
Multimodal corpus; Multiparty Interaction; Tutor
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-173469 (URN), 000355611005138 (), 2-s2.0-84990228583 (Scopus ID)
Conference
9th International Conference on Language Resources and Evaluation, Reykjavik, Iceland
Note

QC 20161017

Available from: 2015-09-15 Created: 2015-09-11 Last updated: 2024-03-15. Bibliographically approved
Al Moubayed, S., Beskow, J., Bollepalli, B., Hussen-Abdelaziz, A., Johansson, M., Koutsombogera, M., . . . Varol, G. (2014). Tutoring Robots: Multiparty Multimodal Social Dialogue With an Embodied Tutor. Paper presented at 9th International Summer Workshop on Multimodal Interfaces, Lisbon, Portugal. Springer Berlin/Heidelberg
2014 (English) Conference paper, Published paper (Refereed)
Abstract [en]

This project explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring agent. A setup is developed and a corpus is collected that targets the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interactions with embodied agents. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. With the participants sits a tutor that helps the participants perform the task and organizes and balances their interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies were coupled with manual annotations to build a situated model of the interaction based on the participants' personalities, their temporally-changing state of attention, their conversational engagement and verbal dominance, and the way these are correlated with the verbal and visual feedback, turn-management, and conversation regulatory actions generated by the tutor. At the end of this chapter we discuss the potential areas of research and development this work opens, and some of the challenges that lie on the road ahead.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2014
Keywords
Conversational Dominance; Embodied Agent; Multimodal; Multiparty; Non-verbal Signals; Social Robot; Spoken Dialogue; Turn-taking; Tutor; Visual Attention
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-158149 (URN), 000349440300004 (), 2-s2.0-84927643008 (Scopus ID)
Conference
9th International Summer Workshop on Multimodal Interfaces, Lisbon, Portugal
Note

QC 20161018

Available from: 2014-12-30 Created: 2014-12-30 Last updated: 2024-03-15. Bibliographically approved
Bollepalli, B., Raitio, T. & Alku, P. (2013). Effect of MPEG audio compression on HMM-based speech synthesis. In: Proceedings of the 14th Annual Conference of the International Speech Communication Association: Interspeech 2013. International Speech Communication Association (ISCA), 2013. Paper presented at 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, 25 August 2013 through 29 August 2013, Lyon, France (pp. 1062-1066).
2013 (English) In: Proceedings of the 14th Annual Conference of the International Speech Communication Association: Interspeech 2013. International Speech Communication Association (ISCA), 2013, p. 1062-1066. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, the effect of MPEG audio compression on HMM-based speech synthesis is studied. Speech signals are encoded with various compression rates and analyzed using the GlottHMM vocoder. Objective evaluation results show that the vocoder parameters start to degrade from encoding with bit-rates of 32 kbit/s or less, which is also confirmed by the subjective evaluation of the vocoder analysis-synthesis quality. Experiments with HMM-based speech synthesis show that the subjective quality of a synthetic voice trained with 32 kbit/s speech is comparable to a voice trained with uncompressed speech, but lower bit-rates induce a clear degradation in quality.
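A study of this kind needs the same speech material re-encoded at several MPEG bit-rates and decoded back to PCM before vocoder analysis and voice training. The sketch below is one assumed way to generate such conditions with the ffmpeg command-line tool; the paper does not state which encoder or exact bit-rate set was used, so both are illustrative only.

```python
# Sketch: produce MP3-encoded/decoded copies of a wav at several bit-rates
# (assumes ffmpeg is installed; the bit-rate list is illustrative only).
import subprocess

BITRATES_KBPS = [128, 64, 32, 16]

def make_compressed_copies(wav_path: str) -> None:
    for kbps in BITRATES_KBPS:
        mp3_path = f"{wav_path}.{kbps}k.mp3"
        dec_path = f"{wav_path}.{kbps}k.wav"
        # Encode to MP3 at the target bit-rate ...
        subprocess.run(["ffmpeg", "-y", "-i", wav_path, "-b:a", f"{kbps}k", mp3_path],
                       check=True)
        # ... then decode back to PCM for vocoder analysis / voice training.
        subprocess.run(["ffmpeg", "-y", "-i", mp3_path, dec_path], check=True)
```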

Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X
Keywords
GlottHMM, HMM, MP3, Speech synthesis, Audio signal processing, Motion Picture Experts Group standards, Vocoders, Analysis-synthesis, HMM-based speech synthesis, Objective evaluation, Subjective evaluations, Subjective quality, Quality control
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-150864 (URN), 000395050000225 (), 2-s2.0-84906262154 (Scopus ID)
Conference
14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, 25 August 2013 through 29 August 2013, Lyon, France
Note

QC 20140911

Available from: 2014-09-11 Created: 2014-09-11 Last updated: 2025-02-07. Bibliographically approved
Bollepalli, B., Beskow, J. & Gustafson, J. (2013). Non-Linear Pitch Modification in Voice Conversion using Artificial Neural Networks. In: Advances in nonlinear speech processing: 6th International Conference, NOLISP 2013, Mons, Belgium, June 19-21, 2013: proceedings. Paper presented at 6th International Conference on Advances in Nonlinear Speech Processing, NOLISP 2013, Mons, Belgium, 19 June 2013 through 21 June 2013 (pp. 97-103). Springer Berlin/Heidelberg
2013 (English) In: Advances in nonlinear speech processing: 6th International Conference, NOLISP 2013, Mons, Belgium, June 19-21, 2013: proceedings, Springer Berlin/Heidelberg, 2013, p. 97-103. Conference paper, Published paper (Refereed)
Abstract [en]

The majority of current voice conversion methods do not focus on modelling the local variations of the pitch contour, but only on linear modification of the pitch values based on means and standard deviations. However, a significant amount of speaker-related information is also present in the pitch contour. In this paper we propose a non-linear pitch modification method for mapping the pitch contours of the source speaker onto the pitch contours of the target speaker. This work is done within the framework of Artificial Neural Network (ANN) based voice conversion. The pitch contours are represented with Discrete Cosine Transform (DCT) coefficients at the segmental level. The results, evaluated using subjective and objective measures, confirm that the proposed method performed better in mimicking the target speaker's speaking style than the linear modification method.
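For context, the linear baseline mentioned in the abstract maps source log-F0 frames to the target speaker's mean and variance, while the proposed method instead maps segment-level DCT coefficients of the contour (with an ANN, omitted here). The sketch below illustrates both representations under stated assumptions; the function names and the number of coefficients are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.fft import dct, idct

def linear_f0_conversion(logf0_src, src_mean, src_std, tgt_mean, tgt_std):
    """Baseline: per-frame mean/variance mapping of log-F0 values."""
    return tgt_mean + (tgt_std / src_std) * (logf0_src - src_mean)

def segment_dct_coeffs(logf0_segment, n_coeffs=10):
    """Represent one voiced segment's log-F0 contour by its first DCT
    coefficients -- the kind of features an ANN would map source-to-target."""
    c = dct(np.asarray(logf0_segment, dtype=float), norm="ortho")
    return c[:n_coeffs]

def segment_from_dct(coeffs, segment_len):
    """Reconstruct a (smoothed) contour of the original segment length from
    possibly-converted DCT coefficients."""
    full = np.zeros(segment_len)
    full[: len(coeffs)] = coeffs
    return idct(full, norm="ortho")
```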

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2013
Series
Lecture Notes in Computer Science, ISSN 0302-9743; 7911
Keywords
Discrete cosine transform coefficients, Local variations, Modification methods, Pitch modification, Speaking styles, Standard deviation, Subjective and objective measures, Voice conversion
National Category
Computer Sciences; Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-137386 (URN), 10.1007/978-3-642-38847-7_13 (DOI), 2-s2.0-84888246669 (Scopus ID)
Conference
6th International Conference on Advances in Nonlinear Speech Processing, NOLISP 2013; Mons; Belgium; 19 June 2013 through 21 June 2013
Note

QC 20210511

Available from: 2013-12-13 Created: 2013-12-13 Last updated: 2025-02-01. Bibliographically approved
Bollepalli, B., Beskow, J. & Gustafson, J. (2012). HMM based speech synthesis system for Swedish Language. In: The Fourth Swedish Language Technology Conference. Paper presented at The Fourth Swedish Language Technology Conference. Lund, Sweden
2012 (English) In: The Fourth Swedish Language Technology Conference, Lund, Sweden, 2012. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Lund, Sweden, 2012
National Category
Computer Sciences; Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-109393 (URN)
Conference
The Fourth Swedish Language Technology Conference
Funder
ICT - The Next Generation
Note

QC 20130114

Available from: 2013-01-02 Created: 2013-01-02 Last updated: 2025-02-01. Bibliographically approved