A Swedish text-to-speech system has been developed at the Swedish Library of Talking Books and Braille (TPB). The system, named Filibuster, is open and extensible and makes it possible to generate synthetic speech with a high degree of control. Filibuster is used to produce talking books in TPB's service for print-handicapped students at university level. With text-to-speech, students receive their talking books much faster, and each book costs less to produce. The system was deployed in February 2007, and the plan is to produce a total of 200 titles during this year. It has been designed specifically for creating talking-book versions of university textbooks and has a large lexicon covering some 573,000 words and names. Filibuster includes a comprehensive text pre-processor that spells out non-word entities such as numbers, characters and expressions. So far one male voice, Folke, has been created, but more are planned.
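The abstract does not describe Filibuster's pre-processor rules. As a hypothetical illustration of one such rule, the sketch below spells out integers from 0 to 999,999 as Swedish compound words. The function names `below_thousand` and `spell_out` are invented for this example, and the sketch ignores ordinals, decimals, years read as "tjugohundrasju" and similar context-dependent readings that a real pre-processor must handle.

```python
# Hypothetical sketch of a number-expansion rule for a Swedish TTS
# text pre-processor. Not Filibuster's actual implementation.
ONES = ["noll", "ett", "två", "tre", "fyra", "fem", "sex", "sju", "åtta", "nio",
        "tio", "elva", "tolv", "tretton", "fjorton", "femton", "sexton",
        "sjutton", "arton", "nitton"]
TENS = {2: "tjugo", 3: "trettio", 4: "fyrtio", 5: "femtio", 6: "sextio",
        7: "sjuttio", 8: "åttio", 9: "nittio"}


def below_thousand(n: int) -> str:
    """Spell out 1..999 as a single Swedish compound word."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(ONES[hundreds] + "hundra")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens] + (ONES[ones] if ones else ""))
    elif rest:
        parts.append(ONES[rest])
    return "".join(parts)


def spell_out(n: int) -> str:
    """Spell out 0..999 999; larger numbers are out of scope for this sketch."""
    if not 0 <= n < 1_000_000:
        raise ValueError("sketch only handles 0..999999")
    if n == 0:
        return ONES[0]
    thousands, rest = divmod(n, 1000)
    words = ""
    if thousands:
        words += below_thousand(thousands) + "tusen"
    if rest:
        words += below_thousand(rest)
    # Swedish compounding reduces three identical consonants to two,
    # e.g. "ett" + "tusen" -> "ettusen".
    return words.replace("ttt", "tt")
```

For example, `spell_out(573000)` yields `"femhundrasjuttiotretusen"`, the spoken form of the lexicon size quoted above.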
To build acoustic models for children that can be used in spoken dialogue systems, speech data has to be collected. Commercial recognizers available for Swedish are trained on adult speech, which makes them less suitable for children's computer-directed speech. This paper describes experiments with on-the-fly voice transformation of children's speech. Two transformation methods were tested, one inspired by the Phase Vocoder algorithm and another by the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) algorithm. The speech signal is transformed before being sent to a speech recognizer trained on adult speech. Our results show that this method reduces error rates for child users by roughly thirty to forty-five percent.
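The abstract does not give the details of either transformation. As a rough illustration of the TD-PSOLA idea only, the sketch below lowers (or raises) pitch while keeping duration: it cuts two-period Hann-windowed segments around analysis marks and overlap-adds them at a new spacing of `period / alpha` samples. It assumes a constant, known pitch period in samples; a real system would use time-varying pitch marks from a pitch tracker. The function names `hann` and `psola_shift` and the parameter `alpha` are hypothetical, and this sketch only scales F0, so it says nothing about how the paper handled formants.

```python
import math


def hann(n_points):
    """Symmetric Hann window: 0 at both ends, 1 in the middle for odd n_points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n_points - 1))
            for i in range(n_points)]


def psola_shift(x, period, alpha):
    """Simplified TD-PSOLA pitch shift, assuming a constant pitch period.

    alpha < 1 lowers the pitch (e.g. towards an adult-like F0),
    alpha > 1 raises it; the output has the same length as the input.
    """
    if len(x) <= 2 * period:
        raise ValueError("signal too short for the given period")
    win = hann(2 * period + 1)
    y = [0.0] * len(x)
    # Analysis marks: one per assumed pitch period.
    analysis = list(range(period, len(x) - period, period))
    step = period / alpha  # spacing of synthesis marks
    t = float(period)
    while t < len(x) - period:
        s = int(t)  # floor keeps the segment inside the output buffer
        # Reuse the analysis segment whose mark is closest to this synthesis mark.
        a = min(analysis, key=lambda m: abs(m - t))
        for i, w in enumerate(win):
            y[s - period + i] += w * x[a - period + i]
        t += step
    return y
```

With a pulse train of period 100 samples and `alpha=0.5`, the output pulses come out 200 samples apart, i.e. one octave lower, while the signal length is unchanged.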