Search results 151–200 of 1064
  • 151.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
Within-utterance correlation for speech recognition, 1999. Conference paper (Refereed)
    Abstract [en]

Relations between non-adjacent parts of an utterance are commonly regarded as an important source of information for speech recognition. However, they have not been much used in speech recognition systems. In this paper, we include this information by joint distributions of pairs of phones occurring in the same utterance. In addition to relations between acoustic events, we have also incorporated relations between spectral and prosodically oriented information, such as phone duration, position in utterance and fundamental frequency. Preliminary recognition results on N-best rescoring show 10% word error reduction compared to a baseline Viterbi decoder.

  • 152.
    Blomberg, Mats
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
Within-utterance correlation in automatic speech recognition, 1999. Conference paper (Refereed)
    Abstract [en]

    Information on relations between separate parts of an utterance can be used to improve the performance of speech recognition systems. In this paper, examples of relations are discussed and some measured data on phone pair correlation is presented. In addition to relations between acoustic events in an utterance, it is also possible to represent relations between acoustic and non-acoustic information. In this way, covariance matrices can express some relations similar to phonetic-acoustic rules. Two alternative recognition methods are proposed to account for these relations. Some correlation data are presented and discussed.

  • 153.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
Labeling of speech given its text representation, 1993. Conference paper (Refereed)
  • 154.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K
    Galyas, K
    Granström, B
    Hunnicutt, S
    Neovius, L
Speech synthesis and recognition in technical aids, 1986. In: STL-QPSR, Vol. 27, no. 4, pp. 45-65. Journal article (Refereed)
    Abstract [en]

A number of speech-producing technical aids are now available for use by disabled individuals. One system which produces synthetic speech is described and its application in technical aids discussed. These applications include a communication aid, a symbol-to-speech system, talking terminals and a daily newspaper. A pattern-matching speech recognition system is also described and its future in the area of technical aids discussed.

  • 155.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K
    Granström, B
Auditory models as front ends in speech-recognition systems, 1986. Conference paper (Refereed)
    Abstract [en]

Includes comments by Stefanie Seneff and Nelson Kiang.

  • 156.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K
    Granström, B
Auditory models in isolated word recognition, 1984. Conference paper (Refereed)
    Abstract [en]

    A straightforward isolated word recognition system has been used to test different auditory models in acoustic front end processing. The models include BARK, PHON and SONE. The PHONTEMP model is based on PHON but also includes temporal forward masking. We also introduce a model, DOMIN, which is intended to measure the dominating frequency at each point along the 'basilar membrane.' All the above models were derived from an FFT-analysis, and the FFT processing is also used as a reference model. One male and one female speaker were used to test the recognition performance of the different models on a difficult vocabulary consisting of 18 Swedish consonants and 9 Swedish vowels. The results indicate that the performance of the models decreases as they become more complex. The overall recognition accuracy of FFT is 97% while it is 87% for SONE. However, the DOMIN model which is sensitive to dominant frequencies (formants) performs very well for vowels.

  • 157.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K
    Granström, B
Experiments with auditory models in speech recognition, 1982. Conference paper (Refereed)
  • 158.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K
    Granström, B
Speech research at KTH - two projects and technology transfer, 1985. Conference paper (Refereed)
  • 159.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K
    Granström, B
    Hunnicutt, S
Some current projects at KTH related to speech recognition, 1986. Conference paper (Refereed)
    Abstract [en]

Understanding and modelling the human speech understanding process requires knowledge in several domains, from auditory analysis of speech to higher linguistic processes. Integrating this knowledge into a coherent model is not the scope of this paper. Rather we want to present some projects that may add to the understanding of some components that eventually could be built into a knowledge-based speech recognition system. One project is concerned with a framework to formulate and experiment with the earlier levels of speech analysis. Others deal with different kinds of auditory representations and methods for comparing speech sounds. Still another project studies the phonetic and orthographic properties of different European languages.

  • 160.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K
    Granström, B
    Hunnicutt, S
Taligenkänning baserad på ett text-till-talsystem [Speech recognition based on a text-to-speech system], 1987. Conference paper (Refereed)
  • 161.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K
    Granström, B
    Hunnicutt, S
Word recognition using synthesized reference templates, 1988. Conference paper (Refereed)
    Abstract [en]

    A major problem in large‐vocabulary speech recognition is the collection of reference data and speaker normalization. In this paper, the use of synthetic speech is proposed as a means of handling this problem. An experimental scheme for such a speech recognition system will be described. A rule‐based speech synthesis procedure is used for generating the reference data. Ten male subjects participated in an experiment using a 26‐word test vocabulary recorded in a normal office room. The subjects were asked to read the words from a list with little instruction except to pronounce each word separately. The synthesis was used to build the references. No adjustments were done to the synthesis in this first stage. All the human speakers served better as reference than the synthesis. Differences between natural and synthetic speech have been analyzed in detail at the segmental level. Methods for updating the synthetic speech parameters from natural speech templates will be described. [This work has been supported by the Swedish Board of Technical Development.]

  • 162.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K
    Granström, B
    Hunnicutt, S
Word recognition using synthesized templates, 1988. Conference paper (Refereed)
    Abstract [en]

    With the ultimate aim of creating a knowledge based speech understanding system, we have set up a conceptual framework named NEBULA. In this paper we briefly describe some of the components of this framework and also report on some experiments where we use a production component for generating reference data for the recognition. The production component in the form of a speech synthesis system will ideally make the collection of training data unnecessary. Preliminary results of an isolated word recognition experiment will be presented and discussed. Several methods of interfacing the production component to the recognition/evaluation component have been pursued.

  • 163.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K
    Granström, B
    Hunnicutt, S
    Lindell, R
    Neovius, L
An experimental dialog system: WAXHOLM, 1993. Conference paper (Refereed)
    Abstract [en]

Recently we have begun to build the basic tools for a generic speech-dialogue system, WAXHOLM. The main modules, their function and internal communication have been specified. The different components are connected through a computer network. A preliminary version of the system has been tested, using simplified versions of the modules. We will give a general overview of the system and describe some of the components in more detail. Application-specific data are collected with the help of Wizard-of-Oz techniques. The dialogue system is used during the data collection and the wizard only replaces the speech recognition module.

  • 164.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K
    Granström, B
    Hunnicutt, S
    Lindell, R
    Neovius, L
Speech recognition based on a text-to-speech synthesis system, 1987. Conference paper (Refereed)
    Abstract [en]

    A major problem in large-vocabulary speech recognition is the collection of reference data and speaker normalization. In this paper we propose the use of synthetic speech as a means of handling this problem. An experimental scheme for such a system will be described.

  • 165.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlson, R
    Elenius, K.O.E
    Granström, B
Auditory models and isolated word recognition, 1983. In: STL-QPSR, Vol. 24, no. 4, pp. 1-15. Journal article (Refereed)
    Abstract [en]

    A straightforward isolated word recognition system has been used to test different auditory models in acoustic front end processing. The models include BARK, PHON and SONE. The PHONTEMP model is based on PHON but also includes temporal forward masking. We also introduce a model, DOMIN, which is intended to measure the dominating frequency at each point along the 'basilar membrane.' All the above models were derived from an FFT-analysis, and the FFT processing is also used as a reference model. One male and one female speaker were used to test the recognition performance of the different models on a difficult vocabulary consisting of 18 Swedish consonants and 9 Swedish vowels. The results indicate that the performance of the models decreases as they become more complex. The overall recognition accuracy of FFT is 97% while it is 87% for SONE. However, the DOMIN model which is sensitive to dominant frequencies (formants) performs very well for vowels.

  • 166.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Carlsson, R
    Elenius, K
    Granström, B
    Hunnicutt, S
Word recognition using synthesized templates, 1988. In: STL-QPSR, Vol. 29, no. 2-3, pp. 69-81. Journal article (Refereed)
    Abstract [en]

    With the ultimate aim of creating a knowledge based speech understanding system, we have set up a conceptual framework named NEBULA. In this paper we briefly describe some of the components of this framework and also report on some experiments where we use a production component for generating reference data for the recognition. The production component in the form of a speech synthesis system will ideally make the collection of training data unnecessary. Preliminary results of an isolated word recognition experiment will be presented and discussed. Several methods of interfacing the production component to the recognition/evaluation component have been pursued.

  • 167.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
Estimating speaker characteristics for speech recognition, 2009. In: Proceedings of Fonetik 2009 / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, pp. 154-158. Conference paper (Other academic)
    Abstract [en]

A speaker-characteristic-based hierarchic tree of speech recognition models is designed. The leaves of the tree contain model sets, which are created by transforming a conventionally trained set using leaf-specific speaker profile vectors. The non-leaf models are formed by merging the models of their child nodes. During recognition, a maximum likelihood criterion is followed to traverse the tree from the root to a leaf. The computational load for estimating one- (vocal tract length) and four-dimensional speaker profile vectors (vocal tract length, two spectral slope parameters and model variance scaling) is reduced to a fraction compared to that of an exhaustive search among all leaf nodes. Recognition experiments on children's connected digits using adult models exhibit similar recognition performance for the exhaustive and the one-dimensional tree search. Further error reduction is achieved with the four-dimensional tree. The estimated speaker properties are analyzed and discussed.

  • 168.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
Investigating Explicit Model Transformations for Speaker Normalization, 2008. In: Proceedings of ISCA ITRW Speech Analysis and Processing for Knowledge Discovery / [ed] Paul Dalsgaard, Christian Fischer Pedersen, Ove Andersen, Aalborg, Denmark: ISCA/AAU, 2008. Conference paper (Refereed)
    Abstract [en]

In this work we extend the test utterance adaptation technique used in vocal tract length normalization to a larger number of speaker characteristic features. We perform partially joint estimation of four features: the VTLN warping factor, the corner position of the piece-wise linear warping function, spectral tilt in voiced segments, and model variance scaling. In experiments on the Swedish PF-Star children database, joint estimation of warping factor and variance scaling lowered the recognition error rate compared to warping factor alone.

  • 169.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
Knowledge-Rich Model Transformations for Speaker Normalization in Speech Recognition, 2008. In: Proceedings, FONETIK 2008, Department of Linguistics, University of Gothenburg / [ed] Anders Eriksson, Jonas Lindh, 2008, pp. 37-40. Conference paper (Other academic)
    Abstract [en]

In this work we extend the test utterance adaptation technique used in vocal tract length normalization to a larger number of speaker characteristic features. We perform partially joint estimation of four features: the VTLN warping factor, the corner position of the piece-wise linear warping function, spectral tilt in voiced segments, and model variance scaling. In experiments on the Swedish PF-Star children database, joint estimation of warping factor and variance scaling lowers the recognition error rate compared to warping factor alone.

  • 170.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
Tree-Based Estimation of Speaker Characteristics for Speech Recognition, 2009. In: INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association, Baixas: ISCA, 2009, pp. 580-583. Conference paper (Refereed)
    Abstract [en]

Speaker adaptation by means of adjustment of speaker characteristic properties, such as vocal tract length, has the important advantage compared to conventional adaptation techniques that the adapted models are guaranteed to be realistic if the descriptions of the properties are. One problem with this approach is that the search procedure to estimate them is computationally heavy. We address the problem by using a multi-dimensional, hierarchical tree of acoustic model sets. The leaf sets are created by transforming a conventionally trained model set using leaf-specific speaker profile vectors. The model sets of non-leaf nodes are formed by merging the models of their child nodes, using a computationally efficient algorithm. During recognition, a maximum likelihood criterion is followed to traverse the tree. Studies of one- (VTLN) and four-dimensional speaker profile vectors (VTLN, two spectral slope parameters and model variance scaling) exhibit a reduction of the computational load to a fraction compared to that of an exhaustive grid search. In recognition experiments on children's connected digits using adult and male models, the one-dimensional tree search performed as well as the exhaustive search. Further reduction was achieved with four dimensions. The best recognition results are 0.93% and 10.2% WER in TIDIGITS and PF-Star-Sw, respectively, using adult models.

  • 171.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Elenius, K
A device for automatic speech recognition, 1982. Conference paper (Refereed)
    Abstract [en]

This paper is a translation of a paper originally published in the proceedings of the 1982 meeting of "Nordiska akustiska sällskapet" (The Nordic Acoustical Society), pp. 383-386.

  • 172.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Elenius, K
Creation of unseen triphones from diphones and monophones using a speech production approach, 1996. Conference paper (Refereed)
    Abstract [en]

With limited training data, infrequent triphone models for speech recognition will not be observed in sufficient number. In this report, a speech production approach is used to predict the characteristics of unseen triphones by concatenating diphones and/or monophones in the parametric representation of a formant speech synthesiser. The parameter trajectories are estimated by interpolation between the endpoints of the original units. The spectral states of the created triphone are generated by the speech synthesiser. Evaluation of the proposed technique has been performed using spectral error measurements and recognition candidate rescoring of N-best lists. In both cases, the created triphones are shown to perform better than the shorter units from which they were constructed.

  • 173.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Elenius, K
Effects of emphasizing transitional or stationary parts of the speech signal in a discrete utterance recognition system, 1982. Conference paper (Refereed)
    Abstract [en]

A pattern matching word recognition system has been modified in order to emphasize the transient parts of speech in the similarity measure. The technique is to weight the word distances with a normalized spectral change function. A small positive effect is measured. Emphasizing the stationary parts is shown to substantially decrease the performance. Adding the time derivative of the speech parameters to the word patterns improves performance significantly. This is probably a consequence of an improvement in the description of the transient segments.

  • 174.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Elenius, K
Nonlinear Frequency Warp for Speech Recognition, 1986. Conference paper (Refereed)
    Abstract [en]

A technique of nonlinear frequency warping has been investigated for recognition of Swedish vowels. A frequency warp between two spectra is computed using a standard dynamic programming algorithm. The frequency distance, defined as the area between the obtained warping function and the diagonal, contributes to the spectral distance. The distance between two spectra is a weighted sum of the warped amplitude distance and the frequency distance. By changing two weights, we get a gradual shift between non-warped amplitude distance, warped amplitude distance, and frequency distance. In recognition experiments on natural and synthetic vowel spectra, a metric combining the frequency and amplitude distances gave better results than using only amplitude or frequency deviation. Analysis of the results for the synthetic vowels shows a reduced sensitivity to voice source and pitch variation. For the natural vowels, the recognition improvement is larger for the male and female speakers separately than for the combined groups.

  • 175.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Elenius, K
Optimizing some parameters of a word recognizer used in car noise, 1990. In: STL-QPSR, Vol. 31, no. 4, pp. 43-52. Journal article (Refereed)
    Abstract [en]

A speaker-dependent word recognition system has been modified to improve the performance in noise. Problems with word detection and noise compensation have been addressed by using a close-talk microphone and a "noise addition" method. The reference templates are recorded in relative silence. The additional environmental noise during the recognition phase is measured and is "added" to the reference templates before using them for template matching. The recognition performance has been tested in moving cars with references recorded in parked cars. Recordings of six male speakers have been used in this report to test the sensitivity of the recognition system to some essential parameters. The results from six male speakers and a twenty-word vocabulary show that adapting the endpoint detection threshold to the noise level is essential for good performance and that noise compensation is important at signal-to-noise ratios below 15 dB.

  • 176.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Elenius, K
Statistical analysis of speech signals, 1970. In: STL-QPSR, Vol. 11, no. 4, pp. 1-8. Journal article (Refereed)
    Abstract [en]

This is a condensed report of a thesis study carried out at the Department of Speech Communication in 1970. The purpose was to determine, for continuous speech, peak factor, form factor, long-time average spectrum of voiced and voiceless sections separately, spectral density at different voice intensity levels, distribution of the speech-wave amplitude, statistics on pause lengths, and long-time average RMS of the speech wave. All tasks have been solved using the CDC computer of the Department.

  • 177.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Elenius, K
Testing some essential parameters of a word recogniser used in car noise, 1989. Conference paper (Refereed)
    Abstract [en]

A speaker-dependent word recognition system has been modified to improve the performance in noise. Problems with word detection and noise compensation have been addressed by using a close-talk microphone and a "noise addition" method. The reference templates are recorded in relative silence. The additional environmental noise during the recognition phase is measured and is "added" to the reference templates before using them for template matching. The recognition performance has been tested in moving cars with references recorded in parked cars. Recordings of six male speakers have been used in this report to test the sensitivity of the recognition system to some essential parameters. The results from six male speakers and a twenty-word vocabulary show that adapting the endpoint detection threshold to the noise level is essential for good performance and that noise compensation is important at signal-to-noise ratios below 15 dB.

  • 178.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Elenius, K
    Lundin, F
    Sundmalm, C
Let your voice do the dialing, 1983. In: Telephony, ISSN 0040-2656, E-ISSN 2161-8690, pp. 68-74. Journal article (Refereed)
  • 179.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Elenius, K
    Ström, N
Speech recognition in the Waxholm dialog system, 1994. Conference paper (Refereed)
    Abstract [en]

The speech recognition component in the KTH "Waxholm" dialog system is described. It will handle continuous speech with a vocabulary of about 1000 words. The output of the recogniser is fed to a probabilistic, knowledge-based parser that contains a context-free grammar compiled into an augmented transition network.

  • 180.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Karlsson, Inger A.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
Research Challenges in Speech Technology: A Special Issue in Honour of Rolf Carlson and Bjorn Granstrom, 2009. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 51, no. 7, pp. 563-563. Journal article (Refereed)
  • 181.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Elenius, K.O.E
Automatic time alignment of speech with a phonetic transcription, 1985. In: STL-QPSR, Vol. 26, no. 1, pp. 37-45. Journal article (Refereed)
  • 182.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Skantze, Gabriel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Al Moubayed, Samer
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
Children and adults in dialogue with the robot head Furhat - corpus collection and initial analysis, 2012. In: Proceedings of WOCCI, Portland, OR, 2012. Conference paper (Refereed)
  • 183.
    Bohman, Mikael
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Ternström, Sten
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Södersten, M.
    Karolinska University Hospital at Huddinge.
The use of channel estimation techniques for investigating vocal stress in noisy environments, 2003. In: Ultragarsas, ISSN 1392-2114, Vol. 3, no. 48, pp. 9-13. Journal article (Other academic)
  • 184.
    Bollepalli, Bajibabu
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. Aalto University, Department of Signal Processing and Acoustics.
Towards conversational speech synthesis: Experiments with data quality, prosody modification, and non-verbal signals, 2017. Licentiate thesis, compilation (Other academic)
    Abstract [en]

The aim of a text-to-speech synthesis (TTS) system is to generate a human-like speech waveform from a given input text. Current TTS systems have already reached a high degree of intelligibility, and they can be readily used to read aloud a given text. For many applications, e.g. public address systems, reading style is enough to convey the message to the people. However, more recent applications, such as human-machine interaction and speech-to-speech translation, call for TTS systems to be increasingly human-like in their conversational style. The goal of this thesis is to address a few issues involved in a conversational speech synthesis system.

First, we discuss issues involved in data collection for conversational speech synthesis. It is very important to have data of good quality that also contains conversational characteristics. In this direction we studied two methods: 1) harvesting the world wide web (WWW) for conversational speech corpora, and 2) imitation of natural conversations by professional actors. In the former method, we studied the effect of compression on the performance of TTS systems. Speech data available on the WWW is often in compressed form, mostly using standard compression techniques such as MPEG. Thus in papers 1 and 2, we systematically studied the effect of MPEG compression on TTS systems. Results showed that synthesis quality is indeed affected by the compression; however, the perceptual differences are strongly significant only if the compression rate is less than 32 kbit/s. Even if one is able to collect natural conversational speech, it is not always suitable for training a TTS system due to problems involved in its production. Thus in the latter method, we asked whether professional actors can imitate conversational speech in recording studios. In this direction we studied the speech characteristics of acted and read speech. Second, we asked whether we can borrow a technique from the voice conversion field to convert read speech into conversational speech. In paper 3, we proposed a method to transform pitch contours using artificial neural networks. Results indicated that neural networks are able to transform pitch values better than the traditional linear approach. Finally, we presented a study on laughter synthesis, since non-verbal sounds, particularly laughter, play a prominent role in human communication. In paper 4 we present an experimental comparison of state-of-the-art vocoders for the application of HMM-based laughter synthesis.

  • 185.
    Bollepalli, Bajibabu
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    HMM based speech synthesis system for Swedish Language (2012). In: The Fourth Swedish Language Technology Conference, Lund, Sweden, 2012. Conference paper (Refereed)
  • 186.
    Bollepalli, Bajibabu
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Non-Linear Pitch Modification in Voice Conversion using Artificial Neural Networks (2013). In: Advances in nonlinear speech processing: 6th International Conference, NOLISP 2013, Mons, Belgium, June 19-21, 2013: proceedings, Springer Berlin/Heidelberg, 2013, pp. 97-103. Conference paper (Refereed)
    Abstract [en]

    The majority of current voice conversion methods do not focus on modelling the local variations of the pitch contour, but only on linear modification of pitch values based on means and standard deviations. However, a significant amount of speaker-related information is also present in the pitch contour. In this paper we propose a non-linear pitch modification method for mapping the pitch contours of the source speaker to the pitch contours of the target speaker. This work is done within the framework of voice conversion based on Artificial Neural Networks (ANNs). The pitch contours are represented with Discrete Cosine Transform (DCT) coefficients at the segmental level. The results, evaluated using subjective and objective measures, confirm that the proposed method performs better in mimicking the target speaker's speaking style than the linear modification method.
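The segment-level DCT parameterization described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions (log-F0 representation, 10 retained coefficients, illustrative function names); it is not the authors' implementation:

```python
import numpy as np
from scipy.fft import dct, idct

def pitch_segment_to_dct(f0_segment, n_coeffs=10):
    """Compress one voiced segment's pitch contour into a few DCT coefficients."""
    log_f0 = np.log(f0_segment)              # log-F0 is a common working domain
    coeffs = dct(log_f0, type=2, norm='ortho')
    return coeffs[:n_coeffs]                 # low-order coefficients carry the contour shape

def dct_to_pitch_segment(coeffs, seg_len):
    """Reconstruct a pitch contour of length seg_len from truncated coefficients."""
    padded = np.zeros(seg_len)
    padded[:len(coeffs)] = coeffs
    return np.exp(idct(padded, type=2, norm='ortho'))

# A smooth rise-fall contour over 50 frames survives the truncation well
f0 = 120.0 + 40.0 * np.sin(np.linspace(0.0, np.pi, 50))
coeffs = pitch_segment_to_dct(f0)
f0_hat = dct_to_pitch_segment(coeffs, len(f0))
```

In a conversion setup of the kind proposed, an ANN would map the source speaker's coefficient vectors to the target speaker's, and the modified contour would be resynthesized from the predicted coefficients.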

  • 187.
    Bollepalli, Bajibabu
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Raitio, T.
    Alku, P.
    Effect of MPEG audio compression on HMM-based speech synthesis (2013). In: Proceedings of the 14th Annual Conference of the International Speech Communication Association: Interspeech 2013, International Speech Communication Association (ISCA), 2013, pp. 1062-1066. Conference paper (Refereed)
    Abstract [en]

    In this paper, the effect of MPEG audio compression on HMM-based speech synthesis is studied. Speech signals are encoded with various compression rates and analyzed using the GlottHMM vocoder. Objective evaluation results show that the vocoder parameters start to degrade from encoding with bitrates of 32 kbit/s or less, which is also confirmed by the subjective evaluation of the vocoder analysis-synthesis quality. Experiments with HMM-based speech synthesis show that the subjective quality of a synthetic voice trained with 32 kbit/s speech is comparable to a voice trained with uncompressed speech, but lower bit rates induce clear degradation in quality.

  • 188.
    Bollepalli, Bajibabu
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Raito, T.
    Effect of MPEG audio compression on vocoders used in statistical parametric speech synthesis (2014). In: 2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), European Signal Processing Conference, EUSIPCO, 2014, pp. 1237-1241. Conference paper (Refereed)
    Abstract [en]

    This paper investigates the effect of MPEG audio compression on HMM-based speech synthesis using two state-of-the-art vocoders. Speech signals are first encoded with various compression rates and analyzed using the GlottHMM and STRAIGHT vocoders. Objective evaluation results show that the parameters of both vocoders gradually degrade with increasing compression rates, but with a clear increase in degradation at bit-rates of 32 kbit/s or less. Experiments with HMM-based synthesis with the two vocoders show that the degradation in quality is already perceptible at bit-rates of 32 kbit/s, and both vocoders show a similar trend in degradation with respect to compression ratio. The most perceptible artefacts induced by the compression are spectral distortion and reduced bandwidth, while prosody is better preserved.

  • 189.
    Bollepalli, Bajibabu
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Urbain, Jerome
    Raitio, Tuomo
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Cakmak, Huseyin
    A COMPARATIVE EVALUATION OF VOCODING TECHNIQUES FOR HMM-BASED LAUGHTER SYNTHESIS (2014). Conference paper (Refereed)
    Abstract [en]

    This paper presents an experimental comparison of various leading vocoders for the application of HMM-based laughter synthesis. Four vocoders, commonly used in HMM-based speech synthesis, are used in copy-synthesis and HMM-based synthesis of both male and female laughter. Subjective evaluations are conducted to assess the performance of the vocoders. The results show that all vocoders perform relatively well in copy-synthesis. In HMM-based laughter synthesis using original phonetic transcriptions, all synthesized laughter voices were significantly lower in quality than copy-synthesis, indicating a challenging task and room for improvements. Interestingly, two vocoders using rather simple and robust excitation modeling performed the best, indicating that robustness in speech parameter extraction and simple parameter representation in statistical modeling are key factors in successful laughter synthesis.

  • 190. Bolíbar, Jordi
    et al.
    Bresin, Roberto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Sound feedback for the optimization of performance in running (2012). In: TMH-QPSR special issue: Proceedings of SMC Sweden 2012 Sound and Music Computing, Understanding and Practicing in Sweden, ISSN 1104-5787, Vol. 52, no. 1, pp. 39-40. Journal article (Refereed)
  • 191. Borch, D. Zangger
    et al.
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Some Phonatory and Resonatory Characteristics of the Rock, Pop, Soul, and Swedish Dance Band Styles of Singing (2011). In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 25, no. 5, pp. 532-537. Journal article (Refereed)
    Abstract [en]

    This investigation aims at describing the voice function in four nonclassical styles of singing: Rock, Pop, Soul, and Swedish Dance Band. A male singer, professionally experienced in performing in these genres, sang representative tunes, both with their original lyrics and on the syllable /pae/. In addition, he sang tones in a triad pattern ranging from the pitch Bb2 to the pitch C4 on the syllable /pae/ in pressed and neutral phonation. An expert panel was successful in classifying the samples, thus suggesting that the samples were representative of the various styles. Subglottal pressure was estimated from oral pressure during the occlusion for the consonant [p]. Flow glottograms were obtained from inverse filtering. The four lowest formant frequencies differed between the styles. The mean of the subglottal pressure and the mean of the normalized amplitude quotient (NAQ), that is, the ratio between the flow pulse amplitude and the product of period and maximum flow declination rate, were plotted against the mean of fundamental frequency. In these graphs, Rock and Swedish Dance Band assumed opposite extreme positions with respect to subglottal pressure and mean phonation frequency, whereas the mean NAQ values differed less between the styles.
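The NAQ defined above reduces each glottal flow pulse to a single dimensionless number. A minimal sketch of the arithmetic follows; the numeric values are illustrative only, not data from the study:

```python
def normalized_amplitude_quotient(flow_amplitude, period, max_declination_rate):
    """NAQ: flow pulse amplitude divided by the product of the fundamental
    period and the maximum flow declination rate."""
    return flow_amplitude / (period * max_declination_rate)

# Illustrative values: flow pulse amplitude 0.3 (flow units), F0 = 110 Hz
# (period = 1/110 s), peak flow declination rate 300 (flow units per second)
naq = normalized_amplitude_quotient(0.3, 1.0 / 110.0, 300.0)
# Lower NAQ values are associated with more pressed phonation,
# higher values with more breathy phonation.
```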

  • 192. Borg, Erik
    et al.
    Edquist, Gertrud
    Reinholdson, Anna-Clara
    Risberg, Arne
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    McAllister, Bob
    Speech and language development in a population of Swedish hearing-impaired pre-school-children, a cross-sectional study (2007). In: International Journal of Pediatric Otorhinolaryngology, ISSN 0165-5876, E-ISSN 1872-8464, Vol. 71, no. 7, pp. 1061-1077. Journal article (Refereed)
    Abstract [en]

    Objective: There is little information on speech and language development in preschool children with mild, moderate or severe hearing impairment. The primary aim of the study is to establish a reference material for clinical use covering various aspects of speech and language functions and to relate test values to pure tone audiograms and parents' judgement of their children's hearing and language abilities. Methods: Nine speech and language tests were applied or modified, both classical tests and newly developed tests. Ninety-seven children with normal hearing and 156 with hearing impairment were tested. Hearing was 80 dB HL PTA or better in the best ear. Swedish was their strongest language. None had any additional diagnosed major handicaps. The children were 4-6 years of age. The material was divided into 10 categories of hearing impairment, 5 conductive and 5 sensorineural: unilateral; bilateral 0-20; 21-40; 41-60; 61-80 dB HL PTA. The tests, selected on the basis of a three component language model, are phoneme discrimination; rhyme matching; Peabody Picture Vocabulary Test (PPVT-III, word perception); Test for Reception of Grammar (TROG, grammar perception); prosodic phrase focus; rhyme construction; Word Finding Vocabulary Test (word production); Action Picture Test (grammar production); oral motor test. Results: Only categories with sensorineural loss showed significant differences from normal. Word production showed the most marked delay for 21-40 dB HL: 5 and 6 years p < 0.01; for 41-60 dB: 4 years p < 0.01 and 6 years p < 0.01 and 61-80 dB: 5 years p < 0.05. Phoneme discrimination 21-40 dB HL: 6 years p < 0.05; 41-60 dB: 4 years p < 0.01; 61-80 dB: 4 years p < 0.001, 5 years p < 0.001. Rhyme matching: no significant difference as compared to normal data. Word perception: sensorineural 41-60 dB HL: 6 years p < 0.05; 61-80 dB: 4 years p < 0.05; 5 years p < 0.01. Grammar perception: sensorineural 41-60 dB HL: 6 years p < 0.05; 61-80 dB: 5 years p < 0.05. Prosodic phrase focus: 41-60 dB HL: 5 years p < 0.01. Rhyme construction: 41-60 dB HL: 4 years p < 0.05. Grammar production: 61-80 dB HL: 5 years p < 0.01. Oral motor function: no differences. The Word production test showed a 1.5-2 years delay for sensorineural impairment 41-80 dB HL through 4-6 years of age. There were no differences between hearing-impaired boys and girls. Extended data for the screening test [E. Borg, A. Risberg, B. McAllister, B.M. Undemar, G. Edquist, A.C. Reinholdsson, et al., Language development in hearing-impaired children. Establishment of a reference material for a "Language test for hearing-impaired children", Int. J. Pediatr. Otorhinolaryngol. 65 (2002) 15-26] are presented. Conclusions: Reference values for expected speech and language development are presented that cover nearly 60% of the studied population. The effect of the peripheral hearing impairment is compensated for in many children with hearing impairment up to 60 dB HL. Above that degree of impairment, language delay is more pronounced, probably due to a loss of acuity. The importance of central cognitive functions, speech reading and signing for compensation of peripheral limitations is pointed out.

  • 193. Borin, Lars
    et al.
    Brandt, Martha D.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Lindh, Jonas
    Parkvall, Mikael
    The Swedish Language in the Digital Age / Svenska språket i den digitala tidsåldern (2012). Book (Refereed)
  • 194. Boves, L.
    et al.
    Carlson, Rolf
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hinrichs, E.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Krauwer, S.
    Lemnitzer, L.
    Vainio, M.
    Wittenburg, P.
    Resources for Speech Research: Present and Future Infrastructure Needs (2009). In: Proceedings of the 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, UK, 2009, pp. 1803-1806. Conference paper (Refereed)
    Abstract [en]

    This paper introduces the EU-FP7 project CLARIN, a joint effort of over 150 institutions in Europe, aimed at the creation of a sustainable language resources and technology infrastructure for the humanities and social sciences research community. The paper briefly introduces the vision behind the project and how it relates to speech research with a focus on the contributions that CLARIN can and will make to research in spoken language processing.

  • 195.
    Boye, Johan
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Fredriksson, M.
    Götze, Jana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Königsmann, J.
    Walk this way: Spatial grounding for city exploration (2012). In: IWSDS, 2012. Conference paper (Refereed)
  • 196.
    Boye, Johan
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Fredriksson, Morgan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Götze, Jana
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Königsmann, Jurgen
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Walk this way: Spatial grounding for city exploration (2014). In: Natural interaction with robots, knowbots and smartphones, Springer-Verlag, 2014, pp. 59-67. Book chapter (Refereed)
    Abstract [en]

    Recently there has been an interest in spatially aware systems for pedestrian routing and city exploration, due to the proliferation of smartphones with GPS receivers among the general public. Since GPS readings are noisy, giving good and well-timed route instructions to pedestrians is a challenging problem. This paper describes a spoken-dialogue prototype for pedestrian navigation in Stockholm that addresses this problem by using various grounding strategies.

  • 197.
    Bresin, Roberto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Real-time visualization of musical expression (2004). In: Proceedings of Network of Excellence HUMAINE Workshop "From Signals to Signs of Emotion and Vice Versa", Santorini, Greece, Institute of Communication and Computer Systems, National Technical University of Athens, 2004, pp. 19-23. Conference paper (Refereed)
    Abstract [en]

    A system for real-time feedback of expressive music performance is presented. The feedback is provided by using a graphical interface where acoustic cues are presented in an intuitive fashion. The graphical interface presents on the computer screen a three-dimensional object with continuously changing shape, size, position, and colour. Some of the acoustic cues were associated with the shape of the object, others with its position. For instance, articulation was associated with shape: staccato corresponded to an angular shape and legato to a rounded shape. The emotional expression resulting from the combination of cues was mapped in terms of the colour of the object (e.g., sadness/blue). To determine which colours were most suitable for the respective emotion, a test was run. Subjects rated how well each of 8 colours corresponds to each of 12 music performances expressing different emotions.

  • 198.
    Bresin, Roberto
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    What is the color of that music performance? (2005). In: Proceedings of the International Computer Music Conference - ICMC 2005, Barcelona, 2005, pp. 367-370. Conference paper (Refereed)
    Abstract [en]

    The representation of expressivity in music is still a fairly unexplored field. Alternative ways of representing musical information are necessary when providing feedback on emotion expression in music, such as in real-time tools for music education, or in the display of large music databases. One possible solution could be a graphical non-verbal representation of expressivity in music performance using color as an index of emotion. To determine which colors are most suitable for an emotional expression, a test was run. Subjects rated how well each of 8 colors and their 3 nuances corresponds to each of 12 music performances expressing different emotions. Performances were played by professional musicians with 3 instruments: saxophone, guitar, and piano. Results show that subjects associated different hues to different emotions. Also, dark colors were associated to music in minor tonality and light colors to music in major tonality. Correspondences between spectrum energy and color hue are preliminarily discussed.

  • 199.
    Bresin, Roberto
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Askenfelt, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Hansen, Kjetil
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Ternström, Sten
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Sound and Music Computing at KTH (2012). In: Trita-TMH, ISSN 1104-5787, Vol. 52, no. 1, pp. 33-35. Journal article (Other academic)
    Abstract [en]

    The SMC Sound and Music Computing group at KTH (formerly the Music Acoustics group) is part of the Department of Speech Music and Hearing, School of Computer Science and Communication. In this short report we present the current status of the group mainly focusing on its research.

  • 200.
    Bresin, Roberto
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID.
    de Witt, Anna
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Papetti, Stefano
    University of Verona.
    Civolani, Marco
    University of Verona.
    Fontana, Federico
    University of Verona.
    Expressive sonification of footstep sounds (2010). In: Proceedings of ISon 2010: 3rd Interactive Sonification Workshop / [ed] Bresin, Roberto; Hermann, Thomas; Hunt, Andy, Stockholm, Sweden: KTH Royal Institute of Technology, 2010, pp. 51-54. Conference paper (Refereed)
    Abstract [en]

    In this study we present the evaluation of a model for the interactive sonification of footsteps. The sonification is achieved by means of specially designed sensored-shoes which control the expressive parameters of novel sound synthesis models capable of reproducing continuous auditory feedback for walking. In a previous study, sounds corresponding to different grounds were associated with different emotions and gender. In this study, we used an interactive sonification actuated by the sensored-shoes for providing auditory feedback to walkers. In an experiment we asked subjects to walk (using the sensored-shoes) with four different emotional intentions (happy, sad, aggressive, tender), and for each emotion we manipulated the ground texture sound four times (wood panels, linoleum, muddy ground, and iced snow). Preliminary results show that walkers used a more active walking style (faster pace) when the sound of the walking surface was characterized by a higher spectral centroid (e.g. iced snow), and a less active style (slower pace) when the spectral centroid was low (e.g. muddy ground). Harder texture sounds led to more aggressive walking patterns while softer ones led to more tender and sad walking styles.
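The spectral centroid invoked in the results above is the amplitude-weighted mean frequency of the magnitude spectrum. A minimal sketch of how it could be computed (an illustration of the measure, not the authors' implementation; the two test tones stand in for "muddy ground" and "iced snow" textures):

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Amplitude-weighted mean frequency of the magnitude spectrum, in Hz."""
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return np.sum(freqs * magnitude) / np.sum(magnitude)

sr = 16000
t = np.arange(sr) / sr                       # one second of audio
dull = np.sin(2 * np.pi * 200 * t)           # energy low in the spectrum
bright = np.sin(2 * np.pi * 4000 * t)        # energy high in the spectrum
# bright has the higher centroid, matching the faster-pace condition above
```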
