  • 151.
    Seward, Alexander
    KTH, Superseded Departments, Speech, Music and Hearing.
    Low-Latency Incremental Speech Transcription in the Synface Project, 2003. In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), Geneva, Switzerland, vol 2, 2003, p. 1141-1144. Conference paper (Other academic)
    Abstract [en]

In this paper, a real-time decoder for low-latency online speech transcription is presented. The system was developed within the Synface project, which aims to improve the possibilities for hard of hearing people to use conventional telephony by providing speech-synchronized multimodal feedback. This paper addresses the specific issues related to HMM-based incremental phone classification with real-time constraints. The decoding algorithm described in this work enables a trade-off to be made between improved recognition accuracy and reduced latency. By accepting a longer latency per output increment, more time can be ascribed to hypothesis look-ahead and thereby improve classification accuracy. Experiments performed on the Swedish SpeechDat database show that it is possible to generate the same classification as is produced by non-incremental decoding using HTK, by adopting a latency of approx. 150 ms or more.

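The latency/accuracy trade-off described in entry 151 lends itself to a small illustration: an incremental decoder commits only the part of its best hypothesis that has fallen outside a look-ahead window, so a longer window means later but more stable output. This is a minimal sketch, not the Synface decoder; the decoder interface, frame period and window length are assumed.

```python
# Sketch of look-ahead-based incremental decoding (cf. entry 151).
# Assumption: `decoder` exposes process(frame) and best_hypothesis(),
# the latter returning [(phone, end_frame), ...] for the current best path.
FRAME_MS = 10        # assumed frame period
LOOKAHEAD_MS = 150   # latency budget; ~150 ms matched offline output above

def incremental_transcribe(frames, decoder):
    """Yield phones as soon as they are older than the look-ahead
    window, i.e. once later evidence can no longer revise them."""
    committed = 0
    for t, frame in enumerate(frames):
        decoder.process(frame)
        horizon = t - LOOKAHEAD_MS // FRAME_MS
        for phone, end_frame in decoder.best_hypothesis()[committed:]:
            if end_frame <= horizon:  # stable: outside the window
                committed += 1
                yield phone
            else:
                break
```
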
  • 152.
    Seward, Alexander
    KTH, Superseded Departments, Speech, Music and Hearing.
    The KTH Large Vocabulary Continuous Speech Recognition System, 2004. Report (Other academic)
  • 153.
    Seward, Alexander
    KTH, Superseded Departments, Speech, Music and Hearing.
    Transducer Optimizations for Tight-Coupled Decoding, 2001. In: Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), Aalborg, Denmark, vol 3, 2001, p. 1607-1610. Conference paper (Other academic)
    Abstract [en]

In this paper we apply a framework of finite-state transducers (FST) to uniformly represent various information sources and data structures used in speech recognition. These source models include context-free language models, phonology models, acoustic model information (Hidden Markov Models), and pronunciation dictionaries. We will describe how this unified representation can serve as a single input model for the recognizer. We will demonstrate how the application of various levels of optimization can lead to a more compact representation of these transducers and evaluate the effects on recognition performance, in terms of accuracy and computational complexity.

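The unified transducer representation in entry 153 can be mimicked with the OpenFst toolkit. Below is a toy sketch using the pynini Python bindings (an assumption; the paper predates them): compose a grammar with a pronunciation lexicon and optimize the result into a more compact search network. The word list is invented.

```python
# Toy version of the idea in entry 153: represent knowledge sources as
# FSTs, compose them into one model, then shrink it by optimization
# (epsilon removal, determinization, minimization).
import pynini

# Pronunciation lexicon: orthographic word -> phone string.
lexicon = pynini.union(
    pynini.cross("cat", "k ae t"),
    pynini.cross("cats", "k ae t s"),
)

# Trivial grammar accepting either word.
grammar = pynini.union(pynini.accep("cat"), pynini.accep("cats"))

# Single integrated input model for the recognizer.
decoder_graph = (grammar @ lexicon).optimize()
print(decoder_graph.num_states())  # smaller after optimize()
```
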
  • 154. Siciliano, C.
    et al.
    Williams, G.
    Faulkner, A.
    Salvi, Giampiero
    KTH, Superseded Departments, Speech, Music and Hearing.
    Intelligibility of an ASR-controlled synthetic talking face, 2004. In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 115, no 5, p. 2428. Article in journal (Refereed)
    Abstract [en]

    The goal of the SYNFACE project is to develop a multilingual synthetic talking face, driven by an automatic speech recognizer (ASR), to assist hearing‐impaired people with telephone communication. Previous multilingual experiments with the synthetic face have shown that time‐aligned synthesized visual face movements can enhance speech intelligibility in normal‐hearing and hearing‐impaired users [C. Siciliano et al., Proc. Int. Cong. Phon. Sci. (2003)]. Similar experiments are in progress to examine whether the synthetic face remains intelligible when driven by ASR output. The recognizer produces phonetic output in real time, in order to drive the synthetic face while maintaining normal dialogue turn‐taking. Acoustic modeling was performed with a neural network, while an HMM was used for decoding. The recognizer was trained on the SpeechDAT telephone speech corpus. Preliminary results suggest that the currently achieved recognition performance of around 60% frames correct limits the usefulness of the synthetic face movements. This is particularly true for consonants, where correct place of articulation is especially important for visual intelligibility. Errors in the alignment of phone boundaries representative of those arising in the ASR output were also shown to decrease audio‐visual intelligibility.

  • 155. Siciliano, Catherine
    et al.
    Williams, Geoff
    Beskow, Jonas
    KTH, Superseded Departments, Speech, Music and Hearing.
    Faulkner, Andrew
    Evaluation of a Multilingual Synthetic Talking Face as a Communication Aid for the Hearing Impaired, 2003. In: Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS'03), Barcelona, Spain, 2003, p. 131-134. Conference paper (Other academic)
  • 156.
    Sjölander, Kåre
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Heldner, Mattias
    KTH, Superseded Departments, Speech, Music and Hearing.
    Word level precision of the NALIGN automatic segmentation algorithm, 2004. In: Proc of The XVIIth Swedish Phonetics Conference, Fonetik 2004 / [ed] Peter Branderud, Hartmut Traunmüller, 2004, p. 116-119. Conference paper (Other academic)
  • 157.
    Sjölander, Peta
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Spectrum effects of subglottal pressure variation in professional baritone singers, 2004. In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 115, no 3, p. 1270-1273. Article in journal (Refereed)
    Abstract [en]

    The audio signal from five professional baritones was analyzed by means of spectrum analysis. Each subject sang syllables [pae] and [pa] from loudest to softest phonation at fundamental frequencies representing 25%, 50%, and 75% of his total range. Ten subglottal pressures, equidistantly spaced between highest and lowest, were selected for analysis along with the corresponding production of the vowels. The levels of the first formant and singer's formant were measured as a function of subglottal pressure. Averaged across subjects, vowels, and F-0, a 10-dB increase at 600 Hz was accompanied by a 16-dB increase at 3 kHz.

  • 158.
    Skantze, Gabriel
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Edlund, Jens
    KTH, Superseded Departments, Speech, Music and Hearing.
    Early error detection on word level, 2004. In: Proceedings of ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction, 2004. Conference paper (Refereed)
    Abstract [en]

    In this paper two studies are presented in which the detection of speech recognition errors on the word level was examined. In the first study, memory-based and transformation-based machine learning was used for the task, using confidence, lexical, contextual and discourse features. In the second study, we investigated which factors humans benefit from when detecting errors. Information from the speech recogniser (i.e. word confidence scores and 5-best lists) and contextual information were the factors investigated. The results show that word confidence scores are useful and that lexical and contextual (both from the utterance and from the discourse) features further improve performance.

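Entry 158's word-level error detection reduces to a per-word binary classification over confidence, lexical and context features. The paper used memory-based and transformation-based learners; the sketch below substitutes a scikit-learn logistic regression, with invented toy features and data, purely to show the shape of the task.

```python
# Classify each recognized word as correct (1) or an ASR error (0)
# from confidence and simple lexical/context features (cf. entry 158).
from sklearn.linear_model import LogisticRegression

def features(word, confidence, prev_confidence):
    return [
        confidence,        # recognizer confidence for this word
        prev_confidence,   # context: confidence of the previous word
        len(word),         # crude lexical feature
    ]

# Invented training examples: (word, conf, previous conf) -> label.
X = [features("yes", 0.92, 0.80), features("bsetn", 0.31, 0.85)]
y = [1, 0]

clf = LogisticRegression().fit(X, y)
print(clf.predict([features("maybe", 0.40, 0.50)]))
```
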
  • 159.
    Skantze, Gabriel
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Edlund, Jens
    KTH, Superseded Departments, Speech, Music and Hearing.
    Robust interpretation in the Higgins spoken dialogue system, 2004. In: Proceedings of ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction, 2004. Conference paper (Refereed)
    Abstract [en]

    This paper describes Pickering, the semantic interpreter developed in the Higgins project - a research project on error handling in spoken dialogue systems. In the project, the initial efforts are centred on the input side of the system. The semantic interpreter combines a rich set of robustness techniques with the production of deep semantic structures. It allows insertions and non-agreement inside phrases, and combines partial results to return a limited list of semantically distinct solutions. A preliminary evaluation shows that the interpreter performs well under error conditions, and that the built-in robustness techniques contribute to this performance.

  • 160.
    Spens, Karl-Erik
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Agelfors, Eva
    KTH, Superseded Departments, Speech, Music and Hearing.
    Beskow, Jonas
    KTH, Superseded Departments, Speech, Music and Hearing.
    Granström, Björn
    KTH, Superseded Departments, Speech, Music and Hearing.
    Karlsson, Inger
    KTH, Superseded Departments, Speech, Music and Hearing.
    Salvi, Giampiero
    KTH, Superseded Departments, Speech, Music and Hearing.
    SYNFACE, a talking head telephone for the hearing impaired, 2004. Conference paper (Refereed)
  • 161. Strangert, E.
    et al.
    Carlson, Rolf
    KTH, Superseded Departments, Speech, Music and Hearing.
    On the modelling and synthesis of conversational speech, 2004. In: Nordic Prosody: Proceedings of the IXth Conference / [ed] Bruce, G.; Horne, M., Frankfurt am Main: Peter Lang, 2004, p. 255-264. Conference paper (Refereed)
  • 162.
    Sundberg, Johan
    KTH, Superseded Departments, Speech, Music and Hearing.
    Musicians’ performance prosody, 2004. In: Proceedings of Reading Symposium Music Language and Human Evolution, 2004. Conference paper (Refereed)
    Abstract [en]

Music and speech are specific to humans. In our time we have many opportunities to hear music that is interpreted and executed by machines rather than by living musicians. Such examples mostly sound quite pathological, particularly for music from the classical western repertoire. This demonstrates the relevance of the performance to the musical listening experience. For many years a research group at the department of Speech, Music and Hearing, KTH has studied the reasons for the computer's shortcomings as a musician. Our method has mainly been analysis-by-synthesis, i.e., we have the computer play music files on a synthesizer. A professional musician, the late violinist Lars Frydén, assessed the emerging performances and recommended how they could be improved. We implemented his recommendations as performance rules in the control program and then tested them on various music examples. After many years of such experiments we had a dozen or two performance rules. These performance rules significantly contribute to improving performance, and the reason for this is an interesting question. The rules seem to be of three types. One type, the grouping rules, serves the purpose of grouping, i.e., showing where the structural boundaries are in the composition and which tones belong together. Another type enhances the difference between musical categories such as note values or scale tones or intervals, e.g., by increasing the dissimilarities between them. A third type adds emphasis to unexpected tones and deemphasizes expected tones. It is thought-provoking that the principles of grouping, category enhancement and emphasis of the unexpected are not specific to music. They also occur in other types of communication, such as speech, architecture, and others. This suggests that they emerge from demands raised by the receiving system. For example, it is tempting to speculate that emphasis by delayed arrival, common in both music and speech, is appropriate because delaying the emphasized information somewhat allows the neural system to finish processing the old information before it starts processing the emphasized and hence particularly important information. In any event it seems likely that music performance, like speech, is tailored to the human cognitive system and that a comparative study of these two examples of systematic interhuman communication by acoustic signals will contribute to the understanding of human perception and cognition.

  • 163.
    Sundberg, Johan
    KTH, Superseded Departments, Speech, Music and Hearing.
    The nasal tract as a resonator in singing: Some experimental findings, 2004. In: Proceedings of the 2nd Intl Physiology and Acoustics of Singing Conference, 2004. Conference paper (Refereed)
  • 164. Sundberg, Johan
    et al.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Stopping running and stopping a piece of music. Comparing locomotion and music performance, 1996. In: Proc of NAM 96, Nordic Acoustical Meeting / [ed] Riederer, K., & Lahti, T., 1996, p. 351-358. Conference paper (Refereed)
  • 165.
    Sundberg, Johan
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Common Secrets of Musicians and Listeners - An analysis-by-synthesis Study of Musical Performance, 1991. In: Representing Musical Structure / [ed] Howell, P.; West, R.; Cross, I., London: Academic Press, 1991, p. 161-197. Chapter in book (Refereed)
  • 166.
    Sundberg, Johan
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Music and locomotion. A study of the perception of tones with level envelopes replicating force patterns of walking, 1992. In: STL-QPSR, Vol. 33, no 4, p. 109-122. Article in journal (Other academic)
    Abstract [en]

Music listening often produces associations to locomotion. This suggests that some patterns in music are similar to those perceived during locomotion. The present investigation tests the hypothesis that the sound level envelopes of tones allude to force patterns associated with walking and dancing. Six examples of such force patterns were recorded using a force platform, and the vertical components were translated from kg to dB and used as level envelopes for tones. Sequences of four copies of each of these tones were presented with four different fixed inter-onset times. Music students were asked to characterize these sequences in three tests. In one test, the subjects were free to use any expression, and the occurrence of motion words in the responses was examined. In another test, they were asked to describe, if possible, the motion characteristics of the sequences, and the number of blank responses was studied. In the third test, they were asked to describe the sequences along 24 motion adjective scales, and the responses were submitted to a factor analysis. The results from the three tests showed a reasonable degree of coherence, suggesting that associations to locomotion are likely to occur under these conditions, particularly (1) when the inter-onset time is similar to the inter-step time typical of walking, and (2) when the inter-onset time agreed with that observed when the gait patterns were recorded. The latter observation suggests that the different motion patterns thus translated to sound level envelopes may also convey information on the type of motion.

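The kg-to-dB translation in entry 166 presumably treats the platform's vertical force as an amplitude and expresses it on a logarithmic scale before using it as a tone's level envelope. A minimal sketch under that assumption (the reference level and sample values are invented):

```python
# Turn vertical-force samples from a force platform into a dB level
# envelope, one plausible reading of the translation in entry 166.
import numpy as np

def force_to_db_envelope(force_kg):
    force = np.asarray(force_kg, dtype=float)
    ref = force.mean()  # assumed reference: mean force over the step
    return 20.0 * np.log10(np.maximum(force, 1e-6) / ref)

steps = [60, 75, 90, 72, 58]  # invented force samples (kg)
print(force_to_db_envelope(steps).round(1))
```
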
  • 167.
    Sundberg, Johan
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Music and locomotion. Perception of tones with level envelopes replicating force patterns of walking, 1994. In: Proc. of SMAC ’93, Stockholm Music Acoustics Conference, 1994, p. 136-141. Conference paper (Refereed)
  • 168.
    Sundberg, Johan
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Musicians’ and nonmusicians’ sensitivity to differences in music performance, 1988. In: STL-QPSR, Vol. 29, no 4, p. 077-081. Article in journal (Other academic)
    Abstract [en]

A set of ordered context-dependent rules for the automatic transformation of a music score to the corresponding musical performance has been developed, using an analysis-by-synthesis method [Sundberg, J. (1987): "Computer synthesis of music performance," pp. 52-69 in (J. Sloboda, ed.) Generative Processes in Music, Clarendon, Oxford]. The rules are implemented in the LeLisp language on a Macintosh microcomputer that controls a synthesizer via a MIDI interface. The rules manipulate sound level, fundamental frequency, vibrato extent, and duration of the tones. The present experiment was carried out in order to find out whether the sensitivity to these effects differed between musicians and nonmusicians. Pairs of performances of the same examples were presented in different series, one for each rule. Between the pairs in a series, the performance differences were varied within wide limits and, in the first pair in each series, the difference was large, so as to catch the subject's attention. Subjects were asked to decide whether the two performances were identical. The results showed that musicians had a clearly greater sensitivity. The pedagogical implications of this finding are discussed.

  • 169.
    Sundberg, Johan
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Rules for automated performance of ensemble music, 1989. In: Contemporary Music Review, ISSN 0749-4467, E-ISSN 1477-2256, Vol. 3, p. 89-109. Article in journal (Refereed)
    Abstract [en]

Recently developed parts of a computer program are presented that contain a rule system which automatically converts music scores to musical performance, and which, in a sense, can be regarded as a model of a musically gifted player. The development of the rule system has followed the analysis-by-synthesis strategy; various rules have been formulated according to the suggestions of a professional string quartet violinist and teacher of ensemble playing. The effects of various rules concerning synchronization, timing, and tuning in the performance of ensemble music are evaluated by a listening panel of professional musicians. Further support for the notion of melodic charge, previously introduced and playing a prominent role among the performance rules, is found in a correlation with fine tuning of intervals.

  • 170.
    Sundberg, Johan
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Rules for automatized performance of ensemble music, 1987. In: STL-QPSR, Vol. 28, no 4, p. 057-078. Article in journal (Other academic)
    Abstract [en]

Recently developed parts of a computer program are presented that contain a rule system which automatically converts music scores to musical performance, and which, in a sense, can be regarded as a model of a musically gifted player. The development of the rule system has followed the analysis-by-synthesis strategy; various rules have been formulated after having been suggested by a professional string quartet violinist and teacher of ensemble playing. The effects of various rules concerning synchronization, timing, and tuning in the performance of ensemble music are evaluated by a listening panel of professional musicians. Further support for the notion of melodic charge, previously introduced and playing a prominent role among the performance rules, is found in a correlation with fine tuning of intervals.

  • 171.
    Sundberg, Johan
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Threshold and Preference Quantities of Rules for Music Performance, 1991. In: Music Perception, ISSN 0730-7829, E-ISSN 1533-8312, Vol. 9, no 1, p. 71-92. Article in journal (Refereed)
    Abstract [en]

In an analysis-by-synthesis investigation of music performance, rules have been developed that describe when and how expressive deviations are made from the nominal music notation in the score. Two experiments that consider the magnitudes of such deviations are described. In Experiment 1, the musicians' and nonmusicians' sensitivities to expressive deviations generated by seven performance rules are compared. The musicians showed a clearly greater sensitivity. In Experiment 2, professional musicians adjusted to their satisfaction the quantity by which six rules affected the performance. For most rules, there was a reasonable agreement between the musicians regarding preference. The preferred quantities seemed close to the threshold of perceptibility.

  • 172.
    Sundberg, Johan
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Mathews, M. V.
    Bennett, G.
    Experiences of combining the radio baton with the Director Musices performance grammar, 2001. In: MOSART project workshop on current research directions in computer music, 2001. Conference paper (Refereed)
  • 173.
    Sundberg, Johan
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Expressive aspects of instrumental and sung performance, 1994. In: Proceedings of the Symposium on Psychophysiology and Psychopathology of the Sense of Music / [ed] Steinberg, R., Heidelberg: Springer Berlin/Heidelberg, 1994. Conference paper (Refereed)
  • 174. Sundberg, Johan
    et al.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Expressive aspects of instrumental and vocal performance, 1995. In: Music and the Mind Machine: Psychophysiology and Psychopathology of the Sense of Music / [ed] Steinberg, R., Heidelberg: Springer Berlin/Heidelberg, 1995. Chapter in book (Other academic)
    Abstract [en]

    Several music computers can now convert an input note file to a sounding performance. Listening to such performances demonstrates convincingly the significance of the musicians’ contribution to music performance; when the music score is accurately replicated as nominally written, the music sounds dull and nagging. It is the musicians’ contributions that make the performance interesting. In other words, by deviating slightly from what is nominally written in the music score, the musicians add expressivity to the music.

  • 175.
    Sundberg, Johan
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Music communication as studied by means of performance, 1991. In: STL-QPSR, Vol. 32, no 1, p. 065-083. Article in journal (Other academic)
    Abstract [en]

This article presents an overview of long-term research work on a rule system for the automatic performance of music. The performance rules produce deviations from the durations, sound levels, and pitches nominally specified in the music score. They can be classified according to their apparent musical function: to help the listener (1) in the differentiation of different pitch and duration categories and (2) in the grouping of the tones. Apart from this, some rules serve the purpose of organizing tuning and synchronization in ensemble performance. The rules reveal striking similarities between music performance and speech; for instance, final lengthening occurs in both, and the acoustic codes used for marking emphasis are similar.

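Entry 175 names final lengthening as a grouping device shared by music and speech. A toy version of such a performance rule, applying a duration deviation to phrase-final notes, is sketched below; the rule quantity and note format are invented, and the actual rule system is far richer.

```python
# Toy grouping rule: lengthen phrase-final notes (cf. entry 175).
def apply_final_lengthening(notes, boundary_indices, factor=1.2):
    """notes: list of dicts with 'dur' in ms; returns a performed copy
    with expressive duration deviations at phrase boundaries."""
    performed = [dict(n) for n in notes]
    for i in boundary_indices:
        performed[i]["dur"] *= factor
    return performed

score = [{"dur": 250}, {"dur": 250}, {"dur": 500}]
print(apply_final_lengthening(score, boundary_indices=[2]))
```
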
  • 176.
    Sundberg, Johan
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Thalén, M.
    Alku, P.
    Vilkman, E.
    Estimating perceived phonatory pressedness in singing from flow glottograms, 2004. In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 18, no 1, p. 56-62. Article in journal (Refereed)
    Abstract [en]

The normalized amplitude quotient (NAQ), defined as the ratio between the peak-to-peak amplitude of the flow pulse and the negative peak amplitude of the differentiated flow glottogram, normalized with respect to period time, has been shown to be related to glottal adduction. Glottal adduction, in turn, affects mode of phonation and hence perceived phonatory pressedness. The relationship between NAQ and perceived phonatory pressedness was analyzed in material collected from a professional female singer and singing teacher who sang a triad pattern in breathy, flow, neutral, and pressed phonation in three different loudness conditions (soft, middle, loud). In addition, she also sang the same triad pattern in four different styles of singing, classical, pop, jazz, and blues, in the same three loudness conditions. A panel of experts rated the degree of perceived phonatory pressedness along visual analogue scales. Comparing the obtained mean pressedness ratings with the mean NAQ values for the various triads showed that about 73% of the variation in perceived pressedness could be accounted for by variations of NAQ.

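The NAQ definition quoted in entry 176 is explicit enough to compute directly: peak-to-peak flow amplitude over the negative peak of the differentiated flow, normalized by period time. A sketch, assuming one period of a sampled flow glottogram is available:

```python
# Normalized amplitude quotient for one glottal period (cf. entry 176).
import numpy as np

def naq(flow_period, fs):
    flow = np.asarray(flow_period, dtype=float)
    f_ac = flow.max() - flow.min()        # peak-to-peak flow amplitude
    d_peak = -np.min(np.diff(flow) * fs)  # |negative peak| of d(flow)/dt
    period = len(flow) / fs               # period time in seconds
    return f_ac / (d_peak * period)
```
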
  • 177.
    Svedman, Patrick
    et al.
    KTH.
    Wilson, Sarah Kate
    Cimini, Leonard J., Jr.
    Ottersten, Björn
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Opportunistic beamforming and scheduling for OFDMA systems (vol 55, pg 941, 2007), 2007. In: IEEE Transactions on Communications, ISSN 0090-6778, E-ISSN 1558-0857, Vol. 55, no 6, p. 1266. Article in journal (Refereed)
  • 178.
    Ternström, Sten
    KTH, Superseded Departments, Speech, Music and Hearing.
    Preferred self-to-other ratios in choir singing, 1999. In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 105, no 6, p. 3563-3574. Article in journal (Refereed)
    Abstract [en]

Choir singers need to hear their own voice in an adequate self-to-other ratio (SOR) over the rest of the choir. Knowing singers’ preferences for SOR could facilitate the design of stages and of choral formations. In an experiment to study the preferred SOR, subjects sang sustained vowels together with synthesized choir sounds, whose loudness tracked that of their own voice. They could control the SOR simply by changing their distance to the microphone. At the most comfortable location, the SOR was measured. Experimental factors included unison and four-part tasks, three vowels and two levels of phonation frequency. The same experiment was run four times, using sopranos, altos, tenors, and basses, with stimulus tones adapted for each category. The preferred self-to-other ratios were found to be similar to SORs measured previously in actual performance, if a little higher. Preferences were quite narrow, typically +/-2 dB for each singer, but very different from singer to singer, with intrasubject means ranging from -1 to +15 dB. There was no significant difference between the unison and the four-part tasks, although this might have been caused by systematic differences in the stimulus sounds. Some effects of phonation frequency and vowel were significant, but interdependent and difficult to interpret. The results and their relevance to live choir singing are discussed.

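The self-to-other ratio in entry 178 is a level difference; under the usual reading it is the dB ratio between the singer's own voice and the rest of the choir at the singer's ears. A minimal sketch of that measurement, assuming the two signals are available separately:

```python
# Self-to-other ratio in dB from separated signals (cf. entry 178).
import numpy as np

def self_to_other_ratio_db(self_signal, other_signal):
    rms_self = np.sqrt(np.mean(np.square(self_signal)))
    rms_other = np.sqrt(np.mean(np.square(other_signal)))
    return 20.0 * np.log10(rms_self / rms_other)
```
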
  • 179.
    Ternström, Sten
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Andersson, Marie
    Scandinavian College of Manual Medicine.
    Bergman, Ulrika
    Scandinavian College of Manual Medicine.
    An effect of body massage on voice loudness and phonation frequency in reading, 2000. In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 25, no 4, p. 146-151. Article in journal (Refereed)
  • 180.
    Ternström, Sten
    et al.
    KTH, Superseded Departments (pre-2005), Speech Transmission and Music Acoustics. KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Analysis and simulation of small variations in the fundamental frequency of sustained vowels, 1989. In: STL-QPSR, Vol. 30, no 3, p. 001-014. Article in journal (Other academic)
  • 181.
    Ternström, Sten
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Monteverdi’s Vespers. A case study in music synthesis, 1988. In: STL-QPSR, Vol. 29, no 2-3, p. 093-105. Article in journal (Other academic)
    Abstract [en]

    The article describes the methods used in synthesizing a performance of the first movement of Monteverdi's Vespers from 1610. The synthesis combines results from studies of singing voice acoustics, ensemble acoustics, and rules for music performance. The emphasis is on the synthesis of choir sounds.

  • 182.
    Ternström, Sten
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Howard, D.
    Synthesizing singing: What's the buzz? 2004. In: Proceedings of the 2nd Intl Physiology and Acoustics of Singing Conference, 2004. Conference paper (Other academic)
  • 183.
    Ternström, Sten
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Synthesizing choir singing, 1988. In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 1, no 4, p. 332-335. Article in journal (Refereed)
    Abstract [en]

    Analysis by synthesis is a method that has been successfully applied in many areas of scientific research. In speech research, it has proven to be an excellent tool for identifying perceptually relevant acoustical properties of sounds. This paper reports on some first attempts at synthesizing choir singing, the aim being to elucidate the importance of factors such as the frequency scatter in the fundamental and the formants. The presentation relies heavily on sound examples.

  • 184. Thompson, W. F.
    et al.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Evaluating rules for the synthetic performance of melodies, 1986. In: STL-QPSR, Vol. 27, no 2-3, p. 027-044. Article in journal (Other academic)
    Abstract [en]

Starting from a text-to-speech conversion program (Carlson & Granström, 1975), a note-to-tone conversion program has been developed (Sundberg & Frydén, 1985). It works with a set of ordered rules affecting the performance of melodies written into the computer. Depending on the musical context, each of these rules manipulates various tone parameters, such as sound level, fundamental frequency, duration, etc. In the present study the effect of some of the rules developed so far on the musical quality of the performance is tested; various musical excerpts performed according to different combinations and versions of nine performance rules were played to musically trained listeners who rated the musical quality. The results support the assumption that the musical quality of the performance is improved by applying the rules.

  • 185. Thompson, W. F.
    et al.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Friberg, Anders
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Frydén, Lars
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    The Use of Rules for Expression in the Performance of Melodies, 1989. In: Psychology of Music, ISSN 0305-7356, E-ISSN 1741-3087, Vol. 17, p. 63-82. Article in journal (Refereed)
  • 186. Welch, G.F.
    et al.
    Sergeant, D.
    White, Peta
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    The “threat” to the cathedral choir tradition: an empirical study of gender differences in singing voices of trained cathedral choristers, 1995. In: Proceedings of the DGM and ESCOM 1995 Conference / [ed] G. Kleinen, 1995, p. 77-79. Conference paper (Refereed)
  • 187.
    White, Peta
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    A study of the effects of vocal intensity variation on children’s voices using long-term average spectrum (LTAS) analysis, 1997. In: TMH-QPSR, ISSN 1104-5787, Vol. 38, no 1, p. 119-131. Article in journal (Other academic)
    Abstract [en]

    It has been well documented in adult studies that, as overall vocal intensity increases, the resulting increase in partials is greater in higher than in lower frequencies. Investigations involving children’s normal productions are uncommon however, and there is, as a consequence, little knowledge of how children’s vocal function differs from that of adults. Using long term average spectrum (LTAS) analysis, this study investigates the effects of vocal intensity variation on the voices of fifteen schoolchildren aged 10 years, singing in soft, mid and loud voice. Mean amplitudes, dynamic range, and gain in each frequency band were calculated, and means are presented as normative data for children’s vocal productions. Observed systematic effects of vocal loudness as well as male-female differences in the averaged spectra are discussed, and comparisons with adult data made.

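The LTAS used throughout entries 187-195 is a spectrum averaged over the whole recording. A Welch-style sketch is given below; the window length is an assumption, not the papers' analysis setting.

```python
# Long-term average spectrum in dB (cf. entries 187, 190, 195).
import numpy as np
from scipy.signal import welch

def ltas_db(signal, fs, win_s=0.04):
    freqs, psd = welch(signal, fs=fs, nperseg=int(fs * win_s))
    return freqs, 10.0 * np.log10(psd + 1e-12)  # level per band
```
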
  • 188.
    White, Peta
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing. KTH Voice Research Centre.
    Formant frequency analysis of children's spoken and sung vowels using sweeping fundamental frequency production, 1999. In: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 13, no 4, p. 570-582. Article in journal (Refereed)
    Abstract [en]

High-pitched productions present difficulties in formant frequency analysis due to wide harmonic spacing and poorly defined formants. As a consequence, there is little reliable data regarding children's spoken or sung vowel formants. Twenty-nine 11-year-old Swedish children were asked to produce 4 sustained spoken and sung vowels. In order to circumvent the problem of wide harmonic spacing, F-1 and F-2 measurements were taken from vowels produced with a sweeping F-0. Experienced choir singers were selected as subjects in order to minimize the larynx height adjustments associated with pitch variation in less skilled subjects. Results showed significantly higher formant frequencies for speech than for singing. Formants were consistently higher in girls than in boys, suggesting longer vocal tracts in these preadolescent boys. Furthermore, formant scaling demonstrated vowel-dependent differences between boys and girls, suggesting non-uniform differences in male and female vocal tract dimensions. These vowel-dependent sex differences were not consistent with adult data.

  • 189.
    White, Peta
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Formant frequency analysis of children’s spoken and sung vowels using sweeping fundamental frequency production, 1998. In: TMH-QPSR, ISSN 1104-5787, Vol. 1-2, p. 43-52. Article in journal (Other academic)
    Abstract [en]

    High-pitched productions present difficulties in formant frequency analysis due to wide harmonic spacing and poorly defined formants. As a consequence, there is little reliable data regarding children’s spoken or sung vowel formants. In order to circumvent the problem of wide harmonic spacing, 29 11-year-old Swedish children were asked to produce four sustained spoken and sung vowels with a sweeping F0. F1 and F2 measurements were taken. Experienced choir singers were used as subjects in order to minimise the larynx height adjustments associated with pitch variation in less-skilled subjects. Results showed significantly higher formant frequencies for speech than for singing. Formants were consistently higher in females than in males suggesting longer vocal tracts in these preadolescent boys. Furthermore, formant scaling demonstrated vowel-dependent differences between boys and girls suggesting non-uniform differences in male and female vocal tract dimensions. These vowel-dependent sex differences were not consistent with adult data.

  • 190.
    White, Peta
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Long-term average spectrum (LTAS) analysis of developmental changes in children's voices, 2000. In: TMH-QPSR, ISSN 1104-5787, Vol. 41, no 3, p. 85-88. Article in journal (Other academic)
    Abstract [en]

Long-term average spectrum (LTAS) analysis has been found to offer representative information on voice timbre. It provides spectral information averaged over a period of time and is particularly useful when persistent spectral features are under investigation. The aim of this study was to compare the perceived and actual sex of the recorded voices of children to their LTAS characteristics. A total of 320 children, 20 boys and 20 girls in each of eight age groups (range 3 to 12 years), were recorded singing a nursery rhyme. In an earlier analysis, the recorded voices were evaluated with respect to perceived sex by expert listeners. Mean LTAS analysis for the boy and girl groups revealed a peak at 5 kHz for children consistently perceived as boys (whether male or female in actuality), and a flat spectrum at 5 kHz for children consistently perceived as girls.

  • 191.
    White, Peta
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Some acoustic measurements of children’s voiced and whispered vowels, 1994. In: Voice: Journal of the British Voice Association, ISSN 0966-789X, Vol. 4, no 1, p. 1-14. Article in journal (Refereed)
  • 192.
    White, Peta
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Voice source and formant frequencies in 11-year-old girls and boys, 2000. In: Child Voice / [ed] White, P., KTH Royal Institute of Technology, 2000, p. 13-26. Chapter in book (Other academic)
  • 193.
    White, Peta
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Sundberg, Johan
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Spectrum effects of subglottal pressure variation in professional baritone singers, 2000. In: TMH-QPSR, ISSN 1104-5787, Vol. 41, p. 29-32. Article in journal (Other academic)
    Abstract [en]

The audio signal from five professional operatic baritone singers was analysed by means of spectrum analysis. Each subject sang a sustained diminuendo, from loudest to softest phonation, three times on the vowels [a:] and [ä:] at fundamental frequencies representing 25%, 50% and 75% of his total pitch range as measured in semitones. During the diminuendi the subjects repeatedly inserted the consonant [p] so that associated subglottal pressures could be estimated from the oral pressure during [p]-occlusions. Pooling the three takes of each condition, ten subglottal pressures (PS), equidistantly spaced between highest and lowest, were selected for analysis along with the corresponding production of [a:] and [ä:] vowels. The levels of the first formant and the singer’s formant, L1 and LSF, were measured as a function of increasing subglottal pressure. Averaged across subjects, an increase in PS resulted in (a) an increase in L1 and (b) a decrease in L1-LSF. This implies that a 10 dB increase at or near 600 Hz was, on average, accompanied by an increase of 17 dB of the level near 3 kHz.

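The headline numbers in entries 157 and 193 (a 16-17 dB gain near 3 kHz per 10 dB near 600 Hz) are slopes of the singer's-formant level LSF against the first-formant level L1, which reduces to a one-line line fit. The measurement pairs below are invented to reproduce the 17 dB figure:

```python
# Slope of LSF (near 3 kHz) vs. L1 (near 600 Hz), cf. entries 157/193.
import numpy as np

L1  = np.array([70.0, 75.0, 80.0, 85.0])   # dB near 600 Hz (invented)
LSF = np.array([55.0, 63.5, 72.0, 80.5])   # dB near 3 kHz (invented)

slope = np.polyfit(L1, LSF, 1)[0]
print(f"{10 * slope:.0f} dB at 3 kHz per 10 dB at 600 Hz")  # -> 17 dB
```
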
  • 194.
    White, Peta
    et al.
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Welch, GF
    A laryngographic study of the speaking and singing voices of young children, 1992. In: Proceedings of the Institute of Acoustics 1992 Conference / [ed] R. Lawrence, 1992, p. 225-231. Conference paper (Refereed)
  • 195.
    White Sjölander, Peta
    KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
    Long-term average spectrum (LTAS) analysis of sex- and gender-related differences in children’s voices, 2001. In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 26, no 3, p. 97-101. Article in journal (Refereed)
    Abstract [en]

    Long-term average spectrum (LTAS) analysis offers representative information on voice timbre providing spectral information averaged over time. It is particularly useful when persistent spectral features are under investigation. The aim of this study was to compare perceived sex of children to the LTAS analysis of their audio signals. A total of 320 children, aged between 3 and 12 years, were recorded singing a song. In an earlier analysis, the recorded voices were evaluated with respect to perceived and actual sex by experienced listeners. From this group, a subgroup of 59 children (30 boys and 29 girls) was selected. The mean LTAS revealed a peak at 5 kHz for children perceived with confidence as boys, and a flat spectrum at 5 kHz for children perceived confidently as girls (whether male or female in actuality).

  • 196.
    Wik, Preben
    KTH, Superseded Departments, Speech, Music and Hearing.
    Designing a virtual language tutor, 2004. In: Proc of The XVIIth Swedish Phonetics Conference, Fonetik 2004, 2004, p. 136-139. Conference paper (Other academic)
    Abstract [en]

    This paper gives an overview of some of the choices that have been considered in the process of designing a virtual language tutor, and the direction we have decided to take based on these choices.

  • 197. Zangger Borch, D.
    et al.
    Sundberg, Johan
    KTH, Superseded Departments, Speech, Music and Hearing.
    Lindestad, P.
    Thalén, M.
    Vocal fold vibration and voice source aperiodicity in "dist" tones: a study of a timbral ornament in rock singing, 2004. In: Logopedics, Phoniatrics, Vocology, ISSN 1401-5439, E-ISSN 1651-2022, Vol. 29, no 4, p. 147-153. Article in journal (Refereed)
    Abstract [en]

    The acoustic characteristics of so-called 'dist' tones, commonly used in singing rock music, are analyzed in a case study. In an initial experiment a professional rock singer produced examples of 'dist' tones. The tones were found to contain aperiodicity, SPL at 0.3 m varied between 90 and 96 dB, and subglottal pressure varied in the range of 20-43 cm H2O, a doubling yielding, on average, an SPL increase of 2.3 dB. In a second experiment, the associated vocal fold vibration patterns were recorded by digital high-speed imaging of the same singer. Inverse filtering of the simultaneously recorded audio signal showed that the aperiodicity was caused by a low frequency modulation of the flow glottogram pulse amplitude. This modulation was produced by an aperiodic or periodic vibration of the supraglottic mucosa. This vibration reduced the pulse amplitude by obstructing the airway for some of the pulses produced by the apparently periodically vibrating vocal folds. The supraglottic mucosa vibration can be assumed to be driven by the high airflow produced by the elevated subglottal pressure.

  • 198.
    Zetterholm, Elisabeth
    et al.
    Department of Philosophy & Linguistics, Umeå University.
    Blomberg, Mats
    KTH, Superseded Departments, Speech, Music and Hearing.
    Elenius, Daniel
    KTH, Superseded Departments, Speech, Music and Hearing.
    A comparison between human perception and a speaker verification system score of a voice imitation, 2004. In: Proc of Tenth Australian International Conference on Speech Science & Technology, 2004, p. 393-397. Conference paper (Refereed)
    Abstract [en]

A professional impersonator has been studied when training his voice to mimic two target speakers. A three-fold investigation has been conducted; a computer-based speaker verification system was used, phonetic-acoustic measurements were made, and a perception test was conducted. Our idea behind using this type of system is to measure how close to the target voice a professional impersonation might be able to reach and to relate this to phonetic-acoustic analyses of the mimic speech and human perception. The significantly increased verification scores and the phonetic-acoustic analyses show that the impersonator really changes his natural voice and speech in his imitations. The results of the perception test show that there is no, or only a small, correlation between the verification system and the listeners when estimating the voice imitations and how close they are to one of the target speakers.

  • 199.
    Zhang, Xi
    et al.
    KTH, Superseded Departments, Signals, Sensors and Systems.
    Ottersten, Björn
    KTH, Superseded Departments, Speech, Music and Hearing.
    Power allocation and bit loading for spatial multiplexing in MIMO systems, 2003. In: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. V, New York: IEEE, 2003, p. 53-56. Conference paper (Refereed)
    Abstract [en]

The power assignment problem is important for Multiple-Input-Multiple-Output (MIMO) systems to achieve high capacity. Although this problem is solved by well-known water filling algorithms, these do not provide an optimal solution if the system is constrained to a fixed raw bit error rate threshold and to discrete modulation orders. In this work an approximate approach, called QoS-based WF, is proposed to solve the power assignment problem with such constraints. It is shown to outperform quantization of the conventional water filling solution and a well-known bit loading algorithm (Chow's algorithm) used in Digital Subscriber Lines (DSL).

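Entry 199 builds on classical water filling, which is easy to sketch: pour the total power over the MIMO eigenchannels so stronger channels receive more, switching the weakest ones off. The QoS-constrained refinement of the paper is not reproduced here; the gains and power budget below are invented.

```python
# Classical water-filling power allocation (the baseline in entry 199).
import numpy as np

def water_filling(gains, total_power):
    """gains: channel power gains (e.g. squared singular values over
    noise power). Returns the allocation in order of descending gain."""
    g = np.sort(np.asarray(gains, dtype=float))[::-1]
    for k in range(len(g), 0, -1):
        mu = (total_power + np.sum(1.0 / g[:k])) / k  # water level
        p = mu - 1.0 / g[:k]
        if p[-1] >= 0:  # weakest active channel still non-negative
            alloc = np.zeros(len(g))
            alloc[:k] = p
            return alloc
    return np.zeros(len(g))

print(water_filling([4.0, 1.0, 0.25], total_power=1.0).round(3))
```
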
  • 200. Öhlin, David
    et al.
    Carlson, Rolf
    KTH, Superseded Departments, Speech, Music and Hearing.
    Data-driven formant synthesis, 2004. In: Proceedings FONETIK 2004: The XVIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm University, 2004, p. 160-163. Conference paper (Other academic)