1 - 7 of 7
  • 1.
    Jande, Per-Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Annotating Speech Data for Pronunciation Variation Modelling. 2005. In: Proceedings, FONETIK 2005, Göteborg, Sweden, 2005, p. 25-27. Conference paper (Other academic)
    Abstract [en]

    This paper describes methods for annotating recorded speech with information hypothesised to be important for the pronunciation of words in discourse context. Annotation is structured into six hierarchically ordered tiers, each tier corresponding to a segmentally defined linguistic unit. Automatic methods are used to segment and annotate the respective annotation tiers. Decision tree models trained on annotation from elicited monologue showed a phoneme error rate of 9.91%, corresponding to a 55.25% error reduction compared to using a canonical pronunciation representation from a lexicon for estimating the phonetic realisation.

  • 2.
    Jande, Per-Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Inducing Decision Tree Pronunciation Variation Models from Annotated Speech Data. 2005. In: 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, 2005, p. 1945-1948. Conference paper (Refereed)
    Abstract [en]

    A model of pronunciation of words in discourse context has been induced from the annotation of a spoken language corpus. The information included in the annotation is a set of variables hypothesised to be important for the pronunciation of words in discourse context. The annotation is connected to segmentally defined units on tiers corresponding to linguistically relevant units: the discourse, the utterance, the phrase, the word, the syllable and the phoneme. The model is represented as a tree structure, making it transparent for analysis and easy to use in a speech synthesis system. Using phonemic canonical pronunciation representations to estimate the segmental string of the annotated data gives a 22.1% phone error rate. Decision tree pronunciation variation models generated in a tenfold cross validation procedure showed an average phone error rate of 9.9%. Using multiple context variables for modelling pronunciation variation could thus reduce the error rate by 55%, compared to a baseline using canonical pronunciation representations.

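The abstract above describes inducing decision tree models that map context attributes to realised phones. As a rough illustration of that idea only (the papers' actual induction algorithm is not specified in these abstracts), the sketch below builds a one-level decision "stump": it splits on whichever attribute's value-wise majority vote predicts the realised phone best. All attribute names and data are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy annotated data (all values hypothetical): context attributes of an
# underlying phoneme, paired with its realised phone ("-" = deletion).
examples = [
    ({"stress": "unstressed", "position": "final"},   "-"),
    ({"stress": "unstressed", "position": "final"},   "-"),
    ({"stress": "stressed",   "position": "initial"}, "t"),
    ({"stress": "stressed",   "position": "medial"},  "t"),
    ({"stress": "unstressed", "position": "medial"},  "t"),
]

def build_stump(examples, attributes):
    """One-level decision tree: split on the attribute whose value-wise
    majority vote classifies the most training examples correctly."""
    def correct(attr):
        groups = defaultdict(Counter)
        for feats, phone in examples:
            groups[feats[attr]][phone] += 1
        return sum(c.most_common(1)[0][1] for c in groups.values())
    best = max(attributes, key=correct)
    leaves = defaultdict(Counter)
    for feats, phone in examples:
        leaves[feats[best]][phone] += 1
    return best, {value: c.most_common(1)[0][0] for value, c in leaves.items()}

attr, rule = build_stump(examples, ["stress", "position"])
print(attr, rule)  # splitting attribute and its per-value phone predictions
```

A full tree would recurse on each leaf with the remaining attributes; the papers additionally draw attributes from multiple annotation tiers (discourse down to phoneme) rather than the two shown here.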
  • 3.
    Jande, Per-Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Integrating Linguistic Information from Multiple Sources in Lexicon Development and Spoken Language Annotation. 2006. In: Proceedings of the LREC workshop on merging and layering linguistic information, Genua, Italy, 2006, p. 1-8. Conference paper (Refereed)
  • 4.
    Jande, Per-Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Modelling Phone-Level Pronunciation in Discourse Context. 2006. Doctoral thesis, monograph (Other scientific)
    Abstract [en]

    Analytic knowledge about the systematic variation in a language has an important place in the description of the language. Such knowledge is interesting e.g. in the language teaching domain, as a background for various types of linguistic studies, and in the development of more dynamic speech technology applications. In previous studies, the effects of single variables or relatively small groups of related variables on the pronunciation of words have been studied separately. The work described in this thesis takes a holistic perspective on pronunciation variation and focuses on a method for creating general descriptions of phone-level pronunciation in discourse context. The discourse context is defined by a large set of linguistic attributes ranging from high-level variables such as speaking style, down to the articulatory feature level. Models of phone-level pronunciation in the context of a discourse have been created for the central standard Swedish language variety. The models are represented in the form of decision trees, which are readable for both machines and humans. A data-driven approach was taken for the pronunciation modelling task, and the work involved the annotation of recorded speech with linguistic and related information. The decision tree models were induced from the annotation. An important part of the work on pronunciation modelling was also the development of a pronunciation lexicon for Swedish. In a cross-validation experiment, several sets of pronunciation models were created with access to different parts of the attributes in the annotation. The prediction accuracy of pronunciation models could be improved by 42.2% by making information from layers above the phoneme level accessible during model training. Optimal models were obtained when attributes from all layers of annotation were used. 
The goal for the models was to produce pronunciation representations representative for the language variety and not necessarily for the individual speakers, on whose speech the models were trained. In the cross-validation experiment, model-produced phone strings were compared to key phonetic transcripts of actual speech, and the phone error rate was defined as the share of discrepancies between the respective phone strings. Thus, the phone error rate is the sum of actual errors and discrepancies resulting from desired adaptations from a speaker-specific pronunciation to a pronunciation reflecting general traits of the language variety. The optimal models gave an average phone error rate of 8.2%.

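The thesis abstract defines the phone error rate as the share of discrepancies between model-produced phone strings and key phonetic transcripts. Since deletions and insertions can make the two strings differ in length, a standard way to count such discrepancies is edit distance normalised by reference length; the sketch below assumes that formulation rather than taking it verbatim from the thesis.

```python
def phone_error_rate(ref, hyp):
    """Levenshtein distance between hypothesis and reference phone
    strings, normalised by the reference length (a standard PER-style
    formulation; assumed, not quoted from the thesis)."""
    n, m = len(ref), len(hyp)
    row = list(range(m + 1))  # distances against the empty ref prefix
    for i in range(1, n + 1):
        prev, row[0] = row[0], i
        for j in range(1, m + 1):
            cur = min(row[j] + 1,                          # deletion
                      row[j - 1] + 1,                      # insertion
                      prev + (ref[i - 1] != hyp[j - 1]))   # (mis)match
            prev, row[j] = row[j], cur
    return row[m] / n

# Hypothetical transcripts: one deleted final phone out of four.
print(phone_error_rate(["h", "E", "l", "O"], ["h", "E", "l"]))  # → 0.25
```

Note the thesis's caveat applies here too: this count lumps genuine model errors together with desired deviations from speaker-specific pronunciations.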
  • 5.
    Jande, Per-Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Modelling Pronunciation in Discourse Context. 2006. In: Lund University, Centre for Languages and Literature, General Linguistics, Phonetics, Working Papers, 52, 2006, Proceedings from Fonetik 2006, Lund, June 7–9, 2006 / [ed] Gilbert Ambrazaitis and Susanne Schötz, Lund, Sweden, 2006, p. 69-72. Conference paper (Other academic)
    Abstract [en]

    This paper describes a method for modelling phone-level pronunciation in discourse context. Spoken language is annotated with linguistic and related information in several layers. The annotation serves as a description of the discourse context and is used as training data for decision tree model induction. In a cross validation experiment, the decision tree pronunciation models are shown to produce a phone error rate of 8.1% when trained on all available data. This is an improvement by 60.2% compared to using a phoneme string compiled from lexicon transcriptions for estimating phone-level pronunciation and an improvement by 42.6% compared to using decision tree models trained on phoneme layer attributes only.

  • 6.
    Jande, Per-Anders
    KTH, Superseded Departments, Speech, Music and Hearing.
    Pronunciation variation modelling using decision tree induction from multiple linguistic parameters. 2004. In: Proceedings of Fonetik, Stockholm, Sweden, 2004, p. 12-15. Conference paper (Other academic)
  • 7.
    Jande, Per-Anders
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Spoken language annotation and data-driven modelling of phone-level pronunciation in discourse context. 2008. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 50, no 2, p. 126-141. Article in journal (Refereed)
    Abstract [en]

    A detailed description of the discourse context of a word can be used for predicting word pronunciation in discourse context and also enables studies of the interplay between various types of information on e.g. phone-level pronunciation. The work presented in this paper is aimed at modelling systematic variation in the phone-level realisation of words inherent to a language variety. A data-driven approach based on access to detailed discourse context descriptions is used. The discourse context descriptions are constructed through annotation of spoken language with a large variety of linguistic and related variables in multiple layers. Decision tree pronunciation models are induced from the annotation. The effects of using different types and different amounts of information for model induction are explored. Models generated in a tenfold cross-validation experiment produce on average 8.2% errors on the phone level when they are trained on all available information. Models trained on phoneme level information only have an average phone error rate of 14.2%. This means that including information above the phoneme level in the context description can improve model performance by 42.2%.

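The improvements quoted across these entries (55%, 42.2%, 60.2%) are relative error-rate reductions against a baseline. Plugging the rounded published rates into that formula reproduces the reported figures to within rounding:

```python
def relative_reduction(baseline, model):
    """Relative error-rate reduction, as a fraction of the baseline rate."""
    return (baseline - model) / baseline

# Phone error rates quoted in the abstracts above (in percent).
print(round(100 * relative_reduction(22.1, 9.9), 1))  # item 2: → 55.2 (reported as 55%)
print(round(100 * relative_reduction(14.2, 8.2), 1))  # item 7: → 42.3 (reported as 42.2%,
                                                      # presumably from unrounded rates)
```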