Accent clustering in Swedish using the Bhattacharyya distance
KTH, Former Departments, Speech, Music and Hearing. ORCID iD: 0000-0002-3323-5311
2003 (English). In: Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, 2003, pp. 1149-1152. Conference paper, published paper (refereed)
Abstract [en]

In an attempt to improve automatic speech recognition (ASR) models for Swedish, accent variations were considered. These have proved to be important variables in the statistical distribution of the acoustic features usually employed in ASR. The analysis of feature variability has revealed phenomena that are consistent with what is known from phonetic investigations, suggesting that a consistent part of the information about accents could be derived from those features. A graphical interface has been developed to simplify the visualization of the geographical distributions of these phenomena.
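The title names the Bhattacharyya distance, but the abstract does not say how it was computed. As a minimal sketch only, assuming each accent region's acoustic features are summarised by a Gaussian with diagonal covariance (an assumption, not something this record states), the standard closed form can be written as follows. The function name and arguments are hypothetical.

```python
import math

def bhattacharyya_diag_gaussian(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Assumption: each distribution is given as per-dimension means and
    variances (all variances strictly positive).
    """
    mean_term = 0.0
    cov_term = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance for this dimension
        # Separation of the means, scaled by the averaged variance.
        mean_term += (m1 - m2) ** 2 / v
        # Mismatch between the two variances.
        cov_term += math.log(v / math.sqrt(v1 * v2))
    return 0.125 * mean_term + 0.5 * cov_term

# Two unit-variance Gaussians whose means differ by 2 along one axis.
d = bhattacharyya_diag_gaussian([0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 1.0])
```

The distance is zero for identical distributions and symmetric in its arguments, which makes it usable as a dissimilarity measure for clustering.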

Place, publisher, year, edition, pages
2003. pp. 1149-1152
National subject category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-6153; OAI: oai:DiVA.org:kth-6153; DiVA, id: diva2:10782
Note
QC 20100630. Available from: 2006-09-21. Created: 2006-09-21. Last updated: 2018-01-13. Bibliographically approved
Part of thesis
1. Mining Speech Sounds: Machine Learning Methods for Automatic Speech Recognition and Analysis
2006 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis collects studies on machine learning methods applied to speech technology and speech research problems. The six research papers included in this thesis are organised in three main areas.

The first group of studies was carried out within the European project Synface. The aim was to develop a low latency phonetic recogniser to drive the articulatory movements of a computer generated virtual face from the acoustic speech signal. The visual information provided by the face is used as a hearing aid for persons using the telephone.

Paper A compares two solutions to the problem of mapping acoustic to visual information that are based on regression and classification techniques. Recurrent Neural Networks are used to perform regression while Hidden Markov Models are used for the classification task. In the second case the visual information needed to drive the synthetic face is obtained by interpolation between target values for each acoustic class. The evaluation is based on listening tests with hearing-impaired subjects, where the intelligibility of sentence material is compared in different conditions: audio alone, audio and natural face, audio and synthetic face driven by the different methods.

Paper B analyses the behaviour, in low latency conditions, of a phonetic recogniser based on a hybrid of Recurrent Neural Networks (RNNs) and Hidden Markov Models (HMMs). The focus is on the interaction between the time evolution model learnt by the RNNs and the one imposed by the HMMs.

Paper C investigates the possibility of using the entropy of the posterior probabilities estimated by a phoneme classification neural network, as a feature for phonetic boundary detection. The entropy and its time evolution are analysed with respect to the identity of the phonetic segment and the distance from a reference phonetic boundary.
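The feature described for Paper C can be illustrated with a short sketch. It computes the Shannon entropy of one frame of posterior probabilities and then the entropy track over a sequence of frames. The function names and the toy posteriors are hypothetical, and the sketch does not reproduce how the paper turns the entropy into boundary decisions.

```python
import math

def posterior_entropy(posteriors):
    """Shannon entropy (in nats) of one frame of class posteriors.

    Assumption: the values are non-negative and sum to 1.
    Zero-probability classes contribute nothing to the sum.
    """
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)

def entropy_track(frames):
    """Entropy for each frame in a sequence of posterior vectors."""
    return [posterior_entropy(frame) for frame in frames]

# Toy example: a confident frame, an uncertain frame, a confident frame.
frames = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.02, 0.97, 0.01],
]
track = entropy_track(frames)
```

In this toy sequence the middle frame, where the classifier is least certain, has the highest entropy, which is the kind of behaviour one would expect near a transition between phonetic segments.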

In the second group of studies, the aim was to provide tools for analysing large amounts of speech data in order to study geographical variations in pronunciation (accent analysis).

Paper D and Paper E use Hidden Markov Models and Agglomerative Hierarchical Clustering to analyse a data set of about 100 million data points (5000 speakers, 270 hours of speech recordings). In Paper E, Linear Discriminant Analysis was used to determine the features that most concisely describe the groupings obtained with the clustering procedure.
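Agglomerative hierarchical clustering, as named above, can be sketched on a precomputed distance matrix. This is a naive illustration only: the linkage criterion, the stopping rule, and the function name are assumptions, since the abstract does not specify them, and the quadratic search shown here would not scale to the data set sizes mentioned.

```python
def agglomerative(dist, n_clusters, linkage=max):
    """Naive agglomerative clustering on a symmetric distance matrix.

    dist[i][j] is the distance between items i and j.
    linkage=max gives complete linkage, linkage=min gives single linkage.
    Returns a list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]  # start with singletons
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair and drop the absorbed cluster.
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters

# Toy example: four points on a line at positions 0, 1, 10 and 11.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
groups = agglomerative(dist, n_clusters=2)
```

With a distance such as the Bhattacharyya distance between per-region models in place of the toy matrix, the same loop would group regions whose feature distributions are similar.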

The third group belongs to studies carried out during the international project MILLE (Modelling Language Learning) that aims at investigating and modelling the language acquisition process in infants.

Paper F proposes the use of an incremental form of Model Based Clustering to describe the unsupervised emergence of phonetic classes in the first stages of language acquisition. The experiments were carried out on child-directed speech expressly collected for the purposes of the project.

Place, publisher, year, edition, pages
Stockholm: KTH, 2006. pp. xix, 87
Series
Trita-CSC-A, ISSN 1653-5723 ; 2006:12
Keywords
speech, machine learning, data mining, signal processing
National subject category
Computer Science
Identifiers
urn:nbn:se:kth:diva-4111 (URN); 91-7178-446-2 (ISBN)
Public defence
2006-10-06, F3, Sing Sing, Lindstedtsvägen 26, Stockholm, 13:00
Opponent
Supervisor
Note
QC 20100630. Available from: 2006-09-21. Created: 2006-09-21. Last updated: 2018-01-13. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

CiteSeerX

Person records

Salvi, Giampiero
