451 - 472 of 472
  • 451.
    Vanhainen, Niklas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Salvi, Giampiero
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Free Acoustic and Language Models for Large Vocabulary Continuous Speech Recognition in Swedish (2014). In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 2014. Conference paper (Refereed)
    Abstract [en]

    This paper presents results for large vocabulary continuous speech recognition (LVCSR) in Swedish. We trained acoustic models on the public domain NST Swedish corpus and made them freely available to the community. The training procedure corresponds to the reference recogniser (RefRec) developed for the SpeechDat databases during the COST249 action. We describe the modifications we made to the procedure in order to train on the NST database, and the language models we created based on the N-gram data available at the Norwegian Language Council. Our tests include medium vocabulary isolated word recognition and LVCSR. Because no previous results are available for LVCSR in Swedish, we use as baseline the performance of the SpeechDat models on the same tasks. We also compare our best results to the ones obtained in similar conditions on resource rich languages such as American English. We tested the acoustic models with HTK and Julius and plan to make them available in CMU Sphinx format as well in the near future. We believe that the free availability of these resources will boost research in speech and language technology in Swedish, even in research groups that do not have resources to develop ASR systems.
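    Recognition results of this kind are typically scored by word error rate (WER); as a reminder of the metric, a minimal sketch follows (illustrative only, not the authors' evaluation code):

```python
# Minimal word error rate via Levenshtein distance over word sequences.
# WER = (substitutions + insertions + deletions) / reference length.

def wer(reference: list[str], hypothesis: list[str]) -> float:
    n, m = len(reference), len(hypothesis)
    # d[i][j]: edit distance between first i reference and j hypothesis words
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[n][m] / n

print(wer("en bil kör fort".split(), "en bil kör bort".split()))  # 0.25
```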

  • 452.
    Vanhainen, Niklas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Salvi, Giampiero
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Pattern Discovery in Continuous Speech Using Block Diagonal Infinite HMM (2014). Conference paper (Refereed)
    Abstract [en]

    We propose the application of a recently introduced inference method, the Block Diagonal Infinite Hidden Markov Model (BDiHMM), to the problem of learning the topology of a Hidden Markov Model (HMM) from continuous speech in an unsupervised way. We test the method on the TiDigits continuous digit database and analyse the emerging patterns corresponding to the blocks of states inferred by the model. We show how the complexity of these patterns increases with the amount of observations and number of speakers. We also show that the patterns correspond to sub-word units that constitute stable and discriminative representations of the words contained in the speech material.
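    The block-diagonal structure the BDiHMM exploits can be pictured with a toy transition matrix in which most probability mass stays within a block of states; this sketches the structural idea only, not the paper's inference procedure:

```python
import numpy as np

def block_diagonal_transitions(block_sizes, within=0.95):
    """Row-stochastic transition matrix: `within` mass stays in each
    state's own block, the rest is spread over the other states."""
    n = sum(block_sizes)
    A = np.zeros((n, n))
    start = 0
    for size in block_sizes:
        stop = start + size
        outside = n - size
        inside_mass = within if outside else 1.0
        if outside:
            A[start:stop, :] = (1.0 - within) / outside
        A[start:stop, start:stop] = inside_mass / size
        start = stop
    return A

# Three blocks of states; each block could come to represent a sub-word unit.
A = block_diagonal_transitions([3, 4, 2])
assert np.allclose(A.sum(axis=1), 1.0)  # every row is a valid distribution
print(np.round(A, 2))
```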

  • 453.
    Vanhainen, Niklas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Salvi, Giampiero
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Word Discovery with Beta Process Factor Analysis (2012). In: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 1, 2012, p. 798-801. Conference paper (Refereed)
    Abstract [en]

    We propose the application of a recently developed non-parametric Bayesian method for factor analysis to the problem of word discovery from continuous speech. The method, based on Beta Process priors, has a number of advantages compared to previously proposed methods, such as Non-negative Matrix Factorisation (NMF). Beta Process Factor Analysis (BPFA) is able to estimate the size of the basis, and therefore the number of recurring patterns, or word candidates, found in the data. We compare the results obtained with BPFA and NMF on the TIDigits database, showing that our method is capable of not only finding the correct words, but also the correct number of words. We also show that the method can infer the approximate number of words for different vocabulary sizes by testing on randomly generated sequences of words.
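    For contrast with BPFA's ability to infer the basis size, a minimal NMF baseline can be sketched: scikit-learn's NMF must be given the number of components up front, so candidate sizes have to be scanned by hand. Data and shapes below are illustrative, not the paper's setup:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
true_basis = rng.random((40, 5))     # 5 hidden "word" patterns (unknown to NMF)
activations = rng.random((5, 200))
V = true_basis @ activations         # non-negative data matrix

for k in (3, 5, 8):                  # basis size must be tried by hand
    model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
    W = model.fit_transform(V)
    err = np.linalg.norm(V - W @ model.components_)
    print(f"k={k}: reconstruction error {err:.3f}")
```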

  • 454. Ward, N. G.
    et al.
    Werner, S. D.
    Novick, D. G.
    Shriberg, E. E.
    Oertel, Catharine
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Morency, L.-P.
    Kawahara, T.
    The similar segments in social speech task (2013). In: CEUR Workshop Proceedings, 2013, Vol. 1043. Conference paper (Refereed)
    Abstract [en]

    Similar Segments in Social Speech was one of the Brave New Tasks at MediaEval 2013. The task involves finding segments similar to a query segment, in a multimedia collection of informal, unstructured dialogs among members of a small community.

  • 455.
    Wik, Preben
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    The Virtual Language Teacher: Models and applications for language learning using embodied conversational agents (2011). Doctoral thesis, monograph (Other academic)
    Abstract [en]

    This thesis presents a framework for computer assisted language learning using a virtual language teacher. It is an attempt at creating not only a new type of language learning software, but also a server-based application that collects large amounts of speech material for future research purposes. The motivation for the framework is to create a research platform for computer assisted language learning and computer assisted pronunciation training. Within the thesis, different feedback strategies and pronunciation error detectors are explored. This is a broad, interdisciplinary approach, combining research from a number of scientific disciplines, such as speech technology, game studies, cognitive science, phonetics, phonology, and second-language acquisition and teaching methodologies. The thesis discusses the paradigm both from a top-down point of view, where a number of functionally separate but interacting units are presented as part of a proposed architecture, and bottom-up, by demonstrating and testing an implementation of the framework.

  • 456.
    Wik, Preben
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Can visualization of internal articulators support speech perception? (2008). In: INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2008, p. 2627-2630. Conference paper (Refereed)
    Abstract [en]

    This paper describes the contribution to speech perception given by animations of intra-oral articulations. 18 subjects were asked to identify the words in acoustically degraded sentences in three different presentation modes: acoustic signal only, audiovisual with a front view of a synthetic face and an audiovisual with both front face view and a side view, where tongue movements were visible by making parts of the cheek transparent. The augmented reality side-view did not help subjects perform better overall than with the front view only, but it seems to have been beneficial for the perception of palatal plosives, liquids and rhotics, especially in clusters. The results indicate that it cannot be expected that intra-oral animations support speech perception in general, but that information on some articulatory features can be extracted. Animations of tongue movements have hence more potential for use in computer-assisted pronunciation and perception training than as a communication aid for the hearing-impaired.

  • 457.
    Wik, Preben
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Engwall, Olov
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Looking at tongues – can it help in speech perception? (2008). In: Proceedings of The XXIst Swedish Phonetics Conference, FONETIK 2008, 2008, p. 57-60. Conference paper (Other academic)
    Abstract [en]

    This paper describes the contribution to speech perception given by animations of intra-oral articulations. 18 subjects were asked to identify the words in acoustically degraded sentences in three different presentation modes: acoustic signal only, audiovisual with a front view of a synthetic face and an audiovisual with both front face view and a side view, where tongue movements were visible by making parts of the cheek transparent. The augmented reality sideview did not help subjects perform better overall than with the front view only, but it seems to have been beneficial for the perception of palatal plosives, liquids and rhotics, especially in clusters.

  • 458.
    Wik, Preben
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Att lära sig språk med en virtuell lärare [Learning a language with a virtual teacher] (2007). In: Från Vision till praktik, språkutbildning och informationsteknik / [ed] Patrik Svensson, Härnösand: Myndigheten för nätverk och samarbete inom högre utbildning, 2007, p. 51-70. Chapter in book (Refereed)
  • 459.
    Wik, Preben
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Simicry: A mimicry-feedback loop for second language learning (2010). In: Proceedings of Second Language Studies: Acquisition, Learning, Education and Technology, 2010. Conference paper (Refereed)
    Abstract [en]

    This paper introduces the concept of Simicry, defined as similarity of mimicry, for the purpose of second language acquisition. We apply this method to foreign students learning Swedish, using a computer assisted language learning system called Ville. The system deploys acoustic similarity measures between native and non-native pronunciation, derived from duration, syllabicity and pitch. The system uses these measures to give pronunciation feedback in a mimicry-feedback loop exercise which has two variants: a ’say after me’ mimicry exercise, and a ’shadow with me’ exercise. The answers to questionnaires filled out by students after several training sessions spread over a month show that the learning and practicing procedure has promising potential, being both useful and fun.
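    One plausible reading of the acoustic similarity measure described above is a dynamic time warping comparison between the teacher's and the learner's pitch contours; the sketch below is hypothetical and is not Ville's implementation:

```python
import numpy as np

def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Length-normalized DTW cost between two 1-D contours
    (e.g., semitone-scaled pitch tracks)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

# Stand-in contours: the student mimics the teacher at a different tempo.
teacher = np.sin(np.linspace(0, 3, 50))
student = np.sin(np.linspace(0, 3, 60)) + 0.1
print(f"mimicry cost: {dtw_distance(teacher, student):.3f}")  # lower is closer
```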

  • 460.
    Wik, Preben
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Hincks, Rebecca
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Language and Communication.
    Hirschberg, Julia
    Department of Computer Science, Columbia University, USA.
    Responses to Ville: A virtual language teacher for Swedish (2009). In: Proc. of SLaTE Workshop on Speech and Language Technology in Education, Wroxall, England, 2009. Conference paper (Refereed)
    Abstract [en]

    A series of novel capabilities have been designed to extend the repertoire of Ville, a virtual language teacher for Swedish, created at the Centre for Speech technology at KTH. These capabilities were tested by twenty-seven language students at KTH. This paper reports on qualitative surveys and quantitative performance from these sessions which suggest some general lessons for automated language training.

  • 461.
    Wik, Preben
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Embodied conversational agents in computer assisted language learning (2009). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 51, no 10, p. 1024-1037. Article in journal (Refereed)
    Abstract [en]

    This paper describes two systems using embodied conversational agents (ECAs) for language learning. The first system, called Ville, is a virtual language teacher for vocabulary and pronunciation training. The second system, a dialogue system called DEAL, is a role-playing game for practicing conversational skills. Whereas DEAL acts as a conversational partner with the objective of creating and keeping an interesting dialogue, Ville takes the role of a teacher who guides, encourages and gives feedback to the students.

  • 462.
    Wik, Preben
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Brusk, J.
    Computer Assisted Conversation Training for Second Language Learners (2007). In: Proceedings of Fonetik 2007, 2007, Vol. 50, no 1, p. 57-60. Conference paper (Other academic)
    Abstract [en]

    This paper describes work in progress on DEAL, a spoken dialogue system under development at KTH. It is intended as a platform for exploring the challenges and potential benefits of combining elements from computer games, dialogue systems and language learning.

  • 463.
    Wik, Preben
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Brusk, J.
    DEAL: A Serious Game For CALL Practicing Conversational Skills In The Trade Domain (2007). In: Proceedings of SLATE 2007, 2007. Conference paper (Refereed)
    Abstract [en]

    This paper describes work in progress on DEAL, a spoken dialogue system under development at KTH. It is intended as a platform for exploring the challenges and potential benefits of combining elements from computer games, dialogue systems and language learning.

  • 464.
    Wik, Preben
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Husby, O.
    Øvregaard, Å.
    Bech, Ø.
    Albertsen, E.
    Nefzaoui, S.
    Skarpnes, E.
    Koreman, J.
    Contrastive analysis through L1-L2map (2011). In: TMH-QPSR, ISSN 1104-5787, Vol. 51, no 1, p. 49-52. Article in journal (Other academic)
    Abstract [en]

    This paper describes the CALST project, in which the primary aim is to develop Ville-N, a computer assisted pronunciation training (CAPT) system for learners of Norwegian as a second language. Ville-N makes use of L1-L2map, a tool for multilingual contrastive analysis, to generate a list of language-specific features. These can be used to tailor pronunciation and listening exercises. The tool can also be used for other target languages.
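    The contrastive-analysis idea behind L1-L2map can be illustrated as a set comparison of phoneme inventories; the inventory fragments below are small illustrative samples, not the tool's actual data:

```python
# Compare a learner's L1 inventory with the target-language (L2) inventory.
L1_SWEDISH = {"p", "b", "t", "d", "k", "g", "s", "ɧ", "ɕ", "r", "l"}
L2_NORWEGIAN = {"p", "b", "t", "d", "k", "g", "s", "ʂ", "ç", "r", "l"}

# Sounds in the L2 that the L1 lacks: likely pronunciation-training targets.
new_for_learner = L2_NORWEGIAN - L1_SWEDISH
# Sounds in the L1 with no L2 counterpart: potential transfer errors.
transfer_risk = L1_SWEDISH - L2_NORWEGIAN

print("practice targets:", sorted(new_for_learner))  # ['ç', 'ʂ']
print("transfer risks:", sorted(transfer_risk))      # ['ɕ', 'ɧ']
```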

  • 465.
    Wik, Preben
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Lucas Escribano, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Say ‘Aaaaa’: Interactive Vowel Practice for Second Language Learning (2009). In: Proc. of SLaTE Workshop on Speech and Language Technology in Education, 2009. Conference paper (Refereed)
    Abstract [en]

    This paper reports on a system created to help language students learn the vowel inventory of Swedish. Formants are tracked, and a 3D ball moves over a vowel-chart canvas in real time. Target spheres are placed at the target values of vowels, and the students’ task is to reach the target spheres. A calibration process, capturing data from three cardinal vowels, is used to normalize the effects of different vocal tract sizes, making it possible for people to use the program regardless of age, size, or gender. A third formant is used in addition to the first and second formants to distinguish between two Swedish vowels.
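    One way to realize the calibration step, assuming the three cardinal-vowel measurements fix an affine map from a speaker's (F1, F2) space to a shared reference space; the reference values and the speaker measurements below are invented for illustration, and the paper's actual procedure may differ:

```python
import numpy as np

# Reference (F1, F2) in Hz for cardinal vowels /i/, /a/, /u/ (illustrative).
REF = np.array([[240, 2400], [850, 1600], [250, 600]], float)

def calibrate(speaker_cardinals: np.ndarray):
    """Return a function mapping this speaker's (F1, F2) to reference space.
    Three non-collinear point pairs determine the affine transform exactly."""
    X = np.hstack([speaker_cardinals, np.ones((3, 1))])  # [F1 F2 1] rows
    coef, *_ = np.linalg.lstsq(X, REF, rcond=None)       # solve X @ coef = REF
    return lambda f1, f2: np.array([f1, f2, 1.0]) @ coef

# A shorter vocal tract shifts formants up; calibration undoes the shift.
child_cardinals = np.array([[300, 3000], [1000, 2000], [320, 760]], float)
to_ref = calibrate(child_cardinals)
print(np.round(to_ref(320, 760)))  # ~[250, 600], i.e., reference /u/
```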

  • 466.
    Wik, Preben
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Nygaard, Lars
    Fjeld, Ruth Vatvedt
    Managing complex and multilingual lexical data with a simple editor (2004). In: Proceedings of the Eleventh EURALEX International Congress, Lorient, France, 2004. Conference paper (Refereed)
    Abstract [en]

    This paper presents an editor for compiling a multilingual machine readable lexicon, like the Simple lexicon. This editor has proven to be a useful tool in linking several languages in one lexical database and in editing the entries in a consistent and convenient way. The editor was designed for linking Danish, Swedish and Norwegian in the Simple Scan project, but might easily be extended to include all the languages in the Simple project. The editor may also be modified for similar machine readable lexical projects.

  • 467.
    Zellers, Margaret
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Perception of pitch tails at potential turn boundaries in Swedish (2014). In: Proceedings of the Annual Conference of the International Speech Communication Association, 2014, p. 1944-1948. Conference paper (Refereed)
    Abstract [en]

    In a number of languages, intonational patterns at prosodic boundaries are considered to be relevant for turn transition or turn hold. A perception experiment tested the influence of fundamental frequency (F0) peak height and rising final contours on Swedish listeners’ judgment about whether a speaker wanted to hold the turn. While F0 peak height, as has been previously shown, did influence listeners’ judgments, the end height of rising pitch tails apparently did not influence listeners’ judgments about whether a speaker planned to continue talking, even though they showed sensitivity to the differences in a discrimination task. The differences in responses in the tasks, as well as the difference from results found for other languages, may indicate that listeners used comparative prominence to guide their judgments, rather than intonation playing a direct role in the turn-transition system.

  • 468.
    Zellers, Margaret
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Pitch and lengthening as cues to turn transition in Swedish (2013). In: Proceedings of Interspeech 2013, 2013, p. 248-252. Conference paper (Refereed)
    Abstract [en]

    In many cases of turn transition in conversation, a new speaker may respond to phonetic cues from the end of the prior turn, including variation in prosodic features such as pitch and final lengthening. Although consistent pitch and lengthening features are well-established for some languages at potential points of turn transition, this is not necessarily the case for Swedish. The current study uses a two-alternative forced choice task to investigate how variation in pitch contour and lengthening at the ends of syntactically complete turns can influence listeners’ expectations of turn hold or turn transition. Both lengthening and pitch contour features were found to influence listeners’ judgments about whether turn transition would occur, with shorter length and higher final pitch peaks associated with turn hold. Furthermore, listeners were more certain about their judgments when asked about turn-hold rather than turn-change, suggesting an imbalance in the strength of turn-hold versus turn-transition cues.

  • 469.
    Zellers, Margaret
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. University of Stuttgart, Germany.
    Prosodic Variation and Segmental Reduction and Their Roles in Cuing Turn Transition in Swedish (2017). In: Language and Speech, ISSN 0023-8309, E-ISSN 1756-6053, Vol. 60, no 3, p. 454-478. Article in journal (Refereed)
    Abstract [en]

    Prosody has often been identified alongside syntax as a cue to turn hold or turn transition in conversational interaction. However, evidence for which prosodic cues are most relevant, and how strong those cues are, has been somewhat scattered. The current study addresses prosodic cues to turn transition in Swedish. A perception study looking closely at turn changes and holds in cases where the syntax does not lead inevitably to a particular outcome shows that Swedish listeners are sensitive to duration variations, even in the very short space of the final unstressed syllable of a turn, and that they may use pitch cues to a lesser extent. An investigation of production data indicates that duration, and to some extent segmental reduction, demonstrate consistent variation in relation to the types of turn boundaries they accompany, while fundamental frequency and glottalization do not. Taken together, these data suggest that duration may be the primary cue to turn transition in Swedish conversation, rather than fundamental frequency, as some other studies have suggested.

  • 470.
    Zellers, Margaret
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Parallels between hand gestures and acoustic prosodic features in turn-taking (2015). In: 14th International Pragmatics Conference, Antwerp, Belgium, 2015, p. 454-455. Conference paper (Refereed)
  • 471.
    Zellers, Margaret
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Ogden, Richard
    Exploring Interactional Features with Prosodic Patterns (2014). In: Language and Speech, ISSN 0023-8309, E-ISSN 1756-6053, Vol. 57, no 3, p. 285-309. Article in journal (Refereed)
    Abstract [en]

    This study adopts a multiple-methods approach to the investigation of prosody, drawing on insights from a quantitative methodology (experimental prosody research) as well as a qualitative one (conversation analysis). We use a k-means cluster analysis to investigate prosodic patterns in conversational sequences involving lexico-semantic contrastive structures. This combined methodology demonstrates that quantitative/statistical methods are a valuable tool for making relatively objective characterizations of acoustic features of speech, while qualitative methods are essential for interpreting the quantitative results. We find that in sequences that maintain global prosodic characteristics across contrastive structures, participants orient to interactional problems, such as determining who has the right to the floor, or avoiding disruption of an ongoing interaction. On the other hand, in sequences in which the global prosody is different across contrastive structures, participants do not generally appear to be orienting to such problems of alignment. Our findings expand the interpretation of "contrastive prosody" that is commonly used in experimental prosody approaches, while providing a way for conversation-analytic research to improve quantification and generalizability of findings.
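    The quantitative half of the method can be sketched with scikit-learn's k-means over prosodic feature vectors; feature names and data below are illustrative stand-ins, not the paper's corpus measures:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Rows: contrastive-structure pairs. Columns (hypothetical): F0 mean
# difference (semitones), F0 range difference, duration ratio,
# intensity difference (dB).
features = np.vstack([
    rng.normal([0.2, 0.5, 1.0, 0.3], 0.2, (30, 4)),  # similar global prosody
    rng.normal([3.0, 2.5, 1.6, 2.0], 0.4, (30, 4)),  # different global prosody
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
print("cluster sizes:", np.bincount(km.labels_))
print("centroids:\n", np.round(km.cluster_centers_, 2))
```

    The qualitative step would then inspect the conversational sequences falling into each cluster, as the abstract describes.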

  • 472.
    Zimmerer, Frank
    et al.
    Saarland University.
    Andreeva, Bistra
    Saarland University.
    Möbius, Bernd
    Saarland University.
    Malisz, Zofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Ferragne, Emmanuel
    CNRS Université Lyon 2.
    Pellegrino, François
    CNRS Université Lyon 2.
    Brandt, Erika
    Saarland University.
    Perzeption von Sprechgeschwindigkeit und der (nicht nachgewiesene) Einfluss von Surprisal [Perception of speech rate and the (unconfirmed) influence of surprisal] (2017). In: ESSV - 28. Konferenz Elektronische Sprachsignalverarbeitung 2017, Saarbrücken, 2017. Conference paper (Refereed)
    Abstract [en, translated from German]

    Two perception experiments investigated the perception of speech rate. A factor of particular interest here is surprisal, an information-theoretic measure of the predictability of a linguistic unit in its context. Taken together, the results of the experiments suggest that surprisal has no significant influence on the perception of speech rate.
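    Since surprisal is the quantity at issue, a worked definition may help: the surprisal of a unit w in context c is -log2 P(w | c), in bits. A toy bigram estimate (counts invented; the study used proper language-model estimates):

```python
import math

bigram_counts = {("the", "cat"): 20, ("the", "quantum"): 1}
context_count = {"the": 1000}

def surprisal(context: str, word: str) -> float:
    """Surprisal in bits: -log2 of the word's conditional probability."""
    p = bigram_counts[(context, word)] / context_count[context]
    return -math.log2(p)

print(f"{surprisal('the', 'cat'):.2f} bits")      # ~5.64: predictable
print(f"{surprisal('the', 'quantum'):.2f} bits")  # ~9.97: surprising
```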
