1 - 50 of 80
  • 1.
    Al Moubayed, Samer
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Öster, Ann-Marie
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Salvi, Giampiero
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    van Son, Nic
    Ormel, Ellen
    Virtual Speech Reading Support for Hard of Hearing in a Domestic Multi-Media Setting. 2009. In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009, pp. 1443-1446. Conference paper (Refereed)
    Abstract [en]

    In this paper we present recent results on the development of the SynFace lip synchronized talking head towards multilinguality, varying signal conditions and noise robustness in the Hearing at Home project. We then describe the large scale hearing impaired user studies carried out for three languages. The user tests focus on measuring the gain in Speech Reception Threshold in Noise when using SynFace, and on measuring the effort scaling when using SynFace by hearing impaired people. Preliminary analysis of the results does not show significant gain in SRT or in effort scaling. However, looking at inter-subject variability, it is clear that many subjects benefit from SynFace, especially for speech in stereo babble noise.

  • 2.
    Al Moubayed, Samer
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    De Smet, Michael
    Van Hamme, Hugo
    Lip Synchronization: from Phone Lattice to PCA Eigen-projections using Neural Networks. 2008. In: INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2008, pp. 2016-2019. Conference paper (Refereed)
    Abstract [en]

    Lip synchronization is the process of generating natural lip movements from a speech signal. In this work we address the lip-sync problem using an automatic phone recognizer that generates a phone lattice carrying posterior probabilities. The acoustic feature vector contains the posterior probabilities of all the phones over a time window centered at the current time point. Hence this representation characterizes the phone recognition output including the confusion patterns caused by its limited accuracy. A 3D face model with varying texture is computed by analyzing a video recording of the speaker using a 3D morphable model. Training a neural network using 30 000 data vectors from an audiovisual recording in Dutch resulted in a very good simulation of the face on independent data sets of the same or of a different speaker.

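The mapping described in the abstract above — a time window of phone posterior probabilities fed to a neural network that outputs PCA eigen-projection coefficients of a face model — can be sketched as a forward pass. This is a minimal illustration, not the authors' implementation; the phone inventory size, window length, number of PCA coefficients, and the random weights (stand-ins for trained parameters) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PHONES = 40   # assumed size of the phone inventory
WINDOW = 11     # assumed frames of context centred on the current time point
N_PCA = 6       # assumed number of facial PCA eigen-projections
HIDDEN = 32

# One input vector = posterior probabilities of all phones over a
# time window centred at the current frame, flattened.
def make_feature(posterior_window):
    # posterior_window: (WINDOW, N_PHONES), each row sums to 1
    return posterior_window.reshape(-1)

# A single-hidden-layer network mapping the posterior window to PCA
# coefficients of the face model (weights are random stand-ins).
W1 = rng.normal(0, 0.1, (HIDDEN, WINDOW * N_PHONES))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (N_PCA, HIDDEN))
b2 = np.zeros(N_PCA)

def lip_sync_frame(posterior_window):
    x = make_feature(posterior_window)
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2   # PCA eigen-projection coefficients for one frame

# Example input: a maximally uncertain (uniform) posterior window.
window = np.full((WINDOW, N_PHONES), 1.0 / N_PHONES)
coeffs = lip_sync_frame(window)
print(coeffs.shape)
```

Applied frame by frame over an utterance, such a network yields one coefficient vector per time step, which drives the 3D morphable face model.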
  • 3. Ambrazaitis, G.
    et al.
    House, David
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Multimodal prominences: Exploring the patterning and usage of focal pitch accents, head beats and eyebrow beats in Swedish television news readings. 2017. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 95, pp. 100-113. Journal article (Refereed)
    Abstract [en]

    Facial beat gestures align with pitch accents in speech, functioning as visual prominence markers. However, it is not yet well understood whether and how gestures and pitch accents might be combined to create different types of multimodal prominence, and how specifically visual prominence cues are used in spoken communication. In this study, we explore the use and possible interaction of eyebrow (EB) and head (HB) beats with so-called focal pitch accents (FA) in a corpus of 31 brief news readings from Swedish television (four news anchors, 986 words in total), focusing on effects of position in text, information structure as well as speaker expressivity. Results reveal an inventory of four primary (combinations of) prominence markers in the corpus: FA+HB+EB, FA+HB, FA only (i.e., no gesture), and HB only, implying that eyebrow beats tend to occur only in combination with the other two markers. In addition, head beats occur significantly more frequently in the second than in the first part of a news reading. A functional analysis of the data suggests that the distribution of head beats might to some degree be governed by information structure, as the text-initial clause often defines a common ground or presents the theme of the news story. In the rheme part of the news story, FA, HB, and FA+HB are all common prominence markers. The choice between them is subject to variation which we suggest might represent a degree of freedom for the speaker to use the markers expressively. A second main observation concerns eyebrow beats, which seem to be used mainly as a kind of intensification marker for highlighting not only contrast, but also value, magnitude, or emotionally loaded words; it is applicable in any position in a text. We thus observe largely different patterns of occurrence and usage of head beats on the one hand and eyebrow beats on the other, suggesting that the two represent two separate modalities of visual prominence cuing.

  • 4.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Mapping between acoustic and articulatory gestures. 2011. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 53, no. 4, pp. 567-589. Journal article (Refereed)
    Abstract [en]

    This paper proposes a definition for articulatory as well as acoustic gestures along with a method to segment the measured articulatory trajectories and acoustic waveforms into gestures. Using a simultaneously recorded acoustic-articulatory database, the gestures are detected based on finding critical points in the utterance, both in the acoustic and articulatory representations. The acoustic gestures are parameterized using 2-D cepstral coefficients. The articulatory trajectories are essentially the horizontal and vertical movements of Electromagnetic Articulography (EMA) coils placed on the tongue, jaw and lips along the midsagittal plane. The articulatory movements are parameterized using 2D-DCT using the same transformation that is applied on the acoustics. The relationship between the detected acoustic and articulatory gestures in terms of the timing as well as the shape is studied. In order to study this relationship further, acoustic-to-articulatory inversion is performed using GMM-based regression. The accuracy of predicting the articulatory trajectories from the acoustic waveforms is on par with state-of-the-art frame-based methods with dynamical constraints (with an average error of 1.45-1.55 mm for the two speakers in the database). In order to evaluate the acoustic-to-articulatory inversion in a more intuitive manner, a method based on the error in estimated critical points is suggested. Using this method, it was noted that the estimated articulatory trajectories using the acoustic-to-articulatory inversion methods were still not accurate enough to be within the perceptual tolerance of audio-visual asynchrony.

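The 2D-DCT parameterization mentioned in the abstract above can be illustrated in a few lines: transform a (coils × frames) trajectory segment, keep only the low-order coefficients as the compact gesture description, and invert to see what is preserved. The toy data, coil count, and number of kept coefficients are assumptions, not values from the paper.

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(1)

# Toy stand-in for EMA data: positions of a few coils over a
# gesture segment (coils x frames), built as smooth random walks.
segment = np.cumsum(rng.normal(0, 0.2, size=(6, 40)), axis=1)

# Parameterize with a 2D DCT and keep only the low-order
# coefficients as a compact description of the gesture shape.
coeffs = dctn(segment, norm='ortho')
K = 8                                # assumed number of kept coefficients per axis
compact = np.zeros_like(coeffs)
compact[:K, :K] = coeffs[:K, :K]

# Invert the truncated transform to check the reconstruction error.
reconstruction = idctn(compact, norm='ortho')
err = np.sqrt(np.mean((segment - reconstruction) ** 2))
print(round(float(err), 3))
```

Because articulatory movement is dominated by low frequencies, a small block of DCT coefficients captures most of the trajectory shape, which is what makes this a usable gesture parameterization.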
  • 5.
    Ananthakrishnan, Gopal
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Neiberg, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    In search of Non-uniqueness in the Acoustic-to-Articulatory Mapping. 2009. In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009, pp. 2799-2802. Conference paper (Refereed)
    Abstract [en]

    This paper explores the possibility and extent of non-uniqueness in the acoustic-to-articulatory inversion of speech, from a statistical point of view. It proposes a technique to estimate the non-uniqueness, based on finding peaks in the conditional probability function of the articulatory space. The paper corroborates the existence of non-uniqueness in a statistical sense, especially in stop consonants, nasals and fricatives. The relationship between the importance of the articulator position and non-uniqueness at each instance is also explored.

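The peak-finding idea in the abstract above — estimating non-uniqueness by counting peaks in the conditional probability function over the articulatory space — can be sketched with synthetic data. The bimodal sample, kernel bandwidth, and grid are invented for illustration and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical articulator positions observed for near-identical
# acoustic frames: two distinct configurations producing (nearly)
# the same sound, so the conditional distribution is bimodal.
positions = np.concatenate([rng.normal(-1.0, 0.15, 300),
                            rng.normal(1.0, 0.15, 300)])

# Kernel estimate of the conditional probability function over a
# grid of articulator positions.
grid = np.linspace(-2.0, 2.0, 200)
bw = 0.2
kernels = np.exp(-0.5 * ((grid[:, None] - positions[None, :]) / bw) ** 2)
density = kernels.mean(axis=1)

# Count interior local maxima: more than one peak at a given
# acoustic frame indicates a non-unique inverse mapping there.
peaks = [i for i in range(1, len(grid) - 1)
         if density[i] > density[i - 1] and density[i] > density[i + 1]]
print(len(peaks))
```

Run per acoustic frame over a corpus, the peak count gives a statistical picture of where (e.g. in stops, nasals, fricatives) the inversion is ambiguous.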
  • 6. Arzyutov, Dmitry
    et al.
    Lyublinskaya, Marina
    Nenet͡skoe olenevodstvo: geografii͡a, ėtnografii͡a, lingvistika [Nenets Reindeer Husbandry: Geography, Ethnography, and Linguistics]. 2018. Collection/Anthology (Refereed)
  • 7.
    Bergren, Max
    et al.
    Gavagai.
    Karlgren, Jussi
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Östling, Robert
    Stockholms universitet.
    Parkvall, Mikael
    Stockholms universitet.
    Inferring the location of authors from words in their texts. 2015. In: Proceedings of the 20th Nordic Conference of Computational Linguistics, Linköping University Electronic Press, 2015. Conference paper (Refereed)
    Abstract [en]

    For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors’ location. Many texts are locatable but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location indicating words which then can be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are.

    We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language.

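The pipeline in the abstract above — fit a spatial distribution per word from geotagged posts, score words by placeness (tight distributions are location indicating), and locate a text as a weighted centroid — can be sketched as follows. This simplified version uses a single isotropic Gaussian per word rather than the paper's several-Gaussians model, and the posts, words, and coordinates are invented for illustration.

```python
import numpy as np

# Toy geotagged posts: (lat, lon) plus the words used.
posts = [
    ((59.33, 18.07), ["hej", "stockholm"]),
    ((59.32, 18.06), ["stockholm", "regn"]),
    ((57.71, 11.97), ["hej", "göteborg"]),
    ((57.70, 11.98), ["göteborg", "regn"]),
]

# Collect the locations each word occurs at.
locs = {}
for coord, words in posts:
    for w in words:
        locs.setdefault(w, []).append(coord)

# One Gaussian per word: mean location and a crude isotropic spread.
models = {}
for w, cs in locs.items():
    cs = np.array(cs)
    models[w] = (cs.mean(axis=0), cs.std(axis=0).mean())

# Placeness: words with a tight spatial distribution are more
# location indicating than words used everywhere.
def placeness(w):
    return 1.0 / (models[w][1] + 1e-3)

# Locate a text as the placeness-weighted centroid of its words.
def locate(text_words):
    ws = [w for w in text_words if w in models]
    weights = np.array([placeness(w) for w in ws])
    means = np.array([models[w][0] for w in ws])
    return (weights[:, None] * means).sum(axis=0) / weights.sum()

est = locate(["hej", "stockholm", "regn"])
print(np.round(est, 2))
```

Here "hej" and "regn" occur in both cities and get low placeness, so the tight, high-placeness "stockholm" dominates the centroid and the text lands near Stockholm.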
  • 8.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Bruce, Gösta
    Lunds universitet.
    Enflo, Laura
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Schötz, Susanne
    Lunds universitet.
    Recognizing and Modelling Regional Varieties of Swedish. 2008. In: INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, 2008, pp. 512-515. Conference paper (Refereed)
    Abstract [en]

    Our recent work within the research project SIMULEKT (Simulating Intonational Varieties of Swedish) includes two approaches. The first involves a pilot perception test, used for detecting tendencies in human clustering of Swedish dialects. 30 Swedish listeners were asked to identify the geographical origin of Swedish native speakers by clicking on a map of Sweden. Results indicate for example that listeners from the south of Sweden are better at recognizing some major Swedish dialects than listeners from the central part of Sweden, which includes the capital area. The second approach concerns a method for modelling intonation using the newly developed SWING (Swedish INtonation Generator) tool, where annotated speech samples are resynthesized with rule based intonation and audiovisually analysed with regards to the major intonational varieties of Swedish. We consider both approaches important in our aim to test and further develop the Swedish prosody model.

  • 9.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Nordstrand, Magnus
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    A Model for Multimodal Dialogue System Output Applied to an Animated Talking Head. 2005. In: SPOKEN MULTIMODAL HUMAN-COMPUTER DIALOGUE IN MOBILE ENVIRONMENTS / [ed] Minker, Wolfgang; Bühler, Dirk; Dybkjær, Laila, Dordrecht: Springer, 2005, pp. 93-113. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    We present a formalism for specifying verbal and non-verbal output from a multimodal dialogue system. The output specification is XML-based and provides information about communicative functions of the output, without detailing the realisation of these functions. The aim is to let dialogue systems generate the same output for a wide variety of output devices and modalities. The formalism was developed and implemented in the multimodal spoken dialogue system AdApt. We also describe how facial gestures in the 3D-animated talking head used within this system are controlled through the formalism.

  • 10.
    Beskow, Jonas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Granström, Björn
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Visual correlates to prominence in several expressive modes. 2006. In: INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006, pp. 1272-1275. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present measurements of visual, facial parameters obtained from a speech corpus consisting of short, read utterances in which focal accent was systematically varied. The utterances were recorded in a variety of expressive modes including certain, confirming, questioning, uncertain, happy, angry and neutral. Results showed that in all expressive modes, words with focal accent are accompanied by a greater variation of the facial parameters than are words in non-focal positions. Moreover, interesting differences between the expressions in terms of different parameters were found.

  • 11.
    Bigert, Johnny
    et al.
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Kann, Viggo
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Knutsson, Ola
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Sjöbergh, Jonas
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Grammar checking for Swedish second language learners. 2004. In: CALL for the Nordic Languages: Tools and Methods for Computer Assisted Language Learning, Copenhagen Business School: Samfundslitteratur, 2004, pp. 33-47. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    Grammar errors and context-sensitive spelling errors in texts written by second language learners are hard to detect automatically. We have used three different approaches for grammar checking: manually constructed error detection rules, statistical differences between correct and incorrect texts, and machine learning of specific error types. The three approaches have been evaluated using a corpus of second language learner Swedish. We found that the three methods detect different errors and therefore complement each other.

  • 12.
    Blomberg, Mats
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Elenius, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Tree-Based Estimation of Speaker Characteristics for Speech Recognition. 2009. In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009, pp. 580-583. Conference paper (Refereed)
    Abstract [en]

    Speaker adaptation by means of adjustment of speaker characteristic properties, such as vocal tract length, has the important advantage over conventional adaptation techniques that the adapted models are guaranteed to be realistic if the descriptions of the properties are. One problem with this approach is that the search procedure to estimate them is computationally heavy. We address the problem by using a multi-dimensional, hierarchical tree of acoustic model sets. The leaf sets are created by transforming a conventionally trained model set using leaf-specific speaker profile vectors. The model sets of non-leaf nodes are formed by merging the models of their child nodes, using a computationally efficient algorithm. During recognition, a maximum likelihood criterion is followed to traverse the tree. Studies of one- (VTLN) and four-dimensional speaker profile vectors (VTLN, two spectral slope parameters and model variance scaling) exhibit a reduction of the computational load to a fraction of that of an exhaustive grid search. In recognition experiments on children's connected digits using adult and male models, the one-dimensional tree search performed as well as the exhaustive search. Further reduction was achieved with four dimensions. The best recognition results are 0.93% and 10.2% WER in TIDIGITS and PF-Star-Sw, respectively, using adult models.

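The tree traversal in the abstract above — score the child model sets at each node and follow the one with higher likelihood — can be sketched in one dimension. This is a toy reduction: the quadratic log-likelihood, warp range, and tree depth are assumptions standing in for real acoustic model scores over a speaker profile space.

```python
# Assumed stand-ins: the speaker's true VTLN-style warp factor, and
# a log-likelihood that peaks there (a proxy for acoustic scores).
TRUE_WARP = 1.12

def log_likelihood(warp):
    return -(warp - TRUE_WARP) ** 2

def tree_search(lo=0.8, hi=1.2, depth=6):
    """Descend a binary tree over the warp range: at each node,
    score the two children (represented by the midpoints of their
    half-intervals) and follow the higher-scoring one."""
    for _ in range(depth):
        mid = (lo + hi) / 2
        left_mid, right_mid = (lo + mid) / 2, (mid + hi) / 2
        if log_likelihood(left_mid) >= log_likelihood(right_mid):
            hi = mid      # descend into the left child
        else:
            lo = mid      # descend into the right child
    return (lo + hi) / 2

est = tree_search()
print(round(est, 3))
```

At depth 6 this evaluates 12 candidate models instead of the 64 a uniform grid of the same resolution would need, which is the source of the computational saving; the paper's multi-dimensional version merges child model sets rather than interval midpoints.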
  • 13.
    Boholm, Max
    KTH, Skolan för arkitektur och samhällsbyggnad (ABE), Filosofi och teknikhistoria, Filosofi.
    Risk, language and discourse. 2016. Doctoral thesis, comprising articles (Other academic)
    Abstract [en]

    This doctoral thesis analyses the concept of risk and how it functions as an organizing principle of discourse, paying close attention to actual linguistic practice.

              Article 1 analyses the concepts of risk, safety and security and their relations based on corpus data (the Corpus of Contemporary American English). Lexical, grammatical and semantic contexts of the nouns risk, safety and security, and the adjectives risky, safe and secure are analysed and compared. Similarities and differences are observed, suggesting partial synonymy between safety (safe) and security (secure) and semantic opposition to risk (risky). The findings both support and contrast theoretical assumptions about these concepts in the literature.

              Article 2 analyses the concepts of risk and danger and their relation based on corpus data (in this case the British National Corpus). Frame semantics is used to explore the assumptions of the sociologist Niklas Luhmann (and others) that the risk concept presupposes decision-making, while the concept of danger does not. Findings partly support and partly contradict this assumption.

              Article 3 analyses how newspapers represent risk and causality. Two theories are used: media framing and the philosopher John Mackie’s account of causality. A central finding of the study is that risks are “framed” with respect to causality in several ways (e.g. one and the same type of risk can be presented as resulting from various causes). Furthermore, newspaper reporting on risk and causality vary in complexity. In some articles, risks are presented without causal explanations, while in other articles, risks are presented as results from complex causal conditions. Considering newspaper reporting on an aggregated overall level, complex schemas of causal explanations emerge.

              Article 4 analyses how phenomena referred to by the term nano (e.g. nanotechnology, nanoparticles and nanorobots) are represented as risks in Swedish newspaper reporting. Theoretically, the relational theory of risk and frame semantics are used. Five main groups of nano-risks are identified based on the risk object of the article: (I) nanotechnology; (II) nanotechnology and its artefacts (e.g. nanoparticles and nanomaterials); (III) nanoparticles, without referring to nanotechnology; (IV) non-nanotechnological nanoparticles (e.g. arising from traffic); and (V) nanotechnology and nanorobots. Various patterns are explored within each group, concerning, for example, what is considered to be at stake in relation to these risk objects, and under what conditions. It is concluded that Swedish patterns of newspaper reporting on nano-risks follow international trends, influenced by scientific assessment, as well as science fiction.

              Article 5 analyses the construction and negotiation of risk in the Swedish controversy over the use of antibacterial silver in health care and consumer products (e.g. sports clothes and equipment). The controversy involves several actors: print and television news media, Government and parliament, governmental agencies, municipalities, non-government organisations, and companies. In the controversy, antibacterial silver is claimed to be a risk object that negatively affects health, the environment, and sewage treatment industry (objects at risk). In contrast, such claims are denied. Antibacterial silver is even associated with the benefit of mitigating risk objects (e.g. bacteria and micro-organisms) that threaten health and the environment (objects at risk). In other words, both sides of the controversy invoke health and the environment as objects at risk. Three strategies organising risk communication are identified: (i) representation of silver as a risk to health and the environment; (ii) denial of such representations; and (iii) benefit association, where silver is construed to mitigate risks to health and the environment.

  • 14.
    Boholm, Max
    School of Global Studies, University of Gothenburg, Gothenburg, Sweden..
    The semantic distinction between ‘risk’ and ‘danger’: A linguistic analysis. 2012. In: Risk Analysis, ISSN 0272-4332, E-ISSN 1539-6924, Vol. 32, no. 2, pp. 281-293. Journal article (Refereed)
    Abstract [en]

    The analysis combines frame semantic and corpus linguistic approaches in analyzing the role of agency and decision making in the semantics of the words “risk” and “danger” (both nominal and verbal uses). In frame semantics, the meanings of “risk” and of related words, such as “danger,” are analyzed against the background of a specific cognitive-semantic structure (a frame) comprising frame elements such as Protagonist, Bad Outcome, Decision, Possession, and Source. Empirical data derive from the British National Corpus (100 million words). Results indicate both similarities and differences in use. First, both “risk” and “danger” are commonly used to represent situations having potential negative consequences as the result of agency. Second, “risk” and “danger,” especially their verbal uses (to risk, to endanger), differ in agent-victim structure, i.e., “risk” is used to express that a person affected by an action is also the agent of the action, while “endanger” is used to express that the one affected is not the agent. Third, “risk,” but not “danger,” tends to be used to represent rational and goal-directed action. The results therefore to some extent confirm the analysis of “risk” and “danger” suggested by German sociologist Niklas Luhmann. As a point of discussion, the present findings arguably have implications for risk communication.

  • 15. Borg, Erik
    et al.
    Edquist, Gertrud
    Reinholdson, Anna-Clara
    Risberg, Arne
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    McAllister, Bob
    Speech and language development in a population of Swedish hearing-impaired pre-school-children, a cross-sectional study. 2007. In: International Journal of Pediatric Otorhinolaryngology, ISSN 0165-5876, E-ISSN 1872-8464, Vol. 71, no. 7, pp. 1061-1077. Journal article (Refereed)
    Abstract [en]

    Objective: There is little information on speech and language development in preschool children with mild, moderate or severe hearing impairment. The primary aim of the study is to establish a reference material for clinical use covering various aspects of speech and language functions and to relate test values to pure tone audiograms and parents' judgement of their children's hearing and language abilities. Methods: Nine speech and language tests were applied or modified, both classical tests and newly developed tests. Ninety-seven children with normal hearing and 156 with hearing impairment were tested. Hearing was 80 dB HL PTA or better in the best ear. Swedish was their strongest language. None had any additional diagnosed major handicaps. The children were 4-6 years of age. The material was divided into 10 categories of hearing impairment, 5 conductive and 5 sensorineural: unilateral; bilateral 0-20; 21-40; 41-60; 61-80 dB HL PTA. The tests, selected on the basis of a three component language model, are phoneme discrimination; rhyme matching; Peabody Picture Vocabulary Test (PPVT-III, word perception); Test for Reception of Grammar (TROG, grammar perception); prosodic phrase focus; rhyme construction; Word Finding Vocabulary Test (word production); Action Picture Test (grammar production); oral motor test. Results: Only categories with sensorineural loss showed significant differences from normal. Word production showed the most marked delay for 21-40 dB HL: 5 and 6 years p < 0.01; for 41-60 dB: 4 years p < 0.01 and 6 years p < 0.01 and 61-80 dB: 5 years p < 0.05. Phoneme discrimination 21-40 dB HL: 6 years p < 0.05; 41-60 dB: 4 years p < 0.01; 61-80 dB: 4 years p < 0.001, 5 years p < 0.001. Rhyme matching: no significant difference as compared to normal data. Word perception: sensorineural 41-60 dB HL: 6 years p < 0.05; 61-80 dB: 4 years p < 0.05; 5 years p < 0.01. Grammar perception: sensorineural 41-60 dB HL: 6 years p < 0.05; 61-80 dB: 5 years p < 0.05. Prosodic phrase focus: 41-60 dB HL: 5 years p < 0.01. Rhyme construction: 41-60 dB HL: 4 years p < 0.05. Grammar production: 61-80 dB HL: 5 years p < 0.01. Oral motor function: no differences. The Word production test showed a 1.5-2 years delay for sensorineural impairment 41-80 dB HL through 4-6 years of age. There were no differences between hearing-impaired boys and girls. Extended data for the screening test [E. Borg, A. Risberg, B. McAllister, B.M. Undemar, G. Edquist, A.C. Reinholdsson, et al., Language development in hearing-impaired children. Establishment of a reference material for a "Language test for hearing-impaired children", Int. J. Pediatr. Otorhinolaryngol. 65 (2002) 15-26] are presented. Conclusions: Reference values for expected speech and language development are presented that cover nearly 60% of the studied population. The effect of the peripheral hearing impairment is compensated for in many children with hearing impairment up to 60 dB HL. Above that degree of impairment, language delay is more pronounced, probably due to a loss of acuity. The importance of central cognitive functions, speech reading and signing for compensation of peripheral limitations is pointed out.

  • 16.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Gustafson, Kjell
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Strangert, Eva
    Cues for Hesitation in Speech Synthesis. 2006. In: INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006, pp. 1300-1303. Conference paper (Refereed)
    Abstract [en]

    The current study investigates acoustic correlates to perceived hesitation based on previous work showing that pause duration and final lengthening both contribute to the perception of hesitation. It is the total duration increase that is the valid cue rather than the contribution by either factor. The present experiment using speech synthesis was designed to evaluate F0 slope and presence vs. absence of creaky voice before the inserted hesitation in addition to durational cues. The manipulations occurred in two syntactic positions, within a phrase and between two phrases, respectively. The results showed that in addition to durational increase, variation of both F0 slope and creaky voice had perceptual effects, although to a much lesser degree. The results have a bearing on efforts to model spontaneous speech including disfluencies, to be explored, for example, in spoken dialogue systems.

  • 17.
    Carlson, Rolf
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Hirschberg, Julia
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    Cross-Cultural Perception of Discourse Phenomena. 2009. In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009, pp. 1723-1726. Conference paper (Refereed)
    Abstract [en]

    We discuss perception studies of two low level indicators of discourse phenomena by Swedish, Japanese, and Chinese native speakers. Subjects were asked to identify upcoming prosodic boundaries and disfluencies in Swedish spontaneous speech. We hypothesize that speakers of prosodically unrelated languages should be less able to predict upcoming phrase boundaries but potentially better able to identify disfluencies, since indicators of disfluency are more likely to depend upon lexical, as well as acoustic information. However, surprisingly, we found that both phenomena were fairly well recognized by native and non-native speakers, with, however, some possible interference from word tones for the Chinese subjects.

  • 18.
    Dahlberg, Leif
    KTH, Skolan för datavetenskap och kommunikation (CSC), Medieteknik och interaktionsdesign, MID.
    Det akademiska samtalet2015Inngår i: Universitetet som medium / [ed] Matts Lindström & Adam Wickberg Månsson, Lund: Mediehistoria, Lunds universitet , 2015, s. 195-223Kapittel i bok, del av antologi (Fagfellevurdert)
  • 19. de Leeuw, Esther
    et al.
    Opitz, Conny
    Lubinska, Dorota
    KTH, Skolan för teknikvetenskaplig kommunikation och lärande (ECE), Avdelningen för bibliotek, språk och ARC, Språk och kommunikation.
Dynamics of first language attrition across the lifespan: Introduction2013Inngår i: International Journal of Bilingualism, ISSN 1367-0069, E-ISSN 1756-6878, Vol. 17, nr 6, s. 667-674Artikkel i tidsskrift (Fagfellevurdert)
  • 20.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Beskow, Jonas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Pushy versus meek: using avatars to influence turn-taking behaviour2007Inngår i: INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC , 2007, s. 2784-2787Konferansepaper (Fagfellevurdert)
    Abstract [en]

The flow of spoken interaction between human interlocutors is a widely studied topic. Amongst other things, studies have shown that we use a number of facial gestures to improve this flow - for example to control the taking of turns. This type of gesture ought to be useful in systems where an animated talking head is used, be they systems for computer-mediated human-human dialogue or spoken dialogue systems, where the computer itself uses speech to interact with users. In this article, we show that a small set of simple interaction control gestures and a simple model of interaction can be used to influence users' behaviour in an unobtrusive manner. The results imply that such a model may improve the flow of computer-mediated interaction between humans under adverse circumstances, such as network latency, or may help create more human-like spoken human-computer interaction.

  • 21.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Gustafson, Joakim
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Hidden resources - Strategies to acquire and exploit potential spoken language resources in national archives2016Inngår i: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, European Language Resources Association (ELRA) , 2016, s. 4531-4534Konferansepaper (Fagfellevurdert)
    Abstract [en]

In 2014, the Swedish government tasked a Swedish agency, The Swedish Post and Telecom Authority (PTS), with investigating how best to create and populate an infrastructure for spoken language resources (Ref N2014/2840/ITP). As part of this work, the Department of Speech, Music and Hearing at KTH Royal Institute of Technology has taken an inventory of existing potential spoken language resources, mainly in Swedish national archives and other governmental or public institutions. In this position paper, key priorities, perspectives, and strategies that may be of general, rather than specifically Swedish, interest are presented. We discuss the broad types of potential spoken language resources available; to what extent these resources are free to use; and thirdly the main contribution: strategies to ensure the continuous acquisition of spoken language resources in a manner that facilitates speech and speech technology research.

  • 22.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Hirschberg, Julia
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Pause and gap length in face-to-face interaction2009Inngår i: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC , 2009, s. 2779-2782Konferansepaper (Fagfellevurdert)
    Abstract [en]

    It has long been noted that conversational partners tend to exhibit increasingly similar pitch, intensity, and timing behavior over the course of a conversation. However, the metrics developed to measure this similarity to date have generally failed to capture the dynamic temporal aspects of this process. In this paper, we propose new approaches to measuring interlocutor similarity in spoken dialogue. We define similarity in terms of convergence and synchrony and propose approaches to capture these, illustrating our techniques on gap and pause production in Swedish spontaneous dialogues.

  • 23.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
|nailon|: Software for Online Analysis of Prosody2006Konferansepaper (Fagfellevurdert)
    Abstract [en]

This paper presents /nailon/ - a software package for online real-time prosodic analysis that captures a number of prosodic features relevant for interaction control in spoken dialogue systems. The current implementation captures silence durations; voicing, intensity, and pitch; pseudo-syllable durations; and intonation patterns. The paper provides detailed information on how this is achieved. As an example application of /nailon/, we demonstrate how it is used to improve the efficiency of identifying relevant places at which a machine can legitimately begin to talk to a human interlocutor, as well as to shorten system response times.

  • 24.
    Edlund, Jens
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Heldner, Mattias
    Wlodarczak, Marcin
    Catching wind of multiparty conversation2014Inngår i: LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014Konferansepaper (Fagfellevurdert)
    Abstract [en]

The paper describes the design of a novel corpus of respiratory activity in spontaneous multiparty face-to-face conversations in Swedish. The corpus is collected with the primary goal of investigating the role of breathing in the interactive control of conversation. Physiological correlates of breathing are captured by means of respiratory belts, which measure changes in the cross-sectional area of the rib cage and the abdomen. Additionally, auditory and visual cues of breathing are recorded in parallel with the actual conversations. The corpus allows studying the respiratory mechanisms underlying the organisation of spontaneous communication, especially in connection with turn management. As such, it is a valuable resource both for fundamental research and speech technology applications.

  • 25.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Can audio-visual instructions help learners improve their articulation?: an ultrasound study of short term changes2008Inngår i: INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC , 2008, s. 2631-2634Konferansepaper (Fagfellevurdert)
    Abstract [en]

This paper describes how seven French subjects change their pronunciation and articulation when practising Swedish words with a computer-animated virtual teacher. The teacher gives feedback on the user's pronunciation with audiovisual instructions suggesting how the articulation should be changed. A wizard-of-Oz set-up was used for the training session, in which a human listener chose the adequate pre-generated feedback based on the user's pronunciation. The subjects' articulatory changes were monitored during the practice session with a hand-held ultrasound probe. The perceptual analysis indicates that the subjects improved their pronunciation during the training, and the ultrasound measurements suggest that the improvement was made by following the articulatory instructions given by the computer-animated teacher.

  • 26.
    Engwall, Olov
    KTH, Tidigare Institutioner, Tal, musik och hörsel.
    Dynamical Aspects of Coarticulation in Swedish Fricatives: A Combined EMA and EPG Study2000Inngår i: TMH Quarterly Status and Progress Report, s. 49-73Artikkel i tidsskrift (Annet vitenskapelig)
    Abstract [en]

An electromagnetic articulography (EMA) system and electropalatography (EPG) have been employed to study five Swedish fricatives in different vowel contexts. Articulatory measures at the onset of, the mean value during, and at the offset of the fricative were used to evidence the coarticulation throughout the fricative. The contextual influence on these three different measurements of the fricative is compared and contrasted to evidence how the coarticulation changes. Measures were made for the jaw motion, lip protrusion and tongue body with EMA, and for linguopalatal contact with EPG. The data from the two sources were further combined and assessed for complementary and conflicting results.

  • 27.
    Engwall, Olov
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Wik, Preben
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Are real tongue movements easier to speech read than synthesized?2009Inngår i: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC , 2009, s. 824-827Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Speech perception studies with augmented reality displays in talking heads have shown that tongue reading abilities are weak initially, but that subjects become able to extract some information from intra-oral visualizations after a short training session. In this study, we investigate how the nature of the tongue movements influences the results, by comparing synthetic rule-based and actual, measured movements. The subjects were significantly better at perceiving sentences accompanied by real movements, indicating that the current coarticulation model developed for facial movements is not optimal for the tongue.

  • 28.
    Ericsdotter, Christine
    et al.
    Stockholm University.
    Ternström, Sten
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Swedish2012Inngår i: The Use of the International Phonetic Alphabet in the Choral Rehearsal / [ed] Duane R. Karna, Scarecrow Press, 2012, s. 245-251Kapittel i bok, del av antologi (Annet vitenskapelig)
  • 29.
    Fant, Gunnar
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    A personal note from Gunnar Fant2009Inngår i: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 51, nr 7, s. 564-568Artikkel i tidsskrift (Annet vitenskapelig)
  • 30. Gerholm, Tove
    et al.
    Gustavsson, L.
    Schwarz, I.
    Marklund, U.
    Salomão, Gláucia Laís
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH. Institutionen för Lingvistiken, Stockholm Universitet.
    Kallioinen, P.
    Andersson, S.
    Eriksson, F.
    Pagmar, D.
    Tahbaz, S.
    The Swedish MINT Project modelling infant language acquisition2015Konferansepaper (Fagfellevurdert)
  • 31.
    Grancharov, Volodya
    et al.
    KTH, Skolan för elektro- och systemteknik (EES), Ljud- och bildbehandling.
    Zhao, David Yuheng
    KTH, Skolan för elektro- och systemteknik (EES), Ljud- och bildbehandling.
    Lindblom, Jonas
    KTH, Skolan för elektro- och systemteknik (EES), Ljud- och bildbehandling.
    Kleijn, W. Bastiaan
    KTH, Skolan för elektro- och systemteknik (EES), Ljud- och bildbehandling.
    Non-Intrusive Speech Quality Assessment with Low Computational Complexity2006Inngår i: INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC , 2006, s. 189-192Konferansepaper (Fagfellevurdert)
    Abstract [en]

    We describe an algorithm for monitoring subjective speech quality without access to the original signal that has very low computational and memory requirements. The features used in the proposed algorithm can be computed from commonly used speech-coding parameters. Reconstruction and perceptual transformation of the signal are not performed. The algorithm generates quality assessment ratings without explicit distortion modeling. The simulation results indicate that the proposed non-intrusive objective quality measure performs better than the ITU-T P.563 standard despite its very low computational complexity.

  • 32. Han, Qichao
    et al.
    Sundberg, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Duration, Pitch, and Loudness in Kunqu Opera Stage Speech2017Inngår i: Journal of Voice, ISSN 0892-1997, E-ISSN 1873-4588, Vol. 31, nr 2, artikkel-id UNSP 255.e1Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

Objectives. Kunqu is a special type of opera within the Chinese tradition with 600 years of history. In it, stage speech is used for the spoken dialogue. It is performed in the Ming Dynasty's Mandarin language and is a much more dominant part of the play than singing. Stage speech deviates considerably from normal conversational speech with respect to duration, loudness and pitch. This paper compares these properties in stage speech and conversational speech. Method. A famous, highly experienced female singer's performed stage speech and her reading of the same lyrics in a conversational speech mode were analyzed. Clear differences were found. Results. As compared with conversational speech, stage speech had longer word and sentence duration, and word duration was less variable. Average sound level was 16 dB higher. Also, mean fundamental frequency was considerably higher and more varied. Within sentences, both loudness and fundamental frequency tended to vary according to a low-high-low pattern. Conclusions. Some of the findings fail to support current opinions regarding the characteristics of stage speech, and in this sense the study demonstrates the relevance of objective measurements in descriptions of vocal styles.

  • 33.
    Heldner, Mattias
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Detection thresholds for gaps, overlaps, and no-gap-no-overlaps2011Inngår i: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 130, nr 1, s. 508-513Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Detection thresholds for gaps and overlaps, that is acoustic and perceived silences and stretches of overlapping speech in speaker changes, were determined. Subliminal gaps and overlaps were categorized as no-gap-no-overlaps. The established gap and overlap detection thresholds both corresponded to the duration of a long vowel, or about 120 ms. These detection thresholds are valuable for mapping the perceptual speaker change categories gaps, overlaps, and no-gap-no-overlaps into the acoustic domain. Furthermore, the detection thresholds allow generation and understanding of gaps, overlaps, and no-gap-no-overlaps in human-like spoken dialogue systems.
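The mapping described in the abstract can be sketched as a simple classifier over floor-transfer offsets. This is an illustrative sketch, not code from the paper: the function name, the sign convention (positive offset = silence between speakers, negative = overlapping speech), and the reuse of the reported ~120 ms threshold for both categories are assumptions.

```python
# Hypothetical sketch: classify speaker changes by floor-transfer offset,
# using the ~120 ms detection threshold reported in the abstract.
THRESHOLD = 0.120  # seconds; detection threshold for both gaps and overlaps

def classify_speaker_change(offset: float) -> str:
    """Positive offset: silence between speakers; negative: overlap."""
    if offset >= THRESHOLD:
        return "gap"
    if offset <= -THRESHOLD:
        return "overlap"
    return "no-gap-no-overlap"  # subliminal gap or overlap

offsets = [0.350, 0.050, -0.080, -0.300]
print([classify_speaker_change(o) for o in offsets])
# ['gap', 'no-gap-no-overlap', 'no-gap-no-overlap', 'overlap']
```

Such a mapping could be used, as the abstract suggests, both for annotating speaker changes acoustically and for timing responses in a human-like spoken dialogue system.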

  • 34.
    Hincks, Rebecca
    KTH, Skolan för teknikvetenskaplig kommunikation och lärande (ECE), Lärande, Språk och kommunikation.
    Recently hired tenure-track faculty and Swedish: An unsolicited report for KTH leadership2018Rapport (Annet vitenskapelig)
    Abstract [en]

    The new KTH development plan acknowledges that many KTH environments are no longer bilingual, and that efforts must be made to strengthen the position of Swedish. Unfortunately, it appears that many KTH leaders underestimate the time and resources necessary for most adults to learn a second language to the high proficiency level necessary for teaching or for academic leadership.

An examination of job advertisements found that it is at present common practice across northern Europe to specify that applicants to faculty positions be prepared to learn the local language well enough to use it for teaching within two years. The language expectations placed on newly hired KTH tenure-track faculty were investigated to find out to what extent this is true at KTH. Of 49 non-Swedish speakers who answered a survey, eight were met with the teaching-within-two-years expectation when hired, and 14 are meeting the expectation at present. The Swedish learning is to take place mostly in one's free time, and little progress toward adequate proficiency is being made among the faculty. These findings are discussed in light of what is known about the time it takes for adults to learn a second language to a high level of professional proficiency.

    If departments seriously expect transnational faculty to teach in Swedish within two years, they should allow the individual the equivalent of six months of full-time study of the language. A more reasonable timeframe for learning high-proficiency Swedish would be five or six years. Language-learning plans should be written for all new hires to tenure-track positions, and followed up at regular intervals.

  • 35.
    Hincks, Rebecca
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Språk och kommunikation.
    Speaking rate and information content in English lingua franca oral presentations2010Inngår i: English for specific purposes (New York, N.Y.), ISSN 0889-4906, E-ISSN 1873-1937, Vol. 29, nr 1, s. 4-18Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

This paper quantifies differences in speaking rates in a first and second language, and examines the effects of slower rates on the speakers' abilities to convey information. The participants were 14 fluent (CEF B2/C1) English L2 speakers who gave the same oral presentation twice, once in English and once in their native Swedish. The temporal variables of mean length of runs and speaking rate in syllables per second were calculated for each language. Speaking rate was found to be 23% slower when using English. The slower rate of speech was found to significantly reduce the information content of the presentations when speaking time was held constant. Implications for teaching as European universities adopt English as a medium of instruction are discussed.

  • 36.
    Hincks, Rebecca
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Språk och kommunikation.
    Edlund, Jens
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT.
    PROMOTING INCREASED PITCH VARIATION IN ORAL PRESENTATIONS WITH TRANSIENT VISUAL FEEDBACK2009Inngår i: Language Learning & Technology, ISSN 1094-3501, E-ISSN 1094-3501, Vol. 13, nr 3, s. 32-50Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    This paper investigates learner response to a novel kind of intonation feedback generated from speech analysis. Instead of displays of pitch curves, our feedback is flashing lights that show how much pitch variation the speaker has produced. The variable used to generate the feedback is the standard deviation of fundamental frequency as measured in semitones. Flat speech causes the system to show yellow lights, while more expressive speech that has used pitch to give focus to any part of an utterance generates green lights. Participants in the study were 14 Chinese students of English at intermediate and advanced levels. A group that received visual feedback was compared with a group that received audio feedback. Pitch variation was measured at four stages: in a baseline oral presentation; for the first and second halves of three hours of training; and finally in the production of a new oral presentation. Both groups increased their pitch variation with training, and the effect lasted after the training had ended. The test group showed a significantly higher increase than the control group, indicating that the feedback is effective. These positive results imply that the feedback could be beneficially used in a system for practicing oral presentations.
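The feedback variable described above, the standard deviation of fundamental frequency measured in semitones, can be sketched in a few lines. This is an illustrative sketch only: the helper names, the reference frequency of 100 Hz, and the 2-semitone yellow/green threshold are assumptions, not values from the paper.

```python
import math

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert an F0 contour in Hz to semitones relative to a reference."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]

def pitch_variation(f0_hz):
    """Standard deviation of the F0 contour, in semitones."""
    st = hz_to_semitones(f0_hz)
    mean = sum(st) / len(st)
    return math.sqrt(sum((s - mean) ** 2 for s in st) / len(st))

def feedback_colour(f0_hz, threshold_st=2.0):
    """Flat speech -> 'yellow' lights; expressive speech -> 'green'."""
    return "green" if pitch_variation(f0_hz) >= threshold_st else "yellow"

flat = [100, 102, 101, 99, 100]       # near-monotone contour
expressive = [100, 140, 90, 130, 95]  # wide pitch excursions

print(feedback_colour(flat), feedback_colour(expressive))
# prints: yellow green
```

Measuring variation in semitones rather than Hz makes the feedback comparable across speakers with different baseline pitch, which is presumably why the authors chose that unit.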

  • 37. Horne, Merle
    et al.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Svantesson, Jan-Olof
    Touati, Paul
    Gösta Bruce 1947-2010 In Memoriam2010Inngår i: Phonetica, ISSN 0031-8388, E-ISSN 1423-0321, Vol. 67, nr 4, s. 268-270Artikkel i tidsskrift (Fagfellevurdert)
  • 38.
    House, David
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Integrating Audio and Visual Cues for Speaker Friendliness in Multimodal Speech Synthesis2007Inngår i: INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC , 2007, s. 1461-1464Konferansepaper (Fagfellevurdert)
    Abstract [en]

    This paper investigates interactions between audio and visual cues to friendliness in questions in two perception experiments. In the first experiment, manually edited parametric audio-visual synthesis was used to create the stimuli. Results were consistent with earlier findings in that a late, high final focal accent peak was perceived as friendlier than an earlier, lower focal accent peak. Friendliness was also effectively signaled by visual facial parameters such as a smile, head nod and eyebrow raising synchronized with the final accent. Consistent additive effects were found between the audio and visual cues for the subjects as a group and individually showing that subjects integrate the two modalities. The second experiment used data-driven visual synthesis where the database was recorded by an actor instructed to portray anger and happiness. Friendliness was correlated to the happy database, but the effect was not as strong as for the parametric synthesis.

  • 39.
    House, David
    KTH, Skolan för elektroteknik och datavetenskap (EECS).
    Response to Fred Cummins: Looking for Rhythm in Speech.2012Inngår i: Empirical Musicology Review, ISSN 1559-5749, E-ISSN 1559-5749, Vol. 7, nr 1-2, s. 45-48Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    This commentary briefly reviews three aspects of rhythm in speech. The first concerns the issues of what to measure and how measurements should relate to rhythm's communicative functions. The second relates to how tonal and durational features of speech contribute to the percept of rhythm, noting evidence that indicates such features can be tightly language-specific. The third aspect addressed is how bodily gestures integrate with and enhance the communicative functions of speech rhythm.

  • 40.
    House, David
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Karlsson, Anastasia
    Svantesson, Jan-Olof
    Tayanin, Damrong
    The Phrase-Final Accent in Kammu: Effects of Tone, Focus and Engagement2009Inngår i: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC , 2009, s. 2439-2442Konferansepaper (Fagfellevurdert)
    Abstract [en]

    The phrase-final accent can typically contain a multitude of simultaneous prosodic signals. In this study, aimed at separating the effects of lexical tone from phrase-final intonation, phrase-final accents of two dialects of Kammu were analyzed. Kammu, a Mon-Khmer language spoken primarily in northern Laos, has dialects with lexical tones and dialects with no lexical tones. Both dialects seem to engage the phrase-final accent to simultaneously convey focus, phrase finality, utterance finality, and speaker engagement. Both dialects also show clear evidence of truncation phenomena. These results have implications for our understanding of the interaction between tone, intonation and phrase-finality.

  • 41.
    Kann, Viggo
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Teoretisk datalogi, TCS.
    Nordberg, Richard
    KTH, Skolan för teknikvetenskaplig kommunikation och lärande (ECE), Avdelningen för bibliotek, språk och ARC, Språk och kommunikation.
    Hur kan en språkpolicy bli verklighet?2014Konferansepaper (Fagfellevurdert)
    Abstract [sv]

In 2010, KTH adopted a language policy describing the university's ambitions regarding linguistic quality and language skills. For a policy document to actually influence operations, someone must work to make the policy known and to implement it [2]. KTH's language committee was therefore established, and since 2011 it has monitored language issues in a broad sense at KTH. The committee's task is to implement KTH's language policy in the organisation and to give advice on language questions of a general nature. This contribution describes some activities the committee has carried out to this end and discusses how they have affected the organisation. In 2011 and 2012, the committee conducted two surveys to obtain a current picture of language use at KTH and of the problems that may exist. The first survey was addressed to all teachers at KTH and received over 500 responses; the second went to all students and doctoral students at KTH and received over 3,000 responses. Both surveys showed that interest in language issues is very high and that both teachers and students would like to take language courses. Many free-text answers confirm the results of a similar survey at Stockholm University [1], for example the complex problems that arise when master's programmes switch to teaching in English. The teacher survey showed that only 40% of courses follow the language policy's recommendation to present technical terminology in both Swedish and English. Both teachers and students consider it a problem that students lack parallel technical-language competence in Swedish and English. Many students in the master's programmes are, moreover, never exposed to Swedish technical terminology, which can cause problems during the degree project and in their first employment. To address this terminology problem, the language committee has given a seminar on technical language in cooperation with TNC and arranged a workshop for teachers to generate ideas on how parallel-language terminology can be provided and practised in teaching.
The ideas have since been compiled, structured and published on the language committee's website, where KTH's teachers as well as other interested parties can find them and be inspired by them. The website also links to language resources and tools, including a Swedish-English KTH dictionary of one thousand administrative terms, developed by the committee to make the English terminology less inconsistent. A page of frequently asked language questions answers, for example, questions about the use of Swedish and English in examination, in degree project reports and in theses. Teachers and administrators at KTH can subscribe to the language committee's group web and are thereby notified when news is posted. The committee's active work on raising awareness of language issues has increased the impact of the language policy at KTH.

  • 42.
    Karlgren, Jussi
    Stockholm University.
    A Computer Program for Recognizing Blazons1988Independent thesis Basic level (degree of Bachelor), 20hpOppgave
    Abstract [en]

This candidate of philosophy thesis describes a computer program which analyzes so-called blazons, i.e., classic descriptions of heraldic coats-of-arms. If an expression is recognized as an acceptable blazon, the program produces a graphic representation of the coat-of-arms in question on screen.

  • 43.
    Karlgren, Jussi
    Stockholm University, SICS.
    Stylistic Experiments for Information Retrieval2000Doktoravhandling, monografi (Annet vitenskapelig)
    Abstract [en]

Information retrieval systems are built to handle texts as topical items: texts are tabulated by occurrence frequencies of content words in them, under the assumption that text topic is reasonably well modeled by content word occurrence. But texts have several interesting characteristics beyond topic. The experiments described in this text investigate stylistic variation. Roughly put, style is the difference between two ways of saying the same thing, and systematic stylistic variation can be used to characterize the genre of documents. These experiments investigate whether stylistic information is distinguishable using simple language engineering methods, and if so, whether this type of information can be used to improve information retrieval systems.

A first set of experiments shows that simple measures of stylistic variation can be used to distinguish genres from each other quite adequately; how well depends on what the genres in question are.

A second set of experiments evaluates the utility of stylistic measures for the purposes of information retrieval, to identify common characteristics of relevant and non-relevant documents. The conclusion is that the requests for information as typically expressed to retrieval systems are too terse and unspecific for non-topical information to improve retrieval results. Systems for information access need to be designed from the beginning to handle richer information about the texts and documents at hand: information about stylistic variation cannot easily be added to an existing system.

A third set of experiments explores how an interactive system can be designed to incorporate stylistic information in the interface between user and system. These experiments resulted in the design of an interface for categorizing retrieval results by genre, and for displaying the retrieval results using this categorization. This interface is integrated into a prototype for retrieving information from the World Wide Web.

  • 44.
    Kjellström, Hedvig
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Engwall, Olov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Talteknologi, CTT. KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    Abdou, Sherif
    Bälter, Olle
    KTH, Skolan för datavetenskap och kommunikation (CSC), Människa-datorinteraktion, MDI.
    Audio-visual phoneme classification for pronunciation training applications2007Inngår i: INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC , 2007, s. 57-60Konferansepaper (Fagfellevurdert)
    Abstract [en]

We present a method for audio-visual classification of Swedish phonemes, to be used in computer-assisted pronunciation training. The probabilistic kernel-based method is applied to the audio signal and/or either a principal or an independent component (PCA or ICA) representation of the mouth region in video images. We investigate which representation (PCA or ICA) may be most suitable, and the number of components required in the base, in order to be able to automatically detect pronunciation errors in Swedish from audio-visual input. Experiments performed on one speaker show that the visual information helps avoid classification errors that would lead to gravely erroneous feedback to the user; that it is better to perform phoneme classification on audio and video separately and then fuse the results, rather than combining them before classification; and that PCA outperforms ICA for fewer than 50 components.

  • 45. Kousidis, Spyros
    et al.
    Malisz, Zofia
    Wagner, Petra
    Schlangen, David
    Exploring annotation of head gesture forms in spontaneous human interaction. 2013. In: TiGeR 2013, Tilburg Gesture Research Meeting, 2013. Conference paper (Refereed)
  • 46.
    Lindborg, PerMagnus
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik. Nanyang Technological University, Singapore.
    About TreeTorika: Rhetoric, CAAC and Mao. 2008. In: OM Composer’s Book #2 / [ed] Bresson, J., Agon C. & Assayag G., Paris, France: Éditions Delatour France / IRCAM - Centre Pompidou, 2008, pp. 95-116. Book chapter, part of anthology (Refereed)
    Abstract [en]

    This chapter examines computer assisted analysis and composition (CAAC) techniques in relation to the composition of my piece TreeTorika for chamber orchestra. I describe methods for analysing the musical features of a recording of a speech by Mao Zedong, in order to extract compositional material such as global form, melody, harmony and rhythm, and for developing rhythmic material. The first part focuses on large scale segmentation, melody transcription, quantification and quantization. Automatic transcription of the voice was discarded in favour of an aural method using tools in Amadeus and Max/MSP. The data was processed in OpenMusic to optimise the accuracy and readability of the notation. The harmonic context was derived from the transcribed melody and from AudioSculpt partial tracking and chord-sequence analyses. The second part of this chapter describes one aspect of computer assisted composition, that is the use of the rhythm constraint library in OpenMusic to develop polyrhythmic textures. The flexibility of these techniques allowed the computer to assist me in all but the final phases of the work. In addition, attention is given to the artistic and political implications of using recordings of such a disputed public figure as Mao.

  • 47.
    Ma, Zhanyu
    et al.
    KTH, Skolan för elektro- och systemteknik (EES), Ljud- och bildbehandling.
    Leijon, Arne
    KTH, Skolan för elektro- och systemteknik (EES), Ljud- och bildbehandling.
    Human Audio-Visual Consonant Recognition Analyzed with Three Bimodal Integration Models. 2009. In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009, pp. 812-815. Conference paper (Refereed)
    Abstract [en]

    With A-V recordings, ten normal-hearing people took recognition tests at different signal-to-noise ratios (SNR). The AV recognition results are predicted by the fuzzy logical model of perception (FLMP) and the post-labelling integration model (POSTL). We also applied hidden Markov models (HMMs) and multi-stream HMMs (MSHMMs) for the recognition. As expected, all the models agree qualitatively with the result that the benefit gained from the visual signal is larger at lower acoustic SNRs. However, the FLMP severely overestimates the AV integration result, while the POSTL model underestimates it. Our automatic speech recognizers integrated the audio and visual streams efficiently. The visual automatic speech recognizer could be adjusted to correspond to human visual performance. The MSHMMs combine the audio and visual streams efficiently, but the audio automatic speech recognizer must be further improved to allow precise quantitative comparisons with human audio-visual performance.

  • 48.
    Malisz, Zofia
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation.
    O'Dell, Michael
    University of Tampere.
    Nieminen, Tommi
    University of Eastern Finland.
    Wagner, Petra
    Bielefeld University.
    Perspectives on speech timing: coupled oscillator modeling of Polish and Finnish. 2016. In: Phonetica, ISSN 0031-8388, E-ISSN 1423-0321, Vol. 73, no. 3-4. Journal article (Refereed)
    Abstract [en]

    We use an updated version of the Coupled Oscillator Model of speech timing and rhythm variability (O'Dell and Nieminen, 1999; 2009) to analyze empirical duration data for Polish spoken at different tempos. We use Bayesian inference on parameters relating to speech rate to investigate how tempo affects timing in Polish. The model parameters found are then compared to parameters obtained for equivalent material in Finnish to shed light on which of the effects represent general speech rate mechanisms and which are specific to Polish. We discuss the model and its predictions in the context of current perspectives on speech timing.

  • 49. Malisz, Zofia
    et al.
    Włodarczak, Marcin
    Buschmeier, Hendrik
    Kopp, Stefan
    Wagner, Petra
    Prosodic characteristics of feedback expressions in distracted and non-distracted listeners. 2012. Conference paper (Refereed)
    Abstract [en]

    In a previous study (Buschmeier et al., INTERSPEECH-2011) we investigated properties of communicative feedback produced by attentive and non-attentive listeners in dialogue. Distracted listeners were found to produce less feedback communicating understanding. Here, we assess the role of prosody in differentiating between feedback functions. We find significant differences across all studied prosodic dimensions as well as influences of lexical form and phonetic structure on feedback function categorisation. We also show that differences in prosodic features between attentiveness states exist, e.g., in overall intensity.

  • 50.
    Malisz, Zofia
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Tal-kommunikation. Saarland University, Germany.
    Włodarczak, Marcin
    Stockholms Universitet.
    Buschmeier, Hendrik
    Bielefeld University.
    Skubisz, Joanna
    Universidade Nova de Lisboa.
    Kopp, Stefan
    Bielefeld University.
    Wagner, Petra
    Bielefeld University.
    The ALICO corpus: analysing the active listener. 2016. In: Language resources and evaluation, ISSN 1574-020X, E-ISSN 1574-0218, Vol. 50, no. 2, pp. 411-442. Journal article (Refereed)
    Abstract [en]

    The Active Listening Corpus (ALICO) is a multimodal data set of spontaneous dyadic conversations in German with diverse speech and gestural annotations of both dialogue partners. The annotations consist of short feedback expression transcriptions with corresponding communicative function interpretations as well as segmentations of interpausal units, words, rhythmic prominence intervals and vowel-to-vowel intervals. Additionally, ALICO contains head gesture annotations of both interlocutors. The corpus contributes to research on spontaneous human–human interaction, on functional relations between modalities, and timing variability in dialogue. It also provides data that differentiates between distracted and attentive listeners. We describe the main characteristics of the corpus and briefly present the most important results obtained from analyses in recent years.
