1 - 19 of 19
  • 1. Bellec, G.
    et al.
    Elowsson, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Wolff, D.
    Weyde, T.
    A social network integrated game experiment to relate tapping to speed perception and explore rhythm reproduction (2013). In: Proceedings of the Sound and Music Computing Conference 2013, 2013, pp. 19-26. Conference paper (Peer-reviewed)
    Abstract [en]

    During recent years, games with a purpose (GWAPs) have become increasingly popular for studying human behaviour [1–4]. However, no standardised method for web-based game experiments has been proposed so far. We present here our approach comprising an extended version of the CaSimIR social game framework [5] for data collection, mini-games for tempo and rhythm tapping, and an initial analysis of the data collected so far. The game presented here is part of the Spot The Odd Song Out game, which is freely available for use on Facebook and on the Web. We present the GWAP method in some detail and a preliminary analysis of data collected. We relate the tapping data to perceptual ratings obtained in previous work. The results suggest that the tapped tempo data collected in a GWAP can be used to predict perceived speed. When averaging the rhythmic performances of a group of 10 players in the second experiment, the tapping frequency shows a pattern that corresponds to the time signature of the music played. Our experience shows that more effort in design and during runtime is required than in a traditional experiment. Our experiment is still running and available online.

  • 2.
    Dubois, Juliette
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH. ENSTA ParisTech.
    Elowsson, Anders
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Friberg, Anders
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Predicting Perceived Dissonance of Piano Chords Using a Chord-Class Invariant CNN and Deep Layered Learning (2019). In: Proceedings of the 16th Sound & Music Computing Conference (SMC), Malaga, Spain, 2019, pp. 530-536. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper presents a convolutional neural network (CNN) able to predict the perceived dissonance of piano chords. Ratings of dissonance for short audio excerpts were combined from two different datasets and groups of listeners. The CNN uses two branches in a directed acyclic graph (DAG). The first branch receives input from a pitch estimation algorithm, restructured into a pitch chroma. The second branch analyses interactions between close partials, known to affect our perception of dissonance and roughness. The analysis is pitch invariant in both branches, facilitated by convolution across log-frequency and octave-wide max-pooling. Ensemble learning was used to improve the accuracy of the predictions. The coefficient of determination (R2) between ratings and predictions is close to 0.7 in a cross-validation test of the combined dataset. The system significantly outperforms recent computational models. An ablation study tested the impact of the pitch chroma and partial analysis branches separately, concluding that the deep layered learning approach with a pitch chroma was driving the high performance.

  • 3.
    Elowsson, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Beat Tracking with a Cepstroid Invariant Neural Network (2016). In: 17th International Society for Music Information Retrieval Conference (ISMIR 2016), International Society for Music Information Retrieval, 2016, pp. 351-357. Conference paper (Peer-reviewed)
    Abstract [en]

    We present a novel rhythm tracking architecture that learns how to track tempo and beats through layered learning. A basic assumption of the system is that humans understand rhythm by letting salient periodicities in the music act as a framework, upon which the rhythmical structure is interpreted. Therefore, the system estimates the cepstroid (the most salient periodicity of the music), and uses a neural network that is invariant with regards to the cepstroid length. The input of the network consists mainly of features that capture onset characteristics along time, such as spectral differences. The invariant properties of the network are achieved by subsampling the input vectors with a hop size derived from a musically relevant subdivision of the computed cepstroid of each song. The output is filtered to detect relevant periodicities and then used in conjunction with two additional networks, which estimate the speed and tempo of the music, to predict the final beat positions. We show that the architecture has a high performance on music with public annotations.
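    The cepstroid-based subsampling described in this abstract can be sketched in a few lines: the hop size is tied to each song's own salient periodicity, so the network samples rhythm at comparable musical positions regardless of tempo. This is an illustrative sketch, not the paper's implementation; the function and parameter names are assumptions.

    ```python
    import numpy as np

    def cepstroid_invariant_subsample(features, cepstroid_frames, subdivision=4, n_points=16):
        # Hop size derived from a subdivision of the song's cepstroid,
        # so sampling positions follow the music's own periodicity.
        hop = cepstroid_frames / subdivision
        idx = np.round(np.arange(n_points) * hop).astype(int)
        idx = idx[idx < len(features)]
        return features[idx]

    features = np.arange(200, dtype=float)  # dummy framewise onset-strength curve
    fast = cepstroid_invariant_subsample(features, cepstroid_frames=20.0)
    slow = cepstroid_invariant_subsample(features, cepstroid_frames=40.0)
    ```

    Relative to each song's cepstroid, both samplings cover the same musical positions, which is what makes the downstream network invariant to the cepstroid length.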

  • 4.
    Elowsson, Anders
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Deep Layered Learning in MIR. Manuscript (preprint) (Other academic)
    Abstract [en]

    Deep learning has boosted the performance of many music information retrieval (MIR) systems in recent years. Yet, the complex hierarchical arrangement of music makes end-to-end learning hard for some MIR tasks – a very deep and structurally flexible processing chain is necessary to extract high-level features from a spectrogram representation. Mid-level representations such as tones, pitched onsets, chords, and beats are fundamental building blocks of music. This paper discusses how these can be used as intermediate representations in MIR to facilitate deep processing that generalizes well: each music concept is predicted individually in learning modules that are connected through latent representations in a directed acyclic graph. It is suggested that this strategy for inference, defined as deep layered learning (DLL), can help generalization by (1) – enforcing the validity of intermediate representations during processing, and by (2) – letting the inferred representations establish disentangled structures that support high-level invariant processing. A background to DLL and modular music processing is provided, and relevant concepts such as pruning, skip connections, and layered performance supervision are reviewed.

  • 5.
    Elowsson, Anders
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH, Musikakustik.
    Modeling Music: Studies of Music Transcription, Music Perception and Music Production (2018). Doctoral thesis, comprising articles (Other academic)
    Abstract [en]

    This dissertation presents ten studies focusing on three important subfields of music information retrieval (MIR): music transcription (Part A), music perception (Part B), and music production (Part C).

    In Part A, systems capable of transcribing rhythm and polyphonic pitch are described. The first two publications present methods for tempo estimation and beat tracking. A method is developed for computing the most salient periodicity (the “cepstroid”), and the computed cepstroid is used to guide the machine learning processing. The polyphonic pitch tracking system uses novel pitch-invariant and tone-shift-invariant processing techniques. Furthermore, the neural flux is introduced – a latent feature for onset and offset detection. The transcription systems use a layered learning technique with separate intermediate networks of varying depth.  Important music concepts are used as intermediate targets to create a processing chain with high generalization. State-of-the-art performance is reported for all tasks.

    Part B is devoted to perceptual features of music, which can be used as intermediate targets or as parameters for exploring fundamental music perception mechanisms. Systems are proposed that can predict the perceived speed and performed dynamics of an audio file with high accuracy, using the average ratings from around 20 listeners as ground truths. In Part C, aspects related to music production are explored. The first paper analyzes long-term average spectrum (LTAS) in popular music. A compact equation is derived to describe the mean LTAS of a large dataset, and the variation is visualized. Further analysis shows that the level of the percussion is an important factor for LTAS. The second paper examines songwriting and composition through the development of an algorithmic composer of popular music. Various factors relevant for writing good compositions are encoded, and a listening test employed that shows the validity of the proposed methods.

    The dissertation is concluded by Part D - Looking Back and Ahead, which acts as a discussion and provides a road-map for future work. The first paper discusses the deep layered learning (DLL) technique, outlining concepts and pointing out a direction for future MIR implementations. It is suggested that DLL can help generalization by enforcing the validity of intermediate representations, and by letting the inferred representations establish disentangled structures supporting high-level invariant processing. The second paper proposes an architecture for tempo-invariant processing of rhythm with convolutional neural networks. Log-frequency representations of rhythm-related activations are suggested at the main stage of processing. Methods relying on magnitude, relative phase, and raw phase information are described for a wide variety of rhythm processing tasks.

  • 6.
    Elowsson, Anders
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH, Musikakustik.
    Polyphonic Pitch Tracking with Deep Layered Learning. Manuscript (preprint) (Other academic)
    Abstract [en]

    This paper presents a polyphonic pitch tracking system able to extract both framewise and note-based estimates from audio. The system uses six artificial neural networks in a deep layered learning setup. First, cascading networks are applied to a spectrogram for framewise fundamental frequency (f0) estimation. A sparse receptive field is learned by the first network and then used for weight-sharing throughout the system. The f0 activations are connected across time to extract pitch ridges. These ridges define a framework, within which subsequent networks perform tone-shift-invariant onset and offset detection. The networks convolve the pitch ridges across time, using as input, e.g., variations of latent representations from the f0 estimation networks, defined as the “neural flux.” Finally, incorrect tentative notes are removed one by one in an iterative procedure that allows a network to classify notes within an accurate context. The system was evaluated on four public test sets: MAPS, Bach10, TRIOS, and the MIREX Woodwind quintet, achieving state-of-the-art results on all four datasets. It performs well across all subtasks: f0, pitched onset, and pitched offset tracking.

  • 7.
    Elowsson, Anders
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH, Musikakustik.
    Tempo-Invariant Processing of Rhythm with Convolutional Neural Networks. Manuscript (preprint) (Other academic)
    Abstract [en]

    Rhythm patterns can be performed with a wide variation of tempi. This presents a challenge for many music information retrieval (MIR) systems; ideally, perceptually similar rhythms should be represented and processed similarly, regardless of the specific tempo at which they were performed. Several recent systems for tempo estimation, beat tracking, and downbeat tracking have therefore sought to process rhythm in a tempo-invariant way, often by sampling input vectors according to a precomputed pulse level. This paper describes how a log-frequency representation of rhythm-related activations instead can promote tempo invariance when processed with convolutional neural networks. The strategy incorporates invariance at a fundamental level and can be useful for most tasks related to rhythm processing. Different methods are described, relying on magnitude, phase relationships of different rhythm channels, as well as raw phase information. Several variations are explored to provide direction for future implementations.

  • 8.
    Elowsson, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Algorithmic Composition of Popular Music (2012). In: Proceedings of the 12th International Conference on Music Perception and Cognition and the 8th Triennial Conference of the European Society for the Cognitive Sciences of Music / [ed] Emilios Cambouropoulos, Costas Tsourgas, Panayotis Mavromatis, Costas Pastiadis, 2012, pp. 276-285. Conference paper (Peer-reviewed)
    Abstract [en]

    Human composers have used formal rules for centuries to compose music, and an algorithmic composer – composing without the aid of human intervention – can be seen as an extension of this technique. An algorithmic composer of popular music (a computer program) has been created with the aim to get a better understanding of how the composition process can be formalized and at the same time to get a better understanding of popular music in general. With the aid of statistical findings, a theoretical framework for relevant methods is presented. The concept of Global Joint Accent Structure is introduced as a way of understanding how melody and rhythm interact to help the listener form expectations about future events. Methods of the program are presented with references to supporting statistical findings. The algorithmic composer creates a rhythmic foundation (drums), a chord progression, a phrase structure and finally the melody. The main focus has been the composition of the melody. The melodic generation is based on ten different musical aspects which are described. The resulting output was evaluated in a formal listening test where 14 computer compositions were compared with 21 human compositions. Results indicate a slightly lower score for the computer compositions but the differences were statistically insignificant.

  • 9.
    Elowsson, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Long-term Average Spectrum in Popular Music and its Relation to the Level of the Percussion (2017). In: AES 142nd Convention, Berlin, Germany, 2017. Conference paper (Peer-reviewed)
    Abstract [en]

    The spectral distribution of music audio has an important influence on listener perception, but large-scale characterizations are lacking. Therefore, the long-term average spectrum (LTAS) was analyzed for a large dataset of popular music. The mean LTAS was computed, visualized, and then approximated with two quadratic fittings. The fittings were subsequently used to derive the spectrum slope. By applying harmonic/percussive source separation, the relationship between LTAS and percussive prominence was investigated. A clear relationship was found; tracks with more percussion have a relatively higher LTAS in the bass and high frequencies. We show how this relationship can be used to improve targets in automatic equalization. Furthermore, we assert that variations in LTAS between genres are mainly a side-effect of percussive prominence.
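    The LTAS itself is a standard quantity: average the framewise power spectra over the whole track and express the result in dB. A minimal numpy sketch of that computation (the window and FFT sizes here are assumptions, not taken from the paper):

    ```python
    import numpy as np

    def ltas_db(signal, sr, n_fft=4096, hop=2048):
        # Average the Hann-windowed power spectra of all frames, in dB.
        window = np.hanning(n_fft)
        n_frames = max(1, (len(signal) - n_fft) // hop + 1)
        acc = np.zeros(n_fft // 2 + 1)
        for i in range(n_frames):
            frame = signal[i * hop : i * hop + n_fft] * window
            acc += np.abs(np.fft.rfft(frame)) ** 2
        freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
        return freqs, 10 * np.log10(acc / n_frames + 1e-12)

    sr = 44100
    t = np.arange(sr * 2) / sr
    sig = np.sin(2 * np.pi * 440 * t)  # a pure tone should peak near 440 Hz
    freqs, spec = ltas_db(sig, sr)
    peak_hz = freqs[np.argmax(spec)]
    ```

    On real mixes, the shape of this curve (e.g., its slope after a quadratic fit, as in the paper) is what carries the percussion-related information.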

  • 10.
    Elowsson, Anders
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Friberg, Anders
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Modeling Music Modality with a Key-Class Invariant Pitch Chroma CNN (2019). Conference paper (Peer-reviewed)
    Abstract [en]

    This paper presents a convolutional neural network (CNN) that uses input from a polyphonic pitch estimation system to predict perceived minor/major modality in music audio. The pitch activation input is structured to allow the first CNN layer to compute two pitch chromas focused on different octaves. The following layers perform harmony analysis across chroma and time scales. Through max pooling across pitch, the CNN becomes invariant with regards to the key class (i.e., key disregarding mode) of the music. A multilayer perceptron combines the modality activation output with spectral features for the final prediction. The study uses a dataset of 203 excerpts rated by around 20 listeners each, a small, challenging data size requiring carefully designed parameter sharing. With an R2 of about 0.71, the system clearly outperforms previous systems as well as individual human listeners. A final ablation study highlights the importance of using pitch activations processed across longer time scales, and using pooling to facilitate invariance with regards to the key class.

  • 11.
    Elowsson, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Modeling the perception of tempo (2015). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 137, no. 6, pp. 3163-3177. Journal article (Peer-reviewed)
    Abstract [en]

    A system is proposed in which rhythmic representations are used to model the perception of tempo in music. The system can be understood as a five-layered model, where representations are transformed into higher-level abstractions in each layer. First, source separation is applied (Audio Level), onsets are detected (Onset Level), and interonset relationships are analyzed (Interonset Level). Then, several high-level representations of rhythm are computed (Rhythm Level). The periodicity of the music is modeled by the cepstroid vector: the periodicity of an interonset interval (IOI) histogram. The pulse strength for plausible beat length candidates is defined by computing the magnitudes in different IOI histograms. The speed of the music is modeled as a continuous function on the basis of the idea that such a function corresponds to the underlying perceptual phenomena, and it seems to effectively reduce octave errors. By combining the rhythmic representations in a logistic regression framework, the tempo of the music is finally computed (Tempo Level). The results are the highest reported in a formal benchmarking test (2006-2013), with a P-Score of 0.857. Furthermore, the highest results so far are reported for two widely adopted test sets, with an Acc1 of 77.3% and 93.0% for the Songs and Ballroom datasets.
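    The IOI-histogram periodicity at the core of the Rhythm Level can be illustrated with a simplified stand-in: build a histogram of all pairwise inter-onset intervals and take the lag that maximizes its autocorrelation. The paper's actual cepstroid computation differs; the names, bin sizes, and autocorrelation shortcut here are assumptions for illustration only.

    ```python
    import numpy as np

    def ioi_histogram(onsets, bin_s=0.01, max_s=2.0):
        # Histogram of all pairwise inter-onset intervals up to max_s seconds.
        onsets = np.asarray(onsets)
        iois = (onsets[None, :] - onsets[:, None]).ravel()
        iois = iois[(iois > 0) & (iois <= max_s)]
        hist, _ = np.histogram(iois, bins=np.arange(0, max_s + bin_s, bin_s))
        return hist

    def salient_periodicity_ms(hist, bin_s=0.01):
        # Simplified stand-in for the cepstroid: the positive autocorrelation
        # lag at which the IOI histogram best matches itself.
        ac = np.correlate(hist, hist, mode="full")[len(hist) - 1:]
        lag = np.argmax(ac[1:]) + 1
        return lag * bin_s * 1000

    onsets = np.arange(0, 8, 0.5)  # steady eighth notes at 120 BPM
    period_ms = salient_periodicity_ms(ioi_histogram(onsets))
    ```

    For this steady eighth-note stream, the recovered periodicity is the 500 ms eighth-note period, one of the pulse-level candidates such a system would weigh.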

  • 12.
    Elowsson, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Modelling Perception of Speed in Music Audio (2013). In: Proceedings of the Sound and Music Computing Conference 2013, 2013, pp. 735-741. Conference paper (Peer-reviewed)
    Abstract [en]

    One of the major parameters in music is the overall speed of a musical performance. Speed is often associated with tempo, but other factors such as note density (onsets per second) seem to be important as well. In this study, a computational model of speed in music audio has been developed using a custom set of rhythmic features. The original audio is first separated into a harmonic part and a percussive part and onsets are extracted separately from the different layers. The characteristics of each onset are determined based on frequency content as well as perceptual salience using a clustering approach. Using these separated onsets a set of eight features including a tempo estimation are defined which are specifically designed for modelling perceived speed. In a previous study 20 listeners rated the speed of 100 ringtones consisting mainly of popular songs, which had been converted from MIDI to audio. The ratings were used in linear regression and PLS regression in order to evaluate the validity of the model as well as to find appropriate features. The computed audio features were able to explain about 90 % of the variability in listener ratings.

  • 13.
    Elowsson, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Predicting the perception of performed dynamics in music audio with ensemble learning (2017). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 141, no. 3, pp. 2224-2242. Journal article (Peer-reviewed)
    Abstract [en]

    By varying the dynamics in a musical performance, the musician can convey structure and different expressions. Spectral properties of most musical instruments change in a complex way with the performed dynamics, but dedicated audio features for modeling the parameter are lacking. In this study, feature extraction methods were developed to capture relevant attributes related to spectral characteristics and spectral fluctuations, the latter through a sectional spectral flux. Previously, ground truth ratings of performed dynamics had been collected by asking listeners to rate how soft/loud the musicians played in a set of audio files. The ratings, averaged over subjects, were used to train three different machine learning models, using the audio features developed for the study as input. The highest result was produced from an ensemble of multilayer perceptrons with an R2 of 0.84. This result seems to be close to the upper bound, given the estimated uncertainty of the ground truth data. The result is well above that of individual human listeners of the previous listening experiment, and on par with the performance achieved from the average rating of six listeners. Features were analyzed with a factorial design, which highlighted the importance of source separation in the feature extraction.

  • 14.
    Elowsson, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Madison, Guy
    Paulin, Johan
    Modelling the Speed of Music Using Features from Harmonic/Percussive Separated Audio (2013). In: Proceedings of the 14th International Society for Music Information Retrieval Conference, 2013, pp. 481-486. Conference paper (Peer-reviewed)
    Abstract [en]

    One of the major parameters in music is the overall speed of a musical performance. In this study, a computational model of speed in music audio has been developed using a custom set of rhythmic features. Speed is often associated with tempo, but as shown in this study, factors such as note density (onsets per second) and spectral flux are important as well. The original audio was first separated into a harmonic part and a percussive part and the features were extracted separately from the different layers. In previous studies, listeners had rated the speed of 136 songs, and the ratings were used in a regression to evaluate the validity of the model as well as to find appropriate features. The final models, consisting of 5 or 8 features, were able to explain about 90% of the variation in the training set, with little or no degradation for the test set.

  • 15.
    Elowsson, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Schön, Ragnar
    KTH.
    Höglund, Matts
    KTH.
    Zea, Elías
    KTH, Skolan för teknikvetenskap (SCI), Farkost och flyg, MWL Marcus Wallenberg Laboratoriet.
    Friberg, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Estimation of vocal duration in monaural mixtures (2014). In: Proceedings - 40th International Computer Music Conference, ICMC 2014 and 11th Sound and Music Computing Conference, SMC 2014 - Music Technology Meets Philosophy: From Digital Echos to Virtual Ethos, National and Kapodistrian University of Athens, 2014, pp. 1172-1177. Conference paper (Peer-reviewed)
    Abstract [en]

    In this study, the task of vocal duration estimation in monaural music mixtures is explored. We show how presently available algorithms for source separation and predominant f0 estimation can be used as a front end from which features can be extracted. A large set of features is presented, devised to connect different vocal cues to the presence of vocals. Two main cues are utilized; the voice is neither stable in pitch nor in timbre. We evaluate the performance of the model by estimating the length of the vocal regions of the mixtures. To facilitate this, a new set of annotations to a widely adopted data set is developed and made available to the community. The proposed model is able to explain about 78 % of the variance in vocal region length. In a classification task, where the excerpts are classified as either vocal or non-vocal, the model has an accuracy of about 0.94.

  • 16.
    Friberg, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Choi, K.
    Schön, Ragnar
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Downie, J. S.
    Elowsson, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Cross-cultural aspects of perceptual features in K-pop: A pilot study comparing Chinese and Swedish listeners (2017). In: 2017 ICMC/EMW - 43rd International Computer Music Conference and the 6th International Electronic Music Week, Shanghai Conservatory of Music, 2017, pp. 291-296. Conference paper (Peer-reviewed)
    Abstract [en]

    In previous studies it has been shown that perceptual features can be used as an intermediate representation in music processing to model higher-level semantic descriptions. In this pilot study, we focused on the cross-cultural aspect of such perceptual features, by asking both Chinese and Swedish listeners to rate a set of K-Pop samples using a web-based questionnaire. The music samples were selected from a larger set, previously rated in terms of different emotion labels. The selection procedure of the subset was carefully designed to maximize both the variation of emotion and genre. The listeners rated eight perceptual features: dissonance, speed, rhythmic complexity, rhythmic clarity, articulation, harmonic complexity, modality, and pitch. The results indicated a small but significant difference in the two groups, regarding the average speed and rhythmic complexity. In particular the perceived speed of hip hop was different for the two groups. We discuss the overall consistency of the ratings using this methodology in relation to the interface, selection and number of subjects.

  • 17.
    Friberg, Anders
    et al.
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Lindeberg, Tony
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Beräkningsvetenskap och beräkningsteknik (CST).
    Hellwagner, Martin
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Helgason, Pétur
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Salomão, Gláucia Laís
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Elowsson, Anders
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Lemaitre, Guillaume
    Institute for Research and Coordination in Acoustics and Music, Paris, France.
    Ternström, Sten
    KTH, Skolan för elektroteknik och datavetenskap (EECS), Tal, musik och hörsel, TMH.
    Prediction of three articulatory categories in vocal sound imitations using models for auditory receptive fields (2018). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 144, no. 3, pp. 1467-1483. Journal article (Peer-reviewed)
    Abstract [en]

    Vocal sound imitations provide a new challenge for understanding the coupling between articulatory mechanisms and the resulting audio. In this study, we have modeled the classification of three articulatory categories, phonation, supraglottal myoelastic vibrations, and turbulence from audio recordings. Two data sets were assembled, consisting of different vocal imitations by four professional imitators and four non-professional speakers in two different experiments. The audio data were manually annotated by two experienced phoneticians using a detailed articulatory description scheme. A separate set of audio features was developed specifically for each category using both time-domain and spectral methods. For all time-frequency transformations, and for some secondary processing, the recently developed Auditory Receptive Fields Toolbox was used. Three different machine learning methods were applied for predicting the final articulatory categories. The result with the best generalization was found using an ensemble of multilayer perceptrons. The cross-validated classification accuracy was 96.8 % for phonation, 90.8 % for supraglottal myoelastic vibrations, and 89.0 % for turbulence using all the 84 developed features. A final feature reduction to 22 features yielded similar results.

  • 18.
    Friberg, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Schoonderwaldt, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik. Hanover University, Germany .
    Hedblad, Anton
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH.
    Fabiani, Marco
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Elowsson, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Using listener-based perceptual features as intermediate representations in music information retrieval (2014). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 136, no. 4, pp. 1951-1963. Journal article (Peer-reviewed)
    Abstract [en]

    The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, aiming to approach the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and motivated from an ecological approach. The perceptual features were rated in two listening experiments using two different data sets. They were modeled both from symbolic and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features are reliable estimates; (2) emotion ratings could be predicted by a small combination of perceptual features with an explained variance from 75% to 93% for the emotional dimensions activity and valence; (3) the perceptual features could only to a limited extent be modeled using existing audio features. Results clearly indicated that a small number of dedicated features were superior to a "brute force" model using a large number of general audio features.

  • 19.
    Friberg, Anders
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Schoonderwaldt, Erwin
    Hanover University of Music, Germany.
    Hedblad, Anton
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Fabiani, Marco
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Elowsson, Anders
    KTH, Skolan för datavetenskap och kommunikation (CSC), Tal, musik och hörsel, TMH, Musikakustik.
    Using perceptually defined music features in music information retrieval (2014). Manuscript (preprint) (Other academic)
    Abstract [en]

    In this study, the notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, in order to understand the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and motivated from an ecological approach. The selected perceptual features were rated in two listening experiments using two different data sets. They were modeled both from symbolic (MIDI) and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features are reliable estimates; (2) emotion ratings could be predicted by a small combination of perceptual features with an explained variance up to 90%; (3) the perceptual features could only to a limited extent be modeled using existing audio features. The results also clearly indicated that a small number of dedicated features were superior to a 'brute force' model using a large number of general audio features.
