Publications (10 of 85)
Adiban, M., Siniscalchi, S. M. & Salvi, G. (2023). A step-by-step training method for multi generator GANs with application to anomaly detection and cybersecurity. Neurocomputing, 537, 296-308
2023 (English). In: Neurocomputing, ISSN 0925-2312, E-ISSN 1872-8286, Vol. 537, p. 296-308. Article in journal (Refereed), Published
Abstract [en]

Cyber attacks and anomaly detection are problems where the data is often highly imbalanced towards normal observations. Furthermore, the anomalies observed in real applications may differ significantly from those contained in the training data. It is therefore desirable to study methods that can detect anomalies based only on the distribution of the normal data. To address this problem, we propose a novel objective function for generative adversarial networks (GANs), referred to as STEP-GAN. STEP-GAN simulates the distribution of possible anomalies by learning a modified version of the distribution of the task-specific normal data. It leverages multiple generators in a step-by-step interaction with a discriminator in order to capture different modes in the data distribution. The discriminator is optimized to distinguish not only between normal data and anomalies but also between the different generators, thus encouraging each generator to model a different mode in the distribution. This considerably reduces the well-known mode collapse problem in GAN models. We tested our method in the areas of power systems and network traffic control systems (NTCSs) using two publicly available, highly imbalanced datasets: the ICS (Industrial Control System) security dataset and UNSW-NB15, respectively. In both application domains, STEP-GAN outperforms the state-of-the-art systems as well as the two baseline systems we implemented for comparison. To assess the generality of our model, additional experiments were carried out on seven real-world numerical datasets for anomaly detection in a variety of domains. In all datasets, normal samples significantly outnumber abnormal samples. Experimental results show that STEP-GAN outperforms several semi-supervised methods while being competitive with supervised methods.
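The abstract describes a discriminator that separates normal data from the output of each generator. A minimal sketch of one way to express that assumes a (K+1)-class discriminator trained with cross-entropy, where class 0 is normal data and class k is generator k. This is not the STEP-GAN objective from the paper, whose exact form the abstract does not give, and the function names are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def multi_generator_discriminator_loss(real_logits, fake_logits_per_generator):
    """Mean (K+1)-way cross-entropy for a hypothetical discriminator.

    real_logits: logit vectors (length K+1) for normal samples; target class 0.
    fake_logits_per_generator: K lists; list k-1 holds logit vectors for
    samples drawn from generator k; target class k.
    """
    losses = [-log_softmax(logits)[0] for logits in real_logits]
    for k, batch in enumerate(fake_logits_per_generator, start=1):
        losses.extend(-log_softmax(logits)[k] for logits in batch)
    return sum(losses) / len(losses)

# Two generators (K = 2): an uninformative discriminator scores about log(3) = 1.0986.
uniform = [0.0, 0.0, 0.0]
print(multi_generator_discriminator_loss([uniform], [[uniform], [uniform]]))
```

Suppose two generators produced samples from the same mode. This discriminator could not then tell them apart, which is the intuition the abstract gives for why each generator is pushed toward a different mode.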

Place, publisher, year, edition, pages
Elsevier BV, 2023
Keywords
Anomaly detection, One-class classification, GAN, Mode collapse, Cyber security
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-327437 (URN); 10.1016/j.neucom.2023.03.056 (DOI); 000978367500001 (ISI); 2-s2.0-85151669864 (Scopus ID)
Note

QC 20230529

Available from: 2023-05-29. Created: 2023-05-29. Last updated: 2023-05-29. Bibliographically approved
Cao, X., Fan, Z., Svendsen, T. & Salvi, G. (2023). An Analysis of Goodness of Pronunciation for Child Speech. In: Interspeech 2023. Paper presented at the 24th Annual Conference of the International Speech Communication Association (Interspeech 2023), Dublin, Ireland, Aug 20-24, 2023 (pp. 4613-4617). International Speech Communication Association
2023 (English). In: Interspeech 2023, International Speech Communication Association, 2023, p. 4613-4617. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we study the use of goodness of pronunciation (GOP) on child speech. We first compare the distributions of GOP scores on several open datasets representing various dimensions of speech variability. We show that the GOP distribution over CMU Kids, corresponding to young age, has a larger spread than those on datasets representing other dimensions, i.e., accent, dialect, spontaneity and environmental conditions. We hypothesize that the increased variability of pronunciation at a young age may impair the use of traditional mispronunciation detection methods for children. To support this hypothesis, we perform simulated mispronunciation experiments for both children and adults using different variants of the GOP algorithm. We also compare the results to real-case mispronunciations by native children, showing that GOP is less effective for child speech than for adult speech.
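GOP scores compare how well the intended (canonical) phone explains the audio against the best competing phone. The sketch below assumes one common frame-averaged, posterior-based variant. The paper compares several GOP variants that the abstract does not specify, and the function names and threshold here are hypothetical.

```python
import math

def gop_score(frame_posteriors, canonical_phone):
    """Frame-averaged log posterior ratio for one aligned phone segment.

    frame_posteriors: one dict per frame mapping phone -> posterior probability.
    Returns a value <= 0; 0 means the canonical phone was the most probable
    phone in every frame of the segment.
    """
    total = 0.0
    for posteriors in frame_posteriors:
        best = max(posteriors.values())
        total += math.log(posteriors[canonical_phone]) - math.log(best)
    return total / len(frame_posteriors)

def is_mispronounced(frame_posteriors, canonical_phone, threshold=-1.0):
    # Hypothetical threshold; a real system would tune it on labelled data.
    return gop_score(frame_posteriors, canonical_phone) < threshold

segment = [{"a": 0.7, "e": 0.3}, {"a": 0.2, "e": 0.8}]
print(gop_score(segment, "a"), is_mispronounced(segment, "a"))
```

The abstract reports a wider spread of GOP scores for young speakers, as observed on CMU Kids. Such a spread makes any single threshold of this kind less reliable.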

Place, publisher, year, edition, pages
International Speech Communication Association, 2023
Keywords
ASR, child speech, data scarcity, GOP, mispronunciation detection and diagnosis, speech assessment
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-337872 (URN); 10.21437/Interspeech.2023-743 (DOI); 2-s2.0-85171580096 (Scopus ID)
Conference
24th Annual Conference of the International Speech Communication Association (Interspeech 2023), Dublin, Ireland, Aug 20-24, 2023
Note

QC 20231010

Available from: 2023-10-10. Created: 2023-10-10. Last updated: 2023-10-10. Bibliographically approved
Stenwig, E., Salvi, G., Rossi, P. S. & Skjaervold, N. K. (2023). Comparison of correctly and incorrectly classified patients for in-hospital mortality prediction in the intensive care unit. BMC Medical Research Methodology, 23(1), Article ID 102.
2023 (English). In: BMC Medical Research Methodology, E-ISSN 1471-2288, Vol. 23, no. 1, article id 102. Article in journal (Refereed), Published
Abstract [en]

Background

The use of machine learning is becoming increasingly popular in many disciplines, but there is still a gap in implementing machine learning models in clinical settings. Lack of trust in models is one of the issues that must be addressed to close this gap. No model is perfect, and it is crucial to know in which use cases a model can be trusted and in which it is less reliable.

Methods

Four different algorithms are trained on the eICU Collaborative Research Database, using features similar to those of the APACHE IV severity-of-disease scoring system, to predict hospital mortality in the ICU. The training and testing procedure is repeated 100 times on the same dataset to investigate whether predictions for individual patients change with small changes in the models. Features are then analysed separately to investigate potential differences between patients consistently classified correctly and those consistently classified incorrectly.

Results

A total of 34 056 patients (58.4%) are classified as true negative, 6 527 patients (11.3%) as false positive, 3 984 patients (6.8%) as true positive, and 546 patients (0.9%) as false negative. The remaining 13 108 patients (22.5%) are inconsistently classified across models and rounds. Histograms and distributions of feature values are compared visually to investigate differences between groups.

Conclusions

It is impossible to distinguish the groups using single features alone. When a combination of features is considered, the difference between the groups becomes clearer. Incorrectly classified patients have features more similar to those of patients with the same prediction than to those of patients with the same outcome.
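The grouping used in the Methods and Results can be sketched as follows. A patient is assigned to a confusion-matrix group only if every model in every repetition agrees; otherwise the patient is counted as inconsistent. This is an illustrative reading of the abstract, not the authors' code, and the function names are hypothetical.

```python
def classify_outcome(prediction, label):
    """Confusion-matrix cell for one binary prediction (1 = in-hospital death)."""
    if prediction == 1:
        return "TP" if label == 1 else "FP"
    return "FN" if label == 1 else "TN"

def group_patients(predictions_per_run, labels):
    """Assign each patient to TN/FP/TP/FN or 'inconsistent'.

    predictions_per_run: one list of 0/1 predictions per (model, round) run,
    indexed by patient. labels: true outcome per patient.
    """
    groups = []
    for i, label in enumerate(labels):
        cells = {classify_outcome(run[i], label) for run in predictions_per_run}
        # A single distinct cell means every run agreed on this patient.
        groups.append(cells.pop() if len(cells) == 1 else "inconsistent")
    return groups

runs = [[0, 1, 1, 0], [0, 1, 0, 0]]
print(group_patients(runs, [0, 0, 1, 1]))
```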

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Machine learning, Explainability, Mortality prediction, eICU, SHAP values
National Category
Computer Sciences; Clinical Medicine
Identifiers
urn:nbn:se:kth:diva-327383 (URN); 10.1186/s12874-023-01921-9 (DOI); 000974652000002 (ISI); 37095430 (PubMedID); 2-s2.0-85153687506 (Scopus ID)
Note

QC 20230526

Available from: 2023-05-26. Created: 2023-05-26. Last updated: 2024-01-17. Bibliographically approved
Getman, Y., Phan, N., Al-Ghezi, R., Voskoboinik, E., Singh, M., Grosz, T., . . . Ylinen, S. (2023). Developing an AI-Assisted Low-Resource Spoken Language Learning App for Children. IEEE Access, 11, 86025-86037
2023 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 11, p. 86025-86037. Article in journal (Refereed), Published
Abstract [en]

Computer-assisted Language Learning (CALL) is a rapidly developing area, accelerated by advances in AI. A well-designed and reliable CALL system allows students to practice language skills, such as pronunciation, at any time outside the classroom. Furthermore, gamification via mobile applications has shown encouraging effects on learning outcomes, motivating young users to practice more and to perceive language learning as a positive experience. In this work, we adapt the latest speech recognition technology to be part of an online pronunciation training system for small children. As part of our gamified mobile application, our models will assess the pronunciation quality of young Swedish children who have been diagnosed with Speech Sound Disorder and are participating in speech therapy. Additionally, the models provide feedback to young non-native children learning to pronounce Swedish and Finnish words. Our experiments revealed that these new models fit into an online game, as they function simultaneously as speech recognizers and pronunciation evaluators. To make our systems more trustworthy and explainable, we investigated whether the combination of modern input attribution algorithms and time-aligned transcripts can explain the decisions made by the models, give us insights into how the models work, and provide a tool for developing more reliable solutions.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
ASR, children's speech, L2 speech, speech rating, SSD, wav2vec2
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-335200 (URN); 10.1109/ACCESS.2023.3304274 (DOI); 001051656600001 (ISI); 2-s2.0-85167833032 (Scopus ID)
Note

QC 20230925

Available from: 2023-09-25. Created: 2023-09-25. Last updated: 2023-09-25. Bibliographically approved
Rugayan, J., Salvi, G. & Svendsen, T. (2023). Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation. In: Interspeech 2023. Paper presented at the 24th Annual Conference of the International Speech Communication Association (Interspeech 2023), Dublin, Ireland, Aug 20-24, 2023 (pp. 2158-2162). International Speech Communication Association
2023 (English). In: Interspeech 2023, International Speech Communication Association, 2023, p. 2158-2162. Conference paper, Published paper (Refereed)
Abstract [en]

Automatic speech recognition (ASR) systems have become a vital part of our everyday lives through their many applications. However, despite the progress in this area, the most common evaluation method for ASR systems is still word error rate (WER). WER gives no information on the severity of errors, which strongly affects practical performance. We therefore examine a semantic-based metric called Aligned Semantic Distance (ASD) against WER and demonstrate its advantage over WER in two respects. First, we conduct a survey asking participants to score pairs of reference texts and ASR transcriptions. We perform a correlation analysis and show that ASD correlates more strongly with the human evaluation scores than WER does. We also explore the feasibility of predicting human perception using ASD. Second, we demonstrate that ASD is more effective than WER as an indicator of performance on downstream NLP tasks such as named entity recognition and sentiment classification.
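WER is the word-level edit distance between the reference and the ASR hypothesis divided by the number of reference words, (S + D + I) / N. The sketch below is a standard dynamic-programming implementation of that definition. It does not implement ASD, whose definition the abstract does not give, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    The reference must contain at least one word.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("book a flight to boston", "book a flight to austin"))    # 0.2
print(word_error_rate("book a flight to boston", "book the flight to boston"))  # 0.2
```

Both hypotheses receive the same WER, although only the first changes the destination. This is the kind of severity information the abstract says WER lacks.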

Place, publisher, year, edition, pages
International Speech Communication Association, 2023
Keywords
ASR evaluation metric, semantic context, user perception
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-337837 (URN); 10.21437/Interspeech.2023-1778 (DOI); 2-s2.0-85171598286 (Scopus ID)
Conference
24th Annual Conference of the International Speech Communication Association (Interspeech 2023), Dublin, Ireland, Aug 20-24, 2023
Note

QC 20231009

Available from: 2023-10-09. Created: 2023-10-09. Last updated: 2023-10-09. Bibliographically approved
Shahrebabaki, A. S., Salvi, G., Svendsen, T. & Siniscalchi, S. M. (2022). Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 135-147
2022 (English). In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, ISSN 2329-9290, Vol. 30, p. 135-147. Article in journal (Refereed), Published
Abstract [en]

We investigate the problem of speaker-independent acoustic-to-articulatory inversion (AAI) in noisy conditions within the deep neural network (DNN) framework. In contrast with recent results in the literature, we argue that a DNN vector-to-vector regression front-end for speech enhancement (DNN-SE) can play a key role in AAI when used to enhance spectral features prior to AAI back-end processing. We experimented with single- and multi-task training strategies for the DNN-SE block and found the latter to be beneficial to AAI. Furthermore, we show that coupling a DNN-SE that produces enhanced speech features with an AAI trained on clean speech outperforms a multi-condition AAI (AAI-MC) when tested on noisy speech. We observe a 15% relative improvement in Pearson's correlation coefficient (PCC) of our system over AAI-MC at a 0 dB signal-to-noise ratio on the Haskins corpus. Our approach also compares favourably against using a conventional DSP approach to speech enhancement (MMSE with IMCRA) in the front-end. Finally, we demonstrate the utility of articulatory inversion in a downstream speech application. We report significant WER improvements on an automatic speech recognition task in mismatched conditions based on the Wall Street Journal (WSJ) corpus when leveraging articulatory information estimated by the AAI-MC system, compared with spectral-only speech features.
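The abstract evaluates AAI with Pearson's correlation coefficient between estimated and measured articulatory trajectories and reports a relative PCC improvement. The sketch below shows both computations for a single trajectory. A full evaluation would presumably average over articulatory channels; that step is left out here as an assumption. The numbers in the example are illustrative, not the paper's results.

```python
import math

def pearson(estimated, measured):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_m = sum(measured) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(estimated, measured))
    std_e = math.sqrt(sum((e - mean_e) ** 2 for e in estimated))
    std_m = math.sqrt(sum((m - mean_m) ** 2 for m in measured))
    return cov / (std_e * std_m)

def relative_improvement(new_score, baseline_score):
    """Relative change of new_score with respect to baseline_score."""
    return (new_score - baseline_score) / baseline_score

# Illustrative values only: a PCC rising from 0.60 to 0.69 is a 15% relative gain.
print(relative_improvement(0.69, 0.60))
```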

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Noise measurement, Speech enhancement, Task analysis, Mel frequency cepstral coefficient, Training, Hidden Markov models, Deep learning, Deep neural network, acoustic-to-articulatory inversion, multi-task training, speaker independent models
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-307335 (URN); 10.1109/TASLP.2021.3133218 (DOI); 000735507400007 (ISI); 2-s2.0-85121342065 (Scopus ID)
Note

QC 20220124

Available from: 2022-01-24. Created: 2022-01-24. Last updated: 2022-06-25. Bibliographically approved
Ásgrímsson, D. S., González, I., Salvi, G. & Karoumi, R. (2022). Bayesian Deep Learning for Vibration-Based Bridge Damage Detection. In: Structural Integrity (pp. 27-43). Springer Nature, Vol. 21
2022 (English). In: Structural Integrity, Springer Nature, 2022, Vol. 21, p. 27-43. Chapter in book (Refereed)
Abstract [en]

A machine learning approach to damage detection is presented for a bridge structural health monitoring (SHM) system. The method is validated on the well-known Z24 bridge benchmark dataset, in which a sensor-instrumented, three-span bridge was monitored for almost a year before being deliberately damaged in a realistic and controlled way. Several damage cases were successfully detected, making this a viable approach for a data-based bridge SHM system. The method directly addresses a critical issue in most data-based SHM systems: the collected training data will not contain all natural weather events and load conditions. An SHM system trained on such limited data must be able to handle uncertainty in its predictions to prevent false damage detections. A Bayesian autoencoder neural network is trained to reconstruct raw sensor data sequences, with uncertainty bounds on its predictions. The uncertainty-adjusted reconstruction error of an unseen sequence is compared to a healthy-state error distribution, and the sequence is accepted or rejected based on the fidelity of the reconstruction. If the proportion of rejected sequences exceeds a predetermined threshold, the bridge is determined to be in a damaged state. The result is a fully operational, machine learning-based bridge damage detection system learned directly from raw sensor data.
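The accept/reject rule and the damage decision described above can be sketched as follows. Several details are assumptions for illustration, not values from the abstract:

- the form of the uncertainty adjustment (residuals divided by the predicted standard deviation),
- the 95th-percentile rejection limit,
- the 10% damage threshold.

The Bayesian autoencoder itself is omitted. Its per-sequence errors are taken as given inputs.

```python
def uncertainty_adjusted_error(observed, predicted_mean, predicted_std):
    """Hypothetical adjustment: mean squared residual, each residual scaled by
    the model's predictive standard deviation."""
    terms = [((o - m) / s) ** 2
             for o, m, s in zip(observed, predicted_mean, predicted_std)]
    return sum(terms) / len(terms)

def rejection_limit(healthy_errors, quantile=0.95):
    """Empirical quantile of the healthy-state error distribution."""
    ordered = sorted(healthy_errors)
    index = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[index]

def bridge_is_damaged(test_errors, healthy_errors,
                      quantile=0.95, max_rejected_fraction=0.10):
    """Flag damage when the fraction of rejected sequences exceeds the threshold."""
    limit = rejection_limit(healthy_errors, quantile)
    rejected = sum(1 for error in test_errors if error > limit)
    return rejected / len(test_errors) > max_rejected_fraction

healthy = [float(i) for i in range(1, 101)]                  # made-up healthy-state errors
print(bridge_is_damaged([10.0] * 9 + [200.0], healthy))      # 1 of 10 rejected: False
print(bridge_is_damaged([200.0] * 5 + [10.0] * 5, healthy))  # 5 of 10 rejected: True
```

Scaling residuals by the predictive spread means that large errors in regions where the model is already uncertain count less. This reflects the abstract's aim of avoiding false damage detections under unseen weather and load conditions.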

Place, publisher, year, edition, pages
Springer Nature, 2022
Series
Structural Integrity, ISSN 2522-560X; 21
Keywords
Autoencoders, Bayesian deep learning, Bridge damage detection, Machine learning, Structural health monitoring, Z24 bridge benchmark
National Category
Infrastructure Engineering
Identifiers
urn:nbn:se:kth:diva-312838 (URN); 10.1007/978-3-030-81716-9_2 (DOI); 2-s2.0-85117941432 (Scopus ID)
Note

QC 20220530

Available from: 2022-05-30. Created: 2022-05-30. Last updated: 2022-06-25. Bibliographically approved
Stenwig, E., Salvi, G., Salvo Rossi, P. & Skjærvold, N. K. (2022). Comparative analysis of explainable machine learning prediction models for hospital mortality. BMC Medical Research Methodology, 22(1), Article ID 53.
2022 (English). In: BMC Medical Research Methodology, E-ISSN 1471-2288, Vol. 22, no. 1, article id 53. Article in journal (Refereed), Published
Place, publisher, year, edition, pages
Springer Nature, 2022
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-324765 (URN); 10.1186/s12874-022-01540-w (DOI); 000761480900005 (ISI); 35220950 (PubMedID); 2-s2.0-85125432454 (Scopus ID)
Projects
IGLU
Note

QC 20230404

Available from: 2023-03-15. Created: 2023-03-15. Last updated: 2024-01-17. Bibliographically approved
Adiban, M., Siniscalchi, M., Stefanov, K. & Salvi, G. (2022). Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation. In: The 33rd British Machine Vision Conference Proceedings. Paper presented at the 33rd British Machine Vision Conference.
2022 (English). In: The 33rd British Machine Vision Conference Proceedings, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a multi-layer variational autoencoder method, which we call HR-VQVAE, that learns hierarchical discrete representations of the data. By utilizing a novel objective function, each layer in HR-VQVAE learns a discrete representation of the residual from previous layers through a vector quantized encoder. Furthermore, the representations at each layer are hierarchically linked to those at previous layers. We evaluate our method on the tasks of image reconstruction and generation. Experimental results demonstrate that the discrete representations learned by HR-VQVAE enable the decoder to reconstruct high-quality images with less distortion than the baseline methods, namely VQVAE and VQVAE-2. HR-VQVAE can also generate high-quality and diverse images that outperform state-of-the-art generative models, providing further verification of the efficiency of the learned representations. The hierarchical nature of HR-VQVAE i) reduces the decoding search time, making the method particularly suitable for high-load tasks, and ii) allows the codebook size to be increased without incurring the codebook collapse problem.
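The residual and hierarchical ideas in the abstract can be illustrated with a two-layer quantizer. The first layer quantizes the input, and the second quantizes the remaining residual. Each first-layer codeword is assumed to own a small local codebook, so the second search only scans that local codebook. This structure is an assumption based on the abstract's description, and the function names and toy codebooks are hypothetical. The sketch omits the encoder, the decoder and the training objective.

```python
def nearest(vector, codebook):
    """Index of the codeword closest to vector in squared Euclidean distance."""
    def sq_dist(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))

def hierarchical_residual_quantize(vector, root_codebook, child_codebooks):
    """Two-layer sketch of hierarchical residual quantization.

    root_codebook: list of first-layer codewords.
    child_codebooks[k]: local codebook searched for the residual when the
    first layer selects codeword k (hypothetical structure).
    """
    k1 = nearest(vector, root_codebook)
    residual = [v - c for v, c in zip(vector, root_codebook[k1])]
    local = child_codebooks[k1]
    k2 = nearest(residual, local)
    reconstruction = [c1 + c2 for c1, c2 in zip(root_codebook[k1], local[k2])]
    return (k1, k2), reconstruction

root = [[0.0, 0.0], [10.0, 10.0]]
children = [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, -1.0]]]
print(hierarchical_residual_quantize([10.2, 9.1], root, children))
```

With M codewords per layer, this setup yields M × M possible two-layer reconstructions, while each search step compares against only M codewords. That is one way to read the abstract's claim about reduced decoding search time.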

National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-324770 (URN)
Conference
33rd British Machine Vision Conference
Note

QC 20230320

Available from: 2023-03-15. Created: 2023-03-15. Last updated: 2023-03-20. Bibliographically approved
Abdelnour, J., Rouat, J. & Salvi, G. (2022). NAAQA: A Neural Architecture for Acoustic Question Answering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-12
2022 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, p. 1-12. Article in journal (Refereed), Published
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-324766 (URN); 10.1109/tpami.2022.3194311 (DOI); 000947840300064 (ISI); 36121954 (PubMedID); 2-s2.0-85139450848 (Scopus ID)
Projects
IGLU
Note

QC 20230320

Available from: 2023-03-15. Created: 2023-03-15. Last updated: 2023-09-21. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-3323-5311