2020 (English). In: 2020 IEEE International Conference on Big Data (Big Data), Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 3428-3437. Conference paper, Published paper (Refereed)
Abstract [en]
Privacy preservation plays a vital role in health care applications, where privacy requirements are very strict. Smart devices are gathering health data in rapidly increasing amount, quality and detail. New mechanisms are therefore required that can cope with the challenges of large-scale, real-time processing. Federated learning (FL) is one of the established approaches that enable AI models to be trained without access to the raw data. However, recent studies have shown that FL alone does not guarantee sufficient privacy. Differential privacy (DP) is a well-known approach for providing privacy guarantees. However, because it relies on adding noise, DP must trade off privacy against accuracy. In this work, we design and implement an end-to-end pipeline that combines DP and FL, for the first time in the context of health data streams. We propose a clustering mechanism that leverages similarities between users to improve prediction accuracy and significantly reduce model training time. Depending on the dataset and features, our predictions deviate from the ground-truth value by no more than 0.025% of the value range. Moreover, our clustering mechanism significantly reduces training time compared to training a single model on the entire dataset. In the best case, it also reduces the prediction error by up to 49%. At best, our privacy-preserving mechanism decreases the prediction accuracy of the trained models by only ≈ 2%. Furthermore, even in highly noisy settings, our clustering mechanism reduces the prediction error by as much as 38% compared to using a single federated private model.
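The abstract combines three ideas: federated aggregation, DP noise addition, and one model per user cluster. The sketch below shows how these can fit together in a single aggregation round. It is an illustration under stated assumptions, not the authors' pipeline. The abstract does not specify several details, so all of them are assumptions here:

- the clipping norm;
- the noise mechanism, which is assumed to be the Gaussian mechanism;
- how clusters are formed.

Cluster assignments are simply passed in as input; the keywords suggest streaming k-means, which is not implemented here. The functions `clip`, `dp_federated_average` and `clustered_round` are hypothetical names. The sketch also does no privacy accounting, so it does not compute an epsilon.

```python
import math
import random
from collections import defaultdict


def clip(update, max_norm):
    """Scale an update vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def dp_federated_average(updates, max_norm, noise_multiplier, rng):
    """Average clipped client updates, then add Gaussian noise to the mean.

    Assumption: noise std = noise_multiplier * max_norm / n, a common
    DP-FedAvg-style calibration under add/remove-one-client adjacency.
    """
    n = len(updates)
    clipped = [clip(u, max_norm) for u in updates]
    mean = [sum(col) / n for col in zip(*clipped)]  # coordinate-wise mean
    std = noise_multiplier * max_norm / n
    return [m + rng.gauss(0.0, std) for m in mean]


def clustered_round(client_updates, cluster_of, max_norm, noise_multiplier, seed=0):
    """One aggregation round that keeps a separate private model per cluster.

    client_updates: {client_id: update vector}
    cluster_of: {client_id: cluster id}, assumed to come from a clustering step.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for client, update in client_updates.items():
        groups[cluster_of[client]].append(update)
    return {
        cluster: dp_federated_average(members, max_norm, noise_multiplier, rng)
        for cluster, members in groups.items()
    }


if __name__ == "__main__":
    # Toy example: u1 and u2 share a cluster; u3's large update gets clipped.
    updates = {"u1": [0.5, 0.1], "u2": [0.4, 0.2], "u3": [3.0, 4.0]}
    clusters = {"u1": 0, "u2": 0, "u3": 1}
    print(clustered_round(updates, clusters, max_norm=1.0, noise_multiplier=0.5))
```

Averaging within each cluster lets users with similar data share a model. Clipping bounds each user's influence, which is what allows the added noise to be calibrated. Setting `noise_multiplier=0.0` disables the noise and is useful for checking the clustered averages alone.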
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2020
Keywords
Federated Learning, Differential Privacy, Streaming k-means, Generative Adversarial Networks
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-295068 (URN); 10.1109/BigData50022.2020.9378186 (DOI); 000662554703071 (); 2-s2.0-85103842271 (Scopus ID)
Conference
2020 IEEE International Conference on Big Data (Big Data)
Note
QC 20210602
2021-05-18; 2021-05-18; 2023-03-06. Bibliographically approved