Towards Scalable Machine Learning with Privacy Protection
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0002-5530-2714
2023 (English). Licentiate thesis, monograph (Other academic)
Abstract [en]

The increasing size and complexity of datasets have accelerated the development of machine learning models and exposed the need for more scalable solutions. This thesis explores challenges associated with large-scale machine learning under data privacy constraints. With the growth of machine learning models, traditional privacy methods such as data anonymization are becoming insufficient. Thus, we delve into alternative approaches, such as differential privacy.

Our research addresses the following core areas in the context of scalable privacy-preserving machine learning: First, we examine the implications of data dimensionality on privacy for the application of medical image analysis. We extend the classification algorithm Private Aggregation of Teacher Ensembles (PATE) to deal with high-dimensional labels, and demonstrate that dimensionality reduction can be used to improve privacy. Second, we consider the impact of hyperparameter selection on privacy. Here, we propose a novel adaptive technique for hyperparameter selection in differentially private gradient-based optimization. Third, we investigate sampling-based solutions to scale differentially private machine learning to datasets with a large number of records. We study the privacy-enhancing properties of importance sampling, highlighting that it can outperform uniform sub-sampling not only in terms of sample efficiency but also in terms of privacy.
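For orientation, the aggregation step of standard PATE (the classification algorithm that the thesis extends) can be sketched as follows. This is a minimal illustration of the commonly described noisy-max vote with Laplace noise, not the high-dimensional extension developed in the thesis. The function names and the parameter `gamma` (inverse noise scale) are assumptions chosen for this sketch.

```python
import random
from collections import Counter


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale
    # is Laplace-distributed with location 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_max_label(teacher_votes, num_classes, gamma, rng=None):
    """Return the class whose Laplace-perturbed vote count is largest.

    teacher_votes: list of class indices, one vote per teacher model.
    gamma: inverse noise scale (hypothetical name); smaller means more noise.
    """
    if rng is None:
        rng = random.Random()
    counts = Counter(teacher_votes)
    # Perturb every class count independently, including classes with zero votes.
    noisy = [counts.get(c, 0) + laplace(1.0 / gamma, rng) for c in range(num_classes)]
    return max(range(num_classes), key=noisy.__getitem__)
```

When the teachers agree strongly, the noise rarely changes the outcome; when the vote is close, the returned label becomes correspondingly less reliable.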

The three techniques developed in this thesis improve the scalability of machine learning while ensuring robust privacy protection, and aim to offer solutions for the effective and safe application of machine learning in large datasets.
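As a point of reference for the uniform sub-sampling baseline mentioned in the abstract, the following sketch evaluates the standard privacy-amplification bound for pure epsilon-differential privacy: running an epsilon-DP mechanism on a random subsample with sampling rate q yields a guarantee of log(1 + q(exp(epsilon) - 1)). It does not reproduce the thesis's importance-sampling analysis; the function name and argument names are assumptions made for this illustration.

```python
import math


def amplified_epsilon(epsilon, q):
    """Amplified privacy parameter log(1 + q * (exp(epsilon) - 1)).

    epsilon: privacy parameter of the base mechanism (must be >= 0).
    q: sampling rate of the uniform subsample (must lie in [0, 1]).
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    # log1p and expm1 keep the computation accurate for small epsilon and q.
    return math.log1p(q * math.expm1(epsilon))
```

For small epsilon the result is approximately q times epsilon, which is why sub-sampling is attractive for datasets with many records.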

Abstract [sv]

Den ständigt ökande storleken och komplexiteten hos datamängder har accelererat utvecklingen av maskininlärningsmodeller och gjort behovet av mer skalbara lösningar alltmer uppenbart. Den här avhandlingen utforskar tre utmaningar förknippade med storskalig maskininlärning under dataskyddskrav. För stora och komplexa maskininlärningsmodeller blir traditionella metoder för integritet, såsom dataanonymisering, otillräckliga. Vi undersöker därför alternativa tillvägagångssätt, såsom differentiell integritet.

Vår forskning behandlar följande utmaningar inom skalbar och integritetsmedveten maskininlärning: För det första undersöker vi hur hög data-dimensionalitet påverkar integriteten för medicinsk bildanalys. Vi utvidgar klassificeringsalgoritmen Private Aggregation of Teacher Ensembles (PATE) för att hantera högdimensionella etiketter och visar att dimensionsreducering kan användas för att förbättra integriteten. För det andra studerar vi hur valet av hyperparametrar påverkar integriteten. Här föreslår vi en ny adaptiv teknik för val av hyperparametrar i gradient-baserad optimering med garantier på differentiell integritet. För det tredje granskar vi urvalsbaserade lösningar för att skala differentiellt privat maskininlärning till stora datamängder. Vi studerar de integritetsförstärkande egenskaperna hos importance sampling och visar att det kan överträffa ett likformigt urval av sampel, inte bara när det gäller effektivitet utan även för integritet.

De tre teknikerna som utvecklats i denna avhandling förbättrar skalbarheten för integritetsskyddad maskininlärning och syftar till att erbjuda lösningar för effektiv och säker tillämpning av maskininlärning på stora datamängder.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023, p. xi, 94
Series
TRITA-EECS-AVL ; 2023:79
Keywords [en]
Machine Learning, Privacy, Differential Privacy, Dimensionality Reduction, Image Segmentation, Hyperparameter Selection, Adaptive Optimization, Privacy Amplification, Importance Sampling
Keywords [sv]
Maskininlärning, Dataskydd, Differentiell Integritet, Dimensionsreducering, Bildsegmentering, Hyperparameterurval, Adaptiv Optimering, Integritetsförstärkning, Importance Sampling
National Category
Computer Sciences
Research subject
Computer Science; Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-338979
ISBN: 978-91-8040-751-9 (print)
OAI: oai:DiVA.org:kth-338979
DiVA, id: diva2:1808715
Presentation
2023-11-21, D31, Lindstedtsvägen 9, Stockholm, 10:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP), 3309
Note

QC 20231101

Available from: 2023-11-01 Created: 2023-10-31 Last updated: 2023-11-06 Bibliographically approved

Open Access in DiVA

Full text (15634 kB), 604 downloads
File information
File name: FULLTEXT02.pdf
File size: 15634 kB
Checksum SHA-512: a490fca5462999f4cbe26b3c260499115f3fe6e5b6fa201cf7d53f153d6a17ea7ba7b17cd1383f0b05ba8c7dfc692b3f1474fe522f2a5f438bbce10047334feb
Type: fulltext
Mimetype: application/pdf

Authority records

Fay, Dominik
