Evaluating Membership Inference Attacks on Synthetic Data Generated With Formal Privacy Guarantees
KTH, School of Electrical Engineering and Computer Science (EECS).
KTH, School of Electrical Engineering and Computer Science (EECS).
2023 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Synthetic data generation using generative machine learning has been increasingly publicized as a new tool for data anonymization. It promises to offer privacy while maintaining the statistical properties of the original dataset. This study focuses on the risks of synthetic data by looking mainly at two aspects: privacy and utility. In terms of privacy, we consider what information can be inferred about the underlying dataset by accessing the synthetic data. To test this, we launch membership inference attacks, which aim to determine whether a given data point was used in the training of the generative model. We find that synthetic data is at risk of considerable leakage for outlier data points, especially for generative models without formal privacy guarantees. We also find that higher privacy comes at a considerable cost in data utility, i.e. how well the synthetic data reflects the raw dataset. With these findings we reaffirm the results of previous works. We also present new contributions: an evaluation of attacks with a cross-validation method, an investigation of the connection between how far a data point deviates and its susceptibility to attacks, and a greater focus on different generative models than in previous literature. We conclude that the synthetic data generation methods investigated are subject to a significant trade-off between privacy and utility.
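As an illustration of what a membership inference attack on synthetic data can look like, the following is a minimal sketch of a distance-to-closest-record heuristic. It is not the attack evaluated in the thesis; the function names, the threshold and the toy records are all hypothetical and chosen only to show the idea that a candidate point, in particular an outlier, may be guessed to be a training member when some synthetic record lies very close to it.

```python
import math


def distance_to_closest_record(candidate, synthetic):
    """Smallest Euclidean distance from `candidate` to any synthetic record."""
    return min(math.dist(candidate, row) for row in synthetic)


def infer_membership(candidate, synthetic, threshold):
    """Guess 'member' when a synthetic record lies within `threshold` of `candidate`.

    The threshold is an assumption of this sketch; in practice it would have
    to be calibrated, for example on data known not to be in the training set.
    """
    return distance_to_closest_record(candidate, synthetic) <= threshold


# Made-up synthetic records with two numeric attributes each (illustrative only).
synthetic = [(0.1, 0.2), (0.0, -0.1), (5.0, 5.1)]

near_outlier = (5.0, 5.0)  # an outlier that one synthetic record closely mirrors
far_point = (-4.0, 3.0)    # no synthetic record lies close to this point

guess_near = infer_membership(near_outlier, synthetic, threshold=0.5)
guess_far = infer_membership(far_point, synthetic, threshold=0.5)
```

In this toy setting the attack guesses membership for `near_outlier` but not for `far_point`. A generative model with formal privacy guarantees would be expected to make such close matches to individual training points less informative, at some cost in utility.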

Abstract [sv] (translated to English)

Synthetic data has been put forward as a promising new method for data anonymization. Proponents argue that it allows seamless sharing of data with the same properties as the underlying data while preserving privacy protection. This study focuses on the risks of synthetic data from two perspectives: privacy and utility. Regarding privacy, we examine to what extent sensitive information about the underlying data can be extracted from the synthetic data. This is done by evaluating how vulnerable the data is to attacks that aim to determine whether or not a given data point is part of the underlying dataset. We find that significant leakage of the underlying data can occur through synthetic data, especially for generative models without formal privacy protection. We also find that higher security of synthetic data comes at the expense of the data's utility. With these conclusions we have confirmed results from previous studies, but also made contributions of our own. First, we developed a cross-validation method for examining the privacy protection of synthetic data. Furthermore, we investigated the link between the deviation of a data point and its susceptibility to attacks. Finally, we focused on other generative models to a greater extent than previous studies have done. In summary, we conclude that the investigated methods for synthetic data generation are subject to a critical trade-off between security and utility of the data.

Place, publisher, year, edition, pages
2023. p. 585-592
Series
TRITA-EECS-EX ; 2023:187
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-341775
OAI: oai:DiVA.org:kth-341775
DiVA id: diva2:1823474
Projects
Kandidatexjobb i elektroteknik 2023, KTH, Stockholm
Available from: 2024-01-02
Created: 2024-01-02

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 211487 kB
Checksum (SHA-512): 69786101c351a58f7bd524c3aeee40c661028b577366c4a725033372b88c624c87c2183b6acca2d3d43bbd2bb2f3942326c69263e70c99cf1db027ce9c4e9ae2
Type: fulltext
Mimetype: application/pdf
