Privacy-respecting Features in Large Collections of Personal Data
KTH, School of Information and Communication Technology (ICT).
2016 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Public disclosure of anonymized datasets has repeatedly been shown to be privacy-invasive to users, allowing attackers to extract sensitive personal information. In this thesis we give an overview of the privacy-preserving data publishing techniques currently available and their shortcomings. A better understanding of the privacy bounds of personal data, specifically mobility data, is paramount. We study 12 months of smartphone GPS readings from 1000 students at the Technical University of Denmark (DTU) and find high uniqueness values for users' traces. In a dataset with hourly GPS readings at a spatial resolution of two GPS decimal digits (about 0.7 sq. km), we find that four data points are enough to uniquely identify more than 90% of users. We then conduct two novel experiments: 1. a dynamic experiment, where the resolution of the data points varies, and 2. a label experiment, where GPS readings are transformed into boolean behavior labels such as “Home” or “Work”. The dynamic experiment shows that one data point at three-hour time resolution and two-decimal-digit spatial resolution is comparable, in terms of uniqueness, to three data points at daily time resolution and the same spatial resolution. The label experiment shows that using labels instead of raw GPS coordinates lowers uniqueness considerably and is therefore more privacy-preserving: the number of data points needed to identify a user increases from four to eighteen.
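The uniqueness figures above come from checking how many users' traces remain consistent with a handful of known points. A minimal sketch of such a unicity-style estimate follows, assuming that each reading is a (timestamp in seconds, latitude, longitude) tuple, that spatial resolution is applied by rounding coordinates to a fixed number of decimal digits, and that the known points are drawn uniformly at random from a user's own coarsened trace. The function names `coarsen` and `unicity` and this sampling scheme are hypothetical and are not taken from the thesis.

```python
import random

# Hypothetical representation: a coarsened point is (time_bin, rounded_lat, rounded_lon).
Point = tuple[int, float, float]


def coarsen(lat: float, lon: float, timestamp_s: int,
            decimals: int = 2, bin_s: int = 3600) -> Point:
    """Round coordinates to `decimals` digits and floor the timestamp into bins of `bin_s` seconds.

    Assumption: rounding to two decimal digits stands in for the thesis's spatial resolution.
    """
    return (timestamp_s // bin_s, round(lat, decimals), round(lon, decimals))


def unicity(traces: dict[str, set[Point]], p: int,
            trials: int = 1000, seed: int = 0) -> float:
    """Estimate the fraction of sampled users whose trace is the only one containing
    p points drawn at random from that same trace."""
    rng = random.Random(seed)
    # Only users with at least p distinct points can supply a p-point sample.
    eligible = [user for user, trace in traces.items() if len(trace) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        # Sorting makes sampling deterministic for a given seed.
        known = set(rng.sample(sorted(traces[user]), p))
        # Count every trace (including the user's own) that contains all known points.
        matches = sum(1 for trace in traces.values() if known <= trace)
        if matches == 1:
            unique += 1
    return unique / trials
```

Calling `unicity(traces, 4)` on traces built with `coarsen(..., decimals=2, bin_s=3600)` loosely mirrors the hourly, two-decimal setting quoted above; raising `bin_s` (for example to `3 * 3600` or `86400`) corresponds roughly to the coarser time resolutions of the dynamic experiment.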

Abstract [da] (English translation)

Public release of anonymized datasets has repeatedly been shown to be privacy-invasive for consumers, enabling hackers to extract personally sensitive information. In this thesis we give an overview of current privacy-preserving data publishing techniques and their shortcomings. A better understanding of the privacy boundaries of personal data, specifically mobility data, is essential. We examine 12 months of smartphone GPS readings from 1000 students at the Technical University of Denmark (DTU) and find high uniqueness values for the users' traces. In a dataset with GPS readings specified by the hour, at a spatial resolution of two GPS decimal places (0.7 sq. km), we find that four data points are enough to uniquely identify more than 90% of the users. We then carry out two experiments: 1. a dynamic experiment, where the resolution of the data points varies, and 2. a label experiment, where GPS readings are transformed into behavior labels such as “home” or “work” with boolean values. The dynamic experiment shows that one data point at a three-hour time resolution and a spatial resolution of two GPS decimal places is comparable, in terms of uniqueness, to three data points at daily time resolution and two GPS decimal places. The label experiment shows that using labels instead of the raw GPS coordinates considerably reduces the uniqueness percentages and is therefore more privacy-preserving. The number of data points needed to identify a user rises from four to eighteen.
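The label experiment replaces raw coordinates with boolean behavior labels. The abstract does not state how the labels were derived, so the sketch below rests on an assumption made purely for illustration: a reading is marked “Home” or “Work” when it lies within a fixed radius of a per-user anchor location, measured with the haversine distance. The names `haversine_km` and `to_labels`, the 0.25 km radius, and the anchor coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def to_labels(readings: list[tuple[int, float, float]],
              places: dict[str, tuple[float, float]],
              radius_km: float = 0.25,
              bin_s: int = 3600) -> set[tuple]:
    """Turn (timestamp_s, lat, lon) readings into (time_bin, flag, flag, ...) points.

    Hypothetical rule: each flag is True when the reading lies within `radius_km`
    of the anchor for that label. Flags follow the sorted label names.
    """
    names = sorted(places)
    labeled = set()
    for ts, lat, lon in readings:
        flags = tuple(haversine_km(lat, lon, *places[name]) <= radius_km for name in names)
        labeled.add((ts // bin_s,) + flags)
    return labeled
```

Because a labeled point keeps only a time bin and a few booleans, many users share identical points, which is consistent with the finding that more points are needed before a user becomes unique.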

Place, publisher, year, edition, pages
2016, 93 p.
Series
TRITA-ICT-EX, 2016:140
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-205322
OAI: oai:DiVA.org:kth-205322
DiVA: diva2:1088545
Subject / course
Electrical Engineering
Educational program
Master of Science - Security and Mobile Computing
Available from: 2017-04-13
Created: 2017-04-13
Last updated: 2017-04-13
Bibliographically approved

Open Access in DiVA

No full text
