Towards Decentralized Graph Learning
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer Systems, SCS. ORCID iD: 0000-0002-0223-8907
2023 (English). Doctoral thesis, comprising papers (Other academic)
Abstract [en]

Current Machine Learning (ML) approaches typically adopt either a centralized or a federated architecture. However, these architectures cannot easily keep up with some of the challenges introduced by recent trends, such as the growth in the number of IoT devices, increasing awareness of the privacy and security implications of extensive data collection, and the rise of graph-structured data and Graph Representation Learning. Systems based on either direct data collection or Federated Learning contain centralized, privileged components that may act as scalability bottlenecks and dangerous single points of failure, while requiring users to trust the privacy protections and security practices in place. The combination of these issues ultimately leads to data waste: opportunities to extract insights from available data are missed, and thus the full societal benefits of advanced data analytics and ML are not realized.

In this thesis, we argue for a paradigm shift towards a completely decentralized and trustless architecture for privacy-aware Graph Representation Learning, which employs Gossip Learning and other gossip-based peer-to-peer techniques to achieve high levels of scalability and resilience while reducing the risk of privacy leaks. We then identify and pursue three key research directions necessary to achieve our vision: lifting unrealistic assumptions on Gossip Learning, identifying and developing specific use cases that are enabled or improved by gossip-based decentralization, and overcoming the obstacles to the deployment of decentralized training and inference for Graph Representation Learning models.

Based on these key directions, our contributions are as follows. First, we analyze the robustness of Gossip Learning when several unrealistic but often assumed conditions are lifted. Then, we apply Gossip Learning, and gossip-based peer-to-peer protocols more generally, to three use cases: the collaborative training of differentially-private Naive Bayes classifiers across organizations holding sensitive user data; the construction of decentralized, privacy-preserving data marketplaces; and the development and decentralization of early-stage IoT botnet detection systems based on Graph Representation Learning. Finally, we introduce a general framework for the fully-decentralized training of Graph Neural Networks, which overcomes these models' typical requirement to access non-local information during training and inference.

The combination of these contributions removes major roadblocks towards decentralized graph learning, and also opens a new research direction aimed at further developing and optimizing the fully-decentralized training of Graph Representation Learning models.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023, pp. vii, 59
Series
TRITA-EECS-AVL ; 2023:42
Research programme
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-327016; ISBN: 978-91-8040-584-3 (print); OAI: oai:DiVA.org:kth-327016; DiVA id: diva2:1757592
Public defence
2023-06-09, Sal-C, Kistagången 16, Stockholm, 09:00 (English)
Research funder
EU, Horizon 2020, 813162
Note

QC 20230517

Available from: 2023-05-17. Created: 2023-05-17. Last updated: 2023-05-26. Bibliographically approved
List of papers
1. Gossip Learning: Off the Beaten Path
2019 (English). Conference paper, Published paper (Refereed)
Abstract [en]

The growing computational demands of model training tasks and the increased privacy awareness of consumers call for the development of new techniques in the area of machine learning. Fully decentralized approaches have been proposed, but are still in early research stages. This study analyses gossip learning, one of these state-of-the-art decentralized machine learning protocols, which promises high scalability and privacy preservation, with the goal of assessing its applicability to real-world scenarios.

Previous research on gossip learning makes strong and often unrealistic assumptions about the distribution of the data, the communication speeds of the devices and the connectivity among them. Our results show that lifting these assumptions can, in certain scenarios, lead to slow convergence of the protocol or even to unfair bias in the produced models. This paper identifies the conditions in which gossip learning can and cannot be applied, and introduces extensions that mitigate some of its limitations.
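To make the protocol under discussion concrete, the following is a minimal simulation of gossip learning as it is commonly described. Each node keeps a local logistic-regression model and periodically pushes it to a random peer. The receiver averages the incoming model with its own and then takes a gradient step on its local data. Everything in the sketch is an illustrative assumption: the model type, the learning rate, the synchronous rounds, uniform peer sampling, and the helper names. Uniform peer sampling and evenly spread data are precisely the kind of idealized conditions the paper questions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_sgd_step(w, X, y, lr):
    """One full-batch gradient step of logistic regression on a node's local data."""
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)
    return w - lr * grad


def gossip_learning(local_data, rounds=200, lr=0.5, seed=0):
    """Simulate gossip learning: in every round each node sends its model to one
    uniformly random peer, which merges it with its own model by averaging and
    then takes a gradient step on its local data.
    Assumes uniform peer sampling, no message loss and synchronous rounds."""
    rng = np.random.default_rng(seed)
    n_nodes = len(local_data)
    dim = local_data[0][0].shape[1]
    models = [np.zeros(dim) for _ in range(n_nodes)]
    for _ in range(rounds):
        for sender in range(n_nodes):
            peers = [i for i in range(n_nodes) if i != sender]
            receiver = peers[rng.integers(len(peers))]
            merged = (models[receiver] + models[sender]) / 2.0
            X, y = local_data[receiver]
            models[receiver] = local_sgd_step(merged, X, y, lr)
    return models


# Usage: 10 nodes, each holding 20 samples of a linearly separable problem.
rng = np.random.default_rng(1)
true_w = np.array([2.0, -3.0])
data = []
for _ in range(10):
    X = rng.normal(size=(20, 2))
    data.append((X, (X @ true_w > 0).astype(float)))

models = gossip_learning(data)
X_test = rng.normal(size=(500, 2))
y_test = (X_test @ true_w > 0).astype(float)
acc = np.mean((X_test @ models[0] > 0) == y_test)
print(f"test accuracy of node 0: {acc:.3f}")
```

Replacing the uniform choice of `receiver` with a sparse topology, or giving some nodes skewed labels, is one simple way to probe the non-ideal conditions the paper examines.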

Research programme
Computer Science
Identifiers
urn:nbn:se:kth:diva-263863 (URN); 10.1109/BigData47090.2019.9006216 (DOI); 000554828701025; 2-s2.0-85081314125 (Scopus ID)
Conference
2019 IEEE International Conference on Big Data (IEEE Big Data 2019), December 9-12, 2019, Los Angeles, CA, USA
Note

Accepted paper. QC 20191122

Available from: 2019-11-18. Created: 2019-11-18. Last updated: 2023-05-17. Bibliographically approved
2. Federated Naive Bayes under Differential Privacy
2022 (English). In: Proceedings of the 19th International Conference on Security and Cryptography - SECRYPT / [ed] DiVimercati, SDC; Samarati, P, Scitepress, 2022, pp. 170-180. Conference paper, Published paper (Refereed)
Abstract [en]

Growing privacy concerns regarding personal data disclosure are at odds with the constant need for such information in data-driven applications. To address this issue, the combination of federated learning and differential privacy is now well established in the domain of machine learning. These techniques make it possible to train deep neural networks without collecting the data and while preventing information leakage. However, there are many scenarios where simpler and more robust machine learning models are preferable. In this paper, we present a federated and differentially-private version of the Naive Bayes algorithm for classification. Our results show that, without data collection, the same performance as a centralized solution can be achieved on any dataset with only a slight increase in the privacy budget. Furthermore, if certain conditions are met, our federated solution can outperform a centralized approach.
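The following is a minimal sketch of how a federated, differentially-private Naive Bayes can work. It rests on several assumptions. Features are categorical, and the server only sums the counts it receives. Each data owner perturbs its own class and feature-value counts with the Laplace mechanism before sending them. The helper names, the noise placement, and the budget accounting are our own illustrative choices, not necessarily the paper's exact calibration. For the accounting, we treat the whole count vector as one query with L1 sensitivity 1 + number of features. Summing and clipping the noisy counts is post-processing, so it does not weaken the differential-privacy guarantee.

```python
import numpy as np


def local_counts(X, y, n_classes, n_values):
    """Class counts and per-class feature-value counts for one data owner.
    X: integer matrix (n_samples, n_features) with values in [0, n_values)."""
    n_features = X.shape[1]
    class_counts = np.bincount(y, minlength=n_classes).astype(float)
    feat_counts = np.zeros((n_classes, n_features, n_values))
    for c in range(n_classes):
        Xc = X[y == c]
        for j in range(n_features):
            feat_counts[c, j] = np.bincount(Xc[:, j], minlength=n_values)
    return class_counts, feat_counts


def privatize(class_counts, feat_counts, epsilon, rng):
    """Laplace mechanism on the concatenated count vector. Adding or removing one
    record changes one class count and one count per feature, so the L1
    sensitivity is 1 + n_features."""
    scale = (1 + feat_counts.shape[1]) / epsilon
    return (class_counts + rng.laplace(0.0, scale, class_counts.shape),
            feat_counts + rng.laplace(0.0, scale, feat_counts.shape))


def aggregate(updates):
    """Server side: sum the noisy counts and clip negatives caused by the noise."""
    class_counts = np.maximum(sum(u[0] for u in updates), 0.0)
    feat_counts = np.maximum(sum(u[1] for u in updates), 0.0)
    return class_counts, feat_counts


def predict(model, X, alpha=1.0):
    """Naive Bayes prediction with add-alpha smoothing, computed in log space."""
    class_counts, feat_counts = model
    n_classes, n_features, n_values = feat_counts.shape
    log_prior = np.log((class_counts + alpha) / (class_counts.sum() + alpha * n_classes))
    log_lik = np.log((feat_counts + alpha)
                     / (feat_counts.sum(axis=2, keepdims=True) + alpha * n_values))
    # scores[i, c] = log P(c) + sum_j log P(x_ij | c)
    scores = log_prior + np.stack(
        [log_lik[:, np.arange(n_features), x].sum(axis=1) for x in X])
    return scores.argmax(axis=1)


# Usage: 5 data owners, 2 classes, 5 binary features.
rng = np.random.default_rng(0)


def make_data(n):
    y = rng.integers(0, 2, size=n)
    flip = rng.random((n, 5)) < 0.2  # each feature equals the label w.p. 0.8
    X = np.where(flip, 1 - y[:, None], y[:, None])
    return X, y


owners = [make_data(200) for _ in range(5)]
updates = [privatize(*local_counts(X, y, 2, 2), epsilon=1.0, rng=rng) for X, y in owners]
model = aggregate(updates)
X_test, y_test = make_data(2000)
acc = np.mean(predict(model, X_test) == y_test)
print(f"accuracy with epsilon=1.0: {acc:.3f}")
```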

Place, publisher, year, edition, pages
Scitepress, 2022
Keywords
Federated Learning, Naive Bayes, Differential Privacy
Identifiers
urn:nbn:se:kth:diva-319088 (URN); 10.5220/0011275300003283 (DOI); 000853004900014; 2-s2.0-85174498579 (Scopus ID)
Conference
19th International Conference on Security and Cryptography (SECRYPT), July 11-13, 2022, Lisbon, Portugal
Note

QC 20220926

Part of proceedings: ISBN 978-989-758-590-6

Available from: 2022-09-26. Created: 2022-09-26. Last updated: 2024-08-28. Bibliographically approved
3. Towards a Realistic Decentralized Naive Bayes with Differential Privacy
(English). Manuscript (preprint) (Other academic)
Abstract [en]

This is an extended version of our work in [16]. In this paper, we introduce two novel algorithms to collaboratively train Naive Bayes models across multiple private data sources: Federated Naive Bayes and Gossip Naive Bayes. Instead of directly providing access to their data, the data owners compute local updates that are then aggregated to build a global model. To also prevent indirect privacy leaks from the updates or from the final model, our algorithms protect the exchanged information with differential privacy. We experimentally evaluate our proposed approaches, examining different scenarios and focusing on potential real-world issues, such as different data owners offering different amounts of data or requesting different levels of privacy. Our results show that both Federated and Gossip Naive Bayes achieve accuracy similar to a "vanilla" Naive Bayes while maintaining reasonable privacy guarantees and being extremely robust to heterogeneous data owners.
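In a serverless variant, the aggregation step can be replaced by gossip. The sketch below is a stand-in based on standard randomized pairwise gossip averaging. Two random nodes repeatedly replace their values with their mean. This preserves the network-wide sum, so every node converges to the global average of the count vectors. Multiplying that average by the number of nodes recovers the global counts. The sketch assumes the following: a fully connected network, a known number of owners, and counts that each owner has already noised before its first exchange. The Gossip Naive Bayes protocol in the paper may aggregate differently. Because nodes only ever exchange functions of already-privatized counts, the gossip steps are post-processing.

```python
import numpy as np


def gossip_average(vectors, rounds=50, seed=0):
    """Randomized pairwise gossip averaging on a fully connected network:
    at each step two random nodes replace their values with their mean.
    Each step preserves the network-wide sum, so all nodes converge to the
    global average."""
    rng = np.random.default_rng(seed)
    values = [np.array(v, dtype=float) for v in vectors]
    n = len(values)
    for _ in range(rounds * n):
        i, j = rng.choice(n, size=2, replace=False)
        mean = (values[i] + values[j]) / 2.0
        values[i], values[j] = mean, mean.copy()
    return values


# Usage: 8 owners, each holding a hypothetical, already-noised count vector.
rng = np.random.default_rng(42)
owners = [rng.integers(0, 100, size=6) + rng.laplace(0.0, 2.0, size=6) for _ in range(8)]
estimates = gossip_average(owners)
global_sum = np.sum(owners, axis=0)
# Every node can recover the global counts as n * (its local estimate).
est_sum_node0 = len(owners) * estimates[0]
print(np.round(est_sum_node0, 3), np.round(global_sum, 3))
```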

Keywords
Federated learning, Gossip Learning, Differential privacy, Naive Bayes
Research programme
Computer Science
Identifiers
urn:nbn:se:kth:diva-325437 (URN)
Research funder
EU, Horizon 2020, 813162
Note

QC 20230405

Available from: 2023-04-05. Created: 2023-04-05. Last updated: 2023-05-17. Bibliographically approved
4. PDS2: A user-centered decentralized marketplace for privacy preserving data processing
2021 (English). In: 2021 IEEE 37th International Conference on Data Engineering Workshops (ICDEW 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, pp. 92-99. Conference paper, Published paper (Refereed)
Abstract [en]

We envision PDS2, a decentralized data marketplace in which consumers submit their tasks to be run within the platform on the data of willing providers. The goal of PDS2 is to ensure that users maintain full control over their data and do not compromise their privacy, while being rewarded for the value that their data generates. To achieve this, our marketplace architecture employs blockchain technology, privacy-preserving computation and decentralized machine learning. We then compare different potential solutions and identify the Ethereum blockchain, trusted execution environments and gossip learning as the most suitable for the implementation of PDS2. We also discuss the main open challenges that remain to be tackled and possible directions for future work.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
IEEE International Conference on Data Engineering Workshop, ISSN 1943-2895
Keywords
iot, blockchain, machine learning, privacy
Identifiers
urn:nbn:se:kth:diva-300240 (URN); 10.1109/ICDEW53142.2021.00024 (DOI); 000681131300018; 2-s2.0-85107689093 (Scopus ID)
Conference
37th IEEE International Conference on Data Engineering (IEEE ICDE), April 19-22, 2021, electronic network (online)
Note

Part of proceedings: ISBN 978-1-6654-4890-1, QC 20230117

Available from: 2021-08-30. Created: 2021-08-30. Last updated: 2023-05-17. Bibliographically approved
5. Towards a decentralized infrastructure for data marketplaces: narrowing the gap between academia and industry
2022 (English). In: DE '22: Proceedings of the 1st International Workshop on Data Economy, New York, NY, USA: Association for Computing Machinery (ACM), 2022, pp. 49-56. Conference paper, Published paper (Refereed)
Abstract [en]

One big challenge for Industry 4.0 is leveraging the large amount of data that remain unused after collection. A variety of commercial data marketplaces have emerged in recent years to tackle this task. Despite their different business models and target markets, such marketplaces share a number of common issues that slow the growth of the industry, including data discovery, transparency, data privacy and data valuation. Many academic designs have been proposed to address these issues, yet most of them remain unimplemented, due to complexity or inefficiency.

We argue that these issues can be addressed with a combination of blockchain-based infrastructure, privacy-preserving computing and machine learning-based valuation metrics. Furthermore, we discuss key enabling technologies in each of these areas that are feasible to deploy at scale and could thus be implemented in real-world marketplaces in the near future. We select such technologies based on their current maturity and their industrial prominence.

Place, publisher, year, edition, pages
New York, NY, USA: Association for Computing Machinery (ACM), 2022
Research programme
Computer Science
Identifiers
urn:nbn:se:kth:diva-325815 (URN)
Conference
CoNEXT '22: The 18th International Conference on emerging Networking EXperiments and Technologies
Research funder
EU, Horizon 2020, 813162
Note

QC 20230425

Available from: 2023-04-16. Created: 2023-04-16. Last updated: 2023-05-17. Bibliographically approved
6. LiMNet: Early-Stage Detection of IoT Botnets with Lightweight Memory Networks
2021 (English). In: Computer Security – ESORICS 2021: 26th European Symposium on Research in Computer Security, Darmstadt, Germany, October 4–8, 2021, Proceedings, Part I / [ed] Elisa Bertino, Haya Shulman, Michael Waidner, Springer Nature, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

The number of IoT devices has grown exponentially in the last few years. This growth, combined with their low computational power and limited security features, makes them an attractive target for attackers. Attackers use IoT botnets as an instrument to perform DDoS attacks, which have caused major disruptions of Internet services in the last decade. While many works have tackled the task of detecting botnet attacks, only a few have considered early-stage detection of these botnets during their propagation phase.

While previous approaches analyze each network packet individually to predict its maliciousness, we propose a novel deep learning model called LiMNet (Lightweight Memory Network), which uses an internal memory component to capture the behaviour of each IoT device over time. This memory incorporates both packet features and the behaviour of peer devices. With this information, LiMNet achieves near-maximal AUROC classification scores, between 98.8% and 99.7%, a 14% improvement over the state of the art. LiMNet is also lightweight, performing inference almost 8 times faster than previous approaches.
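The following sketch illustrates the general idea of a per-device memory that is updated by every observed packet. It rests on several assumptions of our own:

- the recurrent update is a standard GRU cell;
- the cell's input is the packet features concatenated with the peer device's current memory;
- the score comes from a linear readout followed by a sigmoid;
- the weights are random and untrained.

This is not LiMNet's published architecture, and training on labeled traffic is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class GRUCell:
    """Minimal GRU cell with randomly initialized (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        def init(rows, cols):
            return rng.normal(0.0, 0.1, size=(rows, cols))
        self.Wz, self.Uz = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.Wr, self.Ur = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.Wh, self.Uh = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)

    def __call__(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)  # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)  # reset gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1.0 - z) * h + z * h_tilde


class DeviceMemory:
    """Hypothetical per-device memory store, updated on every observed packet."""

    def __init__(self, feat_dim, mem_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.mem_dim = mem_dim
        # Cell input = packet features concatenated with the peer's memory.
        self.cell = GRUCell(feat_dim + mem_dim, mem_dim, rng)
        self.readout = rng.normal(0.0, 0.1, size=mem_dim)
        self.memory = {}

    def get(self, device):
        return self.memory.setdefault(device, np.zeros(self.mem_dim))

    def observe(self, src, dst, features):
        """Update both endpoints' memories (using their pre-packet states) and
        return a score in (0, 1) for the source device."""
        m_src, m_dst = self.get(src), self.get(dst)
        self.memory[src] = self.cell(np.concatenate([features, m_dst]), m_src)
        self.memory[dst] = self.cell(np.concatenate([features, m_src]), m_dst)
        return float(sigmoid(self.readout @ self.memory[src]))


# Usage: a stream of 100 random packets among three hypothetical devices.
net = DeviceMemory(feat_dim=4, mem_dim=8)
rng = np.random.default_rng(1)
names = ["cam-1", "bulb-2", "router"]
for _ in range(100):
    src, dst = rng.choice(names, size=2, replace=False)
    score = net.observe(str(src), str(dst), rng.normal(size=4))
print(f"last score: {score:.3f}")
```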

Place, publisher, year, edition, pages
Springer Nature, 2021
Series
Lecture Notes in Computer Science ; 12972
Keywords
IoT, botnet detection, machine learning
Research programme
Computer Science; Telecommunications
Identifiers
urn:nbn:se:kth:diva-303027 (URN); 10.1007/978-3-030-88418-5_29 (DOI); 000772653800029; 2-s2.0-85116855549 (Scopus ID)
Conference
Computer Security - ESORICS 2021 - 26th European Symposium on Research in Computer Security, Darmstadt, Germany, October 4-8, 2021
Research funder
EU, Horizon 2020, 813162
Note

Part of proceedings: ISBN 978-3-030-88417-8

QC 20230117

Available from: 2021-10-05. Created: 2021-10-05. Last updated: 2023-12-11. Bibliographically approved
7. Metasoma: Decentralized and Collaborative Early-Stage Detection of IoT Botnets
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Early-stage detection of botnets during their spreading phase, before any attack, is fundamental to IoT security. Recently introduced lightweight memory networks represent the state of the art in this domain. However, they require a central system to capture and analyze all traffic in the network, which may not always be feasible in real-world scenarios. In this paper, we introduce a decentralized and collaborative alternative, in which the IoT devices themselves are responsible for this task without any central observer or coordinator. Our results show that the performance of this novel approach is competitive with similar centralized solutions, despite the lack of a global view of the network at any participating device. We also provide an extensive analysis of the security limitations of our fully-decentralized detection system. We identify the potential exploits that an attacker may attempt to perform and assess their impact on the IoT network, and we propose and evaluate effective countermeasures.
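As a rough illustration of removing the central observer, the sketch below keeps each memory on the device it describes. Whenever two devices communicate, each piggybacks its current memory on the traffic, and both then update themselves locally. The following are hypothetical choices made for illustration, not Metasoma's actual design:

- all devices share one set of weights, whose training is omitted;
- the update is a simple tanh recurrence;
- the "score" is a sigmoid readout of the updated memory.

```python
import numpy as np


class Device:
    """An IoT device that stores only its own memory, so no central observer
    ever sees the full traffic. All devices share the same hypothetical weights."""

    def __init__(self, W_in, W_mem, readout):
        self.W_in, self.W_mem, self.readout = W_in, W_mem, readout
        self.memory = np.zeros(W_mem.shape[0])

    def update(self, features, peer_memory):
        """Local recurrent update from one packet and the memory received from
        the peer; returns a score in (0, 1) computed from the new memory."""
        x = np.concatenate([features, peer_memory])
        self.memory = np.tanh(self.W_in @ x + self.W_mem @ self.memory)
        return float(1.0 / (1.0 + np.exp(-self.readout @ self.memory)))


def exchange(a, b, features):
    """One device-to-device interaction: both endpoints send each other their
    pre-interaction memory, then each updates itself locally."""
    mem_a, mem_b = a.memory.copy(), b.memory.copy()
    return a.update(features, mem_b), b.update(features, mem_a)


# Usage: 5 devices exchanging 50 random packets.
feat_dim, mem_dim = 4, 8
rng = np.random.default_rng(0)
W_in = rng.normal(0.0, 0.1, size=(mem_dim, feat_dim + mem_dim))
W_mem = rng.normal(0.0, 0.1, size=(mem_dim, mem_dim))
readout = rng.normal(0.0, 0.1, size=mem_dim)
devices = [Device(W_in, W_mem, readout) for _ in range(5)]
for _ in range(50):
    i, j = rng.choice(len(devices), size=2, replace=False)
    score_i, score_j = exchange(devices[i], devices[j], rng.normal(size=feat_dim))
print(f"last scores: {score_i:.3f}, {score_j:.3f}")
```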

Keywords
Security and Privacy, Botnet Detection, Industrial IoT (IIoT), Device-to-Device Communication, Deep Learning
Research programme
Computer Science
Identifiers
urn:nbn:se:kth:diva-325436 (URN)
Research funder
EU, Horizon 2020, 813162
Note

QC 20230405

Available from: 2023-04-05. Created: 2023-04-05. Last updated: 2023-05-17. Bibliographically approved
8. Fully-Decentralized Training of GNNs using Layer-wise Self-Supervision
(English). Manuscript (preprint) (Other academic)
Abstract [en]

In the existing literature, GNN training has mostly been performed in centralized, and sometimes federated, settings. In this work, we consider a fully-decentralized, data-private scenario, where each node has limited knowledge of the surrounding graph. We propose the first architecture that enables GNN training in this fully-decentralized setting, by carefully combining several techniques, including decoupled learning, self-supervision and Gossip Learning. We implement two simulation tools to experimentally evaluate our solution. The results show that the proposed technique can be used effectively in scenarios where centralized or federated approaches are infeasible or undesirable.
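The sketch below shows only the data flow of the setting just described; it is not the paper's framework. Each graph node computes one GNN layer from purely local information: its own previous embedding plus one message per neighbor. Each node also holds its own copy of the layer weights, and nodes reconcile these copies with gossip-style pairwise averaging. In a layer-wise scheme, one layer would be trained and frozen before the next is computed; we assume that reading here. The following are our assumptions: mean aggregation, concatenation, ReLU, and all function names. The layer-wise self-supervised objective and its optimizer are omitted entirely.

```python
import numpy as np


def layer_forward(W, h_self, neighbor_msgs):
    """One GNN layer computed at a single node from purely local information:
    its own previous embedding plus the embeddings its neighbors sent it."""
    agg = np.mean(neighbor_msgs, axis=0) if len(neighbor_msgs) else np.zeros_like(h_self)
    return np.maximum(0.0, W @ np.concatenate([h_self, agg]))  # ReLU


def decentralized_layer(adjacency, h, W_copies):
    """Each node v uses its own copy W_copies[v]; the only non-local input is
    one message per neighbor u carrying h[u]."""
    return {v: layer_forward(W_copies[v], h[v], [h[u] for u in adjacency[v]])
            for v in adjacency}


def gossip_merge(W_copies, v, u):
    """Gossip-learning-style merge: two neighboring nodes average their weights."""
    avg = (W_copies[v] + W_copies[u]) / 2.0
    W_copies[v], W_copies[u] = avg, avg.copy()


# Usage: a 4-node ring with 3-dimensional features and a 3-unit layer.
rng = np.random.default_rng(0)
adjacency = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
h0 = {v: rng.normal(size=3) for v in adjacency}
# Hypothetical per-node weight copies, e.g. after independent local updates.
W_copies = {v: rng.normal(size=(3, 6)) for v in adjacency}
W_mean_before = np.mean(list(W_copies.values()), axis=0)
for _ in range(60):
    for v in adjacency:
        u = adjacency[v][rng.integers(len(adjacency[v]))]
        gossip_merge(W_copies, v, u)
h1 = decentralized_layer(adjacency, h0, W_copies)
print({v: np.round(h, 3) for v, h in h1.items()})
```

Pairwise averaging preserves the mean of the weight copies, so the nodes drift toward a common layer without any coordinator.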

Keywords
Graph Neural Networks, Decentralized Learning, Self-Supervised Learning, Gossip Learning, Decoupled Learning
Research programme
Computer Science
Identifiers
urn:nbn:se:kth:diva-324971 (URN)
Research funder
EU, Horizon 2020, 813162
Note

QC 20230322

Available from: 2023-03-22. Created: 2023-03-22. Last updated: 2023-05-17. Bibliographically approved

Open Access in DiVA

summary (1608 kB), 354 downloads
File information
File: SUMMARY01.pdf, File size: 1608 kB, Checksum: SHA-512
bc669637be651450e26879078f57904829673447a10bbbe02f9de4e1043b958c1f7e52f7762fb907aa75c360c73aaefae3b47fc8ba6b1a78359f124fa9a2ac68
Type: summary, MIME type: application/pdf

Person

Giaretta, Lodovico
