Representation Learning on Graphs: Investigating and Overcoming Common Challenges
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer Systems, SCS. ORCID iD: 0000-0002-5392-6531
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Graph Representation Learning (GRL) has emerged as a crucial area for modeling and understanding graph-structured data across diverse applications. This thesis advances GRL by addressing key challenges in both homogeneous and heterogeneous graphs, including modeling complex heterogeneous relational structures, designing generalizable augmentations for self-supervised learning, improving inductive link prediction in cold-start scenarios, and mitigating over-squashing in message-passing architectures.

Heterogeneous graphs present modeling difficulties due to the presence of multiple node and edge types. To address this, we propose a flexible random walk framework that removes the need for predefined domain knowledge such as meta-paths, enabling more effective and scalable modeling of complex relational structures.

In the self-supervised learning setting, current GRL methods often rely on manually designed graph augmentations that limit generalizability. This thesis introduces augmentation techniques that are task- and domain-agnostic, improving performance across varied graph types and structures.

Inductive link prediction remains challenging for GNNs, particularly in cold-start scenarios where target nodes lack topological context. We propose methods that support efficient and accurate inference without requiring access to neighborhood information of unseen nodes, addressing both scalability and generalization.

While GNNs are effective at capturing local structure, they often suffer from over-squashing, which restricts information propagation across long-range dependencies. To overcome this, we present strategies that improve the aggregation process, enabling GNNs to better preserve and prioritize critical signals from distant parts of the graph.

Through extensive experiments on benchmark datasets, the proposed methods demonstrate consistent improvements in node classification, link prediction, and graph property prediction tasks. Our approaches outperform strong baselines in settings involving heterogeneity, inductive generalization, and large-diameter graphs. Some methods significantly reduce inference cost, while others enhance model expressiveness and robustness by improving structural generalization. Collectively, these contributions show that principled and general-purpose solutions can effectively address long-standing challenges in graph representation learning.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025, p. ix, 70
Series
TRITA-EECS-AVL ; 2025:92
Keywords [en]
Graph Machine Learning, Representation Learning
National subject category
Artificial Intelligence
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-370617
ISBN: 978-91-8106-426-1 (digital)
OAI: oai:DiVA.org:kth-370617
DiVA, id: diva2:2001853
Public defence
2025-11-07, F3, Lindstedtsvägen 26 & 28, KTH Campus, Stockholm, 09:00 (English)
Opponent
Supervisor
Note

QC 20250929

Available from: 2025-09-29 Created: 2025-09-29 Last updated: 2025-09-29 Bibliographically reviewed
List of papers
1. SchemaWalk: Schema Aware Random Walks for Heterogeneous Graph Embedding
2022 (English) In: WWW 2022 - Companion Proceedings of the Web Conference 2022, Association for Computing Machinery (ACM), 2022, p. 1157-1166. Conference paper, Published paper (Refereed)
Abstract [en]

Heterogeneous Information Network (HIN) embedding has been a prevalent approach to learning representations of semantically rich heterogeneous networks. Most HIN embedding methods exploit meta-paths to retain high-order structures, yet their performance is conditioned on the quality of the (generated or manually defined) meta-paths and their suitability for the specific label set. Other methods adjust random walks to harness or skip certain heterogeneous structures (e.g., node types), but in doing so the adjusted random walker may inadvertently omit other node or edge types. Our key insight is that, with no domain knowledge, the random walker should hold no assumptions about the heterogeneous structure (i.e., edge types). Thus, aiming for a flexible and general method, we utilize the network schema as a unique blueprint of the HIN and propose SchemaWalk, a random walk that uniformly samples all edge types within the network schema. Moreover, we identify the starvation phenomenon, which induces random walkers on HINs to under- or over-sample certain edge types. Accordingly, we design SchemaWalkHO to skip locally deficient connectivity and preserve a uniform sampling distribution. Finally, we carry out node classification experiments on four real-world HINs and provide in-depth qualitative analysis. The results highlight the robustness of our method regardless of the graph structure, in contrast with the state-of-the-art baselines.
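The uniform edge-type sampling that SchemaWalk describes can be sketched in a few lines. The adjacency-list format, function name, and toy graph below are illustrative assumptions, not the paper's implementation:

```python
import random
from collections import defaultdict

def schema_aware_walk(adj, start, length, rng=random):
    """One walk that first picks an edge *type* uniformly among the types
    present at the current node, then a neighbor uniformly within that type.

    adj: dict node -> list of (edge_type, neighbor) pairs (illustrative format).
    """
    walk = [start]
    node = start
    for _ in range(length):
        by_type = defaultdict(list)
        for etype, nbr in adj.get(node, []):
            by_type[etype].append(nbr)
        if not by_type:
            break  # dead end: no outgoing edges
        etype = rng.choice(sorted(by_type))  # uniform over edge types present
        node = rng.choice(by_type[etype])    # uniform within the chosen type
        walk.append(node)
    return walk

# Toy author/paper HIN: 'writes' edges outnumber 'cites' at node "a",
# yet both types are sampled with equal probability.
adj = {"a": [("writes", "p1"), ("writes", "p2"), ("cites", "p3")],
       "p1": [("written_by", "a")], "p2": [], "p3": []}
print(schema_aware_walk(adj, "a", 5, rng=random.Random(0)))
```

A plain uniform-neighbor walk would pick a 'writes' edge two-thirds of the time at node "a"; sampling the type first restores a 50/50 split, which is the uniform edge-type treatment the abstract motivates.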

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
Keywords
Heterogeneous Information Network, Network Embeddings, Random Walk, Representation Learning, Domain Knowledge, Graphic methods, Information services, Random processes, Graph embeddings, Heterogeneous graph, Heterogeneous information, Heterogeneous structures, Information networks, Network embedding, Random walkers, Heterogeneous networks
National subject category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-327049 (URN)
10.1145/3487553.3524728 (DOI)
001147592700198 ()
2-s2.0-85137448476 (Scopus ID)
Conference
31st ACM Web Conference, WWW 2022, 25 April 2022
Note

QC 20230523

Available from: 2023-05-23 Created: 2023-05-23 Last updated: 2025-09-29 Bibliographically reviewed
2. Data-Driven Self-Supervised Graph Representation Learning
2023 (English) In: ECAI 2023: 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings, IOS Press, 2023, p. 629-636. Conference paper, Published paper (Refereed)
Abstract [en]

Self-supervised graph representation learning (SSGRL) is a representation learning paradigm used to reduce or avoid manual labeling. An essential part of SSGRL is graph data augmentation. Existing methods usually rely on heuristics commonly identified through trial and error that are effective only within some application domains. Also, it is not clear why one heuristic is better than another. Moreover, recent studies have argued against some techniques (e.g., dropout, which can change the properties of molecular graphs or destroy relevant signals for graph-based document classification tasks). In this study, we propose a novel data-driven SSGRL approach that automatically learns a suitable graph augmentation from the signal encoded in the graph (i.e., the nodes' predictive features and topological information). We propose two complementary approaches that produce learnable feature and topological augmentations. The former learns a multi-view augmentation of node features, and the latter learns a high-order view of the topology. Moreover, the augmentations are jointly learned with the representation. Our approach is general in that it can be applied to homogeneous and heterogeneous graphs. We perform extensive experiments on node classification (using nine homogeneous and heterogeneous datasets) and graph property prediction (using another eight datasets). The results show that the proposed method matches or outperforms the SOTA SSGRL baselines and performs similarly to semi-supervised methods. The anonymised source code is available at https://github.com/AhmedESamy/dsgrl/
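The jointly learned feature-augmentation idea can be illustrated with a minimal numpy sketch. The weight matrices are shown at initialization only (the paper trains them jointly with the encoder by gradient descent), and all names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                    # node features: 5 nodes, 8 dims

# Learnable augmentor weights producing two views of the features.
W_view1 = np.eye(8) + 0.1 * rng.normal(size=(8, 8))
W_view2 = np.eye(8) + 0.1 * rng.normal(size=(8, 8))
W_enc = rng.normal(size=(8, 4))                # shared encoder weights

def encode(Z):
    return np.maximum(Z @ W_enc, 0.0)          # one-layer ReLU encoder

view1, view2 = X @ W_view1, X @ W_view2        # two learned feature views
H1, H2 = encode(view1), encode(view2)

# Invariance-style self-supervised objective: embeddings of the two learned
# views of the same node should agree; minimizing it shapes the augmentors.
loss = float(np.mean((H1 - H2) ** 2))
print(loss)
```

A topological analogue would score high-order (multi-hop) connections instead of transforming features; both kinds of view are trained against the same self-supervised objective.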

Place, publisher, year, edition, pages
IOS Press, 2023
National subject category
Computer Sciences, Computer Graphics and Computer Vision
Identifiers
urn:nbn:se:kth:diva-339683 (URN)
10.3233/FAIA230325 (DOI)
2-s2.0-85175858097 (Scopus ID)
Conference
26th European Conference on Artificial Intelligence, ECAI 2023, Krakow, Poland, Sep 30 2023 - Oct 4 2023
Note

Part of ISBN 9781643684369

QC 20231116

Available from: 2023-11-16 Created: 2023-11-16 Last updated: 2025-09-29 Bibliographically reviewed
3. Graph2Feat: Inductive Link Prediction via Knowledge Distillation
2023 (English) In: ACM Web Conference 2023: Companion of the World Wide Web Conference, WWW 2023, Association for Computing Machinery (ACM), 2023, p. 805-812. Conference paper, Published paper (Refereed)
Abstract [en]

Link prediction between two nodes is a critical task in graph machine learning. Most approaches are based on variants of graph neural networks (GNNs) that focus on transductive link prediction and have high inference latency. However, many real-world applications require fast inference over new nodes in inductive settings, where no connectivity information is available for these nodes; node features then provide an indispensable alternative. To that end, we propose Graph2Feat, which enables inductive link prediction by exploiting knowledge distillation (KD) through the student-teacher learning framework. In particular, Graph2Feat learns to match the representations of a lightweight student multi-layer perceptron (MLP) with those of a more expressive teacher GNN while learning to predict missing links based on the node features, thus attaining both the GNN's expressiveness and the MLP's fast inference. Furthermore, our approach is general: it is suitable for transductive and inductive link prediction on different types of graphs, whether homogeneous or heterogeneous, directed or undirected. We carry out extensive experiments on seven real-world datasets including homogeneous and heterogeneous graphs. Our experiments demonstrate that Graph2Feat significantly outperforms SOTA methods in terms of AUC and average precision on homogeneous and heterogeneous graphs. Finally, Graph2Feat has the minimum inference time compared to the SOTA methods, and a 100x speedup compared to GNNs. The code and datasets are available on GitHub.
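The distillation setup can be sketched as follows, assuming a teacher GNN's embeddings are already available (here random placeholders) and a one-layer student; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))                  # node features for 6 training nodes
teacher_emb = rng.normal(size=(6, 4))        # placeholder for teacher-GNN output

W_student = 0.1 * rng.normal(size=(8, 4))    # student "MLP" (one linear layer)

def student(features):
    return features @ W_student              # feature-only: no neighborhoods used

# Distillation objective: the student matches the teacher's representations.
kd_loss = float(np.mean((student(X) - teacher_emb) ** 2))

def link_score(f_u, f_v):
    zu, zv = student(f_u), student(f_v)
    return 1.0 / (1.0 + np.exp(-(zu @ zv)))  # sigmoid of embedding inner product

# Cold-start inference: a brand-new node with features but no edges at all.
new_node = rng.normal(size=8)
score = link_score(new_node, X[0])
print(score)
```

Because inference is a single matrix product per node, a large speedup over neighborhood-aggregating GNNs follows directly; a full training setup would also add a link-prediction loss alongside kd_loss.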

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
graph representation learning, heterogeneous networks, inductive link prediction, knowledge distillation
National subject category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-333310 (URN)
10.1145/3543873.3587596 (DOI)
001124276300163 ()
2-s2.0-85159575698 (Scopus ID)
Conference
2023 World Wide Web Conference, WWW 2023, Austin, United States of America, Apr 30 2023 - May 4 2023
Note

Part of ISBN 9781450394161

QC 20230801

Available from: 2023-08-01 Created: 2023-08-01 Last updated: 2025-09-29 Bibliographically reviewed
4. Leap: Inductive Link Prediction via Learnable Topology Augmentation
2025 (English) In: Machine Learning, Optimization, and Data Science - 10th International Conference, LOD 2024, Revised Selected Papers, Springer Nature, 2025, p. 448-463. Conference paper, Published paper (Refereed)
Abstract [en]

Link prediction is a crucial task in many downstream applications of graph machine learning. Graph Neural Networks (GNNs) are a widely used technique for link prediction, mainly in transductive settings, where the goal is to predict missing links between existing nodes. However, many real-life applications require an inductive setting that accommodates new nodes joining an existing graph. Inductive link prediction has therefore attracted considerable attention recently, and most studies adopt a multi-layer perceptron (MLP) to learn node representations. However, these approaches have limited expressivity and do not fully capture the graph's structural signal. Therefore, in this work we propose LEAP, an inductive link prediction method based on LEArnable toPology augmentation. Unlike previous methods, LEAP models the inductive bias from both the structure and the node features, and hence is more expressive. To the best of our knowledge, this is the first attempt to provide structural contexts for new nodes via learnable augmentation in inductive settings. Extensive experiments on seven real-world homogeneous and heterogeneous graphs demonstrate that LEAP significantly surpasses SOTA methods. The improvements are up to 22% and 17% in terms of AUC and average precision, respectively. The code and datasets are available on GitHub (https://github.com/AhmedESamy/LEAP/).
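The learnable topology augmentation can be sketched with numpy: edge scores are computed from node features, so a new node with no observed edges still receives structural context. The bilinear scorer and matrix names are assumptions, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 6))                  # features; node 4 is "new" (no edges)
A = np.zeros((5, 5))
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1.0  # observed edges among old nodes only

W_aug = 0.1 * rng.normal(size=(6, 6))        # learnable augmentation weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Learned soft adjacency from features: even the edge-less node 4 gets
# nonzero (trainable) connections to the rest of the graph.
A_learned = sigmoid(X @ W_aug @ X.T)
A_total = A + A_learned                      # observed + learned topology

deg = A_total.sum(axis=1, keepdims=True)
H = (A_total / deg) @ X                      # one mean-style message pass
print(H.shape)
```

In training, W_aug would be optimized end-to-end with the link-prediction loss, so the augmented edges carry the inductive bias from both structure and features.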

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Graph Neural Networks, Heterogeneous graphs, Inductive link prediction, Learnable augmentation
National subject category
Computer Sciences, Applied Mechanics
Identifiers
urn:nbn:se:kth:diva-361992 (URN)
10.1007/978-3-031-82481-4_31 (DOI)
2-s2.0-105000770530 (Scopus ID)
Conference
10th International Conference on Machine Learning, Optimization, and Data Science, LOD 2024, Castiglione della Pescaia, Italy, Sep 22 2024 - Sep 25 2024
Note

Part of ISBN 9783031824807

QC 20250403

Available from: 2025-04-03 Created: 2025-04-03 Last updated: 2025-09-29 Bibliographically reviewed
5. HopNet: Addressing Over-Squashing with Learnable Rewiring in GNNs
2025 (English) Manuscript (preprint) (Other academic)
Abstract [en]

Graph Neural Networks (GNNs) have emerged as powerful tools for extracting insights from graph-structured data. However, GNNs often encounter challenges when aggregating information across distant and constrained connections, leading to performance degradation due to a phenomenon known as over-squashing. Previous solutions, such as static rewiring, address this issue by increasing graph density but compromise the inherent inductive bias of GNNs with respect to node distances. In this study, we introduce HopNet, a novel model designed to facilitate effective long-range information propagation through learnable rewiring. HopNet employs attention mechanisms to dynamically create targeted shortcuts, enabling efficient communication between distant nodes while maintaining a balance between local and global interactions. Extensive experiments across diverse tasks, including node classification, link prediction, graph regression, and graph classification, on established real-world benchmark datasets demonstrate HopNet’s ability to overcome GNN limitations, consistently achieving superior performance over state-of-the-art methods.
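The attention-driven rewiring can be illustrated on a path graph, where message passing suffers the clearest long-range bottleneck: each node learns one shortcut to its highest-attention non-neighbor. The top-1 selection and weight names are simplifying assumptions, not HopNet's exact design:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 4))                  # node features
A = np.eye(6, k=1) + np.eye(6, k=-1)         # path graph: a long-range bottleneck

Wq = 0.1 * rng.normal(size=(4, 4))           # learnable attention projections
Wk = 0.1 * rng.normal(size=(4, 4))

# Attention scores between all pairs; existing edges and self-loops are
# masked so shortcuts only connect currently distant nodes.
scores = (X @ Wq) @ (X @ Wk).T
scores[A > 0] = -np.inf
np.fill_diagonal(scores, -np.inf)

shortcut = np.zeros_like(A)
for i in range(len(A)):
    shortcut[i, np.argmax(scores[i])] = 1.0  # top-1 learned shortcut per node

A_rewired = A + shortcut                     # local edges + learned shortcuts
H = (A_rewired / A_rewired.sum(axis=1, keepdims=True)) @ X  # one aggregation
print(H.shape)
```

Unlike static rewiring, the shortcut targets move as Wq and Wk are trained, so density grows only where the task needs long-range communication rather than uniformly across the graph.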

Keywords
Graph Neural Networks, Bottlenecks, Over-Squashing, Learnable Rewiring
National subject category
Artificial Intelligence
Identifiers
urn:nbn:se:kth:diva-370615 (URN)
Note

Submitted to ACM WebConference 2026

QC 20250929

Available from: 2025-09-29 Created: 2025-09-29 Last updated: 2025-10-21 Bibliographically reviewed

Open Access in DiVA

fulltext (2025 kB), 45 downloads
File information
Filename: FULLTEXT01.pdf, File size: 2025 kB, Checksum: SHA-512
148df4bb7be800158651bc8c60555bdde552f6b7bd9546fb422385aa6fb5a1b512956a9fed5b39d08b3d6f6ae9a2ae0d3447c3bf3fd38553c6b96980d3206119
Type: fulltext, Mimetype: application/pdf

Person

Samy, Ahmed E.

