kth.sePublications
Change search
Link to record
Permanent link

Direct link
Alternative names
Publications (6 of 6) Show all publications
Wang, H., Tu, S. & Gionis, A. (2025). Sequential Diversification with Provable Guarantees. In: Proceedings Of The Eighteenth Acm International Conference On Web Search And Data Mining, WSDM 2025: . Paper presented at 18th International Conference on Web Search and Data Mining-WSDM, MAR 10-14, 2025, Hannover, GERMANY (pp. 345-353). Association for Computing Machinery (ACM)
Open this publication in new window or tab >>Sequential Diversification with Provable Guarantees
2025 (English)In: Proceedings Of The Eighteenth Acm International Conference On Web Search And Data Mining, WSDM 2025, Association for Computing Machinery (ACM) , 2025, p. 345-353Conference paper, Published paper (Refereed)
Abstract [en]

Diversification is a useful tool for exploring large collections of information items. It has been used to reduce redundancy and cover multiple perspectives in information-search settings. Diversification finds applications in many different domains, including presenting search results of information-retrieval systems and selecting suggestions for recommender systems. Interestingly, existing measures of diversity are defined over sets of items, rather than evaluating sequences of items. This design choice comes in contrast with commonly-used relevance measures, which are distinctly defined over sequences of items, taking into account the ranking of items. The importance of employing sequential measures is that information items are almost always presented in a sequential manner, and during their information-exploration activity users tend to prioritize items with higher ranking. In this paper, we study the problem of maximizing sequential diversity. This is a new measure of diversity, which accounts for the ranking of the items, and incorporates item relevance and user behavior. The overarching framework can be instantiated with different diversity measures, and here we consider the measures of sum diversity and coverage diversity. The problem was recently proposed by Coppolillo et al. [11], where they introduce empirical methods that work well in practice. Our paper is a theoretical treatment of the problem: we establish the problem hardness and present algorithms with constant approximation guarantees for both diversity measures we consider. Experimentally, we demonstrate that our methods are competitive against strong baselines.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
Diversification, Ranking algorithms, Approximation algorithms
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-366040 (URN)10.1145/3701551.3703531 (DOI)001476971200037 ()2-s2.0-105001669833 (Scopus ID)
Conference
18th International Conference on Web Search and Data Mining-WSDM, MAR 10-14, 2025, Hannover, GERMANY
Note

Part of ISBN 979-8-4007-1329-3

QC 20250703

Available from: 2025-07-03 Created: 2025-07-03 Last updated: 2025-08-22Bibliographically approved
Oettershagen, L., Wang, H. & Gionis, A. (2024). Finding Densest Subgraphs with Edge-Color Constraints. In: WWW 2024 - Proceedings of the ACM Web Conference: . Paper presented at 33rd ACM Web Conference, WWW 2024, Singapore, Singapore, May 13 2024 - May 17 2024 (pp. 936-947). Association for Computing Machinery (ACM)
Open this publication in new window or tab >>Finding Densest Subgraphs with Edge-Color Constraints
2024 (English)In: WWW 2024 - Proceedings of the ACM Web Conference, Association for Computing Machinery (ACM) , 2024, p. 936-947Conference paper, Published paper (Refereed)
Abstract [en]

We consider a variant of the densest subgraph problem in networks with single or multiple edge attributes. For example, in a social network, the edge attributes may describe the type of relationship between users, such as friends, family, or acquaintances, or different types of communication. For conceptual simplicity, we view the attributes as edge colors. The new problem we address is to find a diverse densest subgraph that fulfills given requirements on the numbers of edges of specific colors. When searching for a dense social network community, our problem will enforce the requirement that the community is diverse according to criteria specified by the edge attributes. We show that the decision versions for finding exactly, at most, and at least h colored edges densest subgraph, where h is a vector of color requirements, are NP-complete, for already two colors. For the problem of finding a densest subgraph with at least h colored edges, we provide a linear-time constant-factor approximation algorithm when the input graph is sparse. On the way, we introduce the related at least h (non-colored) edges densest subgraph problem, show its hardness, and also provide a linear-time constant-factor approximation. In our experiments, we demonstrate the efficacy and efficiency of our new algorithms.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
densest subgraph, density, diversity, social networks
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-347330 (URN)10.1145/3589334.3645647 (DOI)2-s2.0-85194105710 (Scopus ID)
Conference
33rd ACM Web Conference, WWW 2024, Singapore, Singapore, May 13 2024 - May 17 2024
Note

Part of ISBN [9798400701719]QC 20240612

Available from: 2024-06-10 Created: 2024-06-10 Last updated: 2024-12-03Bibliographically approved
Li, P., Wang, H. & Bohm, C. (2024). Scalable Graph Classification via Random Walk Fingerprints. In: Proceedings - 24th IEEE International Conference on Data Mining, ICDM 2024: . Paper presented at 24th IEEE International Conference on Data Mining, ICDM 2024, Abu Dhabi, United Arab Emirates, December 9-12, 2024 (pp. 231-240). Institute of Electrical and Electronics Engineers (IEEE)
Open this publication in new window or tab >>Scalable Graph Classification via Random Walk Fingerprints
2024 (English)In: Proceedings - 24th IEEE International Conference on Data Mining, ICDM 2024, Institute of Electrical and Electronics Engineers (IEEE) , 2024, p. 231-240Conference paper, Published paper (Refereed)
Abstract [en]

Graph classification has long been a focus of net-work mining, with graph kernel methods and representation learning at the forefront. Despite their success, many of these studies require heavy computation, making them impractical for large-scale datasets. In this paper, we design a novel structural feature extraction technique that leverages node subsets and random walk probabilities, presenting a scalable, unsupervised, and easily interpretable alternative. Initially, we partition each graph based on the structural roles of nodes. This process creates soft alignments of node subsets across graphs of varying sizes. Then, we measure the connection strengths within and between these subsets, which form the fingerprints for graph classification. Additionally, this technique can seamlessly incorporate node features. Through empirical assessment encompassing a broad range of graph datasets, we demonstrate that our method achieves high levels of computational efficiency while maintaining robust classification accuracy. Code and data are available at https://github.com/KXDY233/RWF.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Feature Extraction, Graph Classification, Scalability
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-361437 (URN)10.1109/ICDM59182.2024.00030 (DOI)2-s2.0-86000239902 (Scopus ID)
Conference
24th IEEE International Conference on Data Mining, ICDM 2024, Abu Dhabi, United Arab Emirates, December 9-12, 2024
Note

Part of ISBN 9798331506681

QC 20250320

Available from: 2025-03-19 Created: 2025-03-19 Last updated: 2025-03-20Bibliographically approved
Li, P., Wang, H., Li, K. & Böhm, C. (2023). Influence without Authority: Maximizing Information Coverage in Hypergraphs. In: 2023 SIAM International Conference on Data Mining, SDM 2023: . Paper presented at 2023 SIAM International Conference on Data Mining, SDM 2023, Minneapolis, United States of America, Apr 27 2023 - Apr 29 2023 (pp. 10-18). Society for Industrial and Applied Mathematics
Open this publication in new window or tab >>Influence without Authority: Maximizing Information Coverage in Hypergraphs
2023 (English)In: 2023 SIAM International Conference on Data Mining, SDM 2023, Society for Industrial and Applied Mathematics, 2023, p. 10-18Conference paper, Published paper (Refereed)
Abstract [en]

In many social networks, besides peer-to-peer communication, people share information via groups. An interesting problem arises in this scenario: for such networks, which are the best groups to start information diffusion so that the number of eventually informed nodes can be maximized? In this study, we formulate a novel information coverage maximization problem in the context of hypergraphs, wherein nodes are connected by arbitrary-size hyperedges (i.e., groups). In contrast to the existing literature on influence maximization, which aims to find authority nodes with high influence, we are interested in identifying the key groups. To address this problem, we present a new information diffusion model for hypergraphs, namely HypergraphIndependent-Cascade (HIC). HIC generalizes the popular independent cascade model to hypergraphs to allow capturing group-level information diffusion. We prove the NP-hardness of the proposed problem under HIC, and the submodular monotone property of the information coverage function. Further, inspired by the Degree Discount algorithm, we derive a new heuristic method named Influence Discount (InfDis). Extensive experiments provide empirical evidence for the effectiveness and efficiency of our approach.

Place, publisher, year, edition, pages
Society for Industrial and Applied Mathematics, 2023
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-350606 (URN)001284687600024 ()2-s2.0-85167355714 (Scopus ID)
Conference
2023 SIAM International Conference on Data Mining, SDM 2023, Minneapolis, United States of America, Apr 27 2023 - Apr 29 2023
Note

Part of ISBN 9781611977653

QC 20240718

Available from: 2024-07-18 Created: 2024-07-18 Last updated: 2024-09-27Bibliographically approved
Adriaens, F., Wang, H. & Gionis, A. (2023). Minimizing hitting time between disparate groups with shortcut edges. In: KDD 2023: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Paper presented at 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, United States of America, Aug 6 2023 - Aug 10 2023 (pp. 1-10). Association for Computing Machinery (ACM)
Open this publication in new window or tab >>Minimizing hitting time between disparate groups with shortcut edges
2023 (English)In: KDD 2023: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery (ACM) , 2023, p. 1-10Conference paper, Published paper (Refereed)
Abstract [en]

Structural bias or segregation of networks refers to situations where two or more disparate groups are present in the network, so that the groups are highly connected internally, but loosely connected to each other. Examples include polarized communities in social networks, antagonistic content in video-sharing or news-feed platforms, etc. In many cases it is of interest to increase the connectivity of disparate groups so as to, e.g., minimize social friction, or expose individuals to diverse viewpoints. A commonly-used mechanism for increasing the network connectivity is to add edge shortcuts between pairs of nodes. In many applications of interest, edge shortcuts typically translate to recommendations, e.g., what video to watch, or what news article to read next. The problem of reducing structural bias or segregation via edge shortcuts has recently been studied in the literature, and random walks have been an essential tool for modeling navigation and connectivity in the underlying networks. Existing methods, however, either do not offer approximation guarantees, or engineer the objective so that it satisfies certain desirable properties that simplify the optimization task. In this paper we address the problem of adding a given number of shortcut edges in the network so as to directly minimize the average hitting time and the maximum hitting time between two disparate groups. The objectives we study are more natural than objectives considered earlier in the literature (e.g., maximizing hitting-time reduction) and the optimization task is significantly more challenging. Our algorithm for minimizing average hitting time is a greedy bicriteria that relies on supermodularity. In contrast, maximum hitting time is not supermodular. Despite, we develop an approximation algorithm for that objective as well, by leveraging connections with average hitting time and the asymmetric k-center problem.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
edge augmentation, polarization, random walks, social networks
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-337891 (URN)10.1145/3580305.3599434 (DOI)2-s2.0-85171335636 (Scopus ID)
Conference
29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, United States of America, Aug 6 2023 - Aug 10 2023
Note

Part of ISBN 9798400701030

QC 20231010

Available from: 2023-10-10 Created: 2023-10-10 Last updated: 2024-12-03Bibliographically approved
Huang, C., Pan, L., Yang, Q., Wang, H. & Shao, J. (2021). Flexible, Robust, Scalable Semi-supervised Learning via Reliability Propagation. In: Bailey, J Miettinen, P Koh, YS Tao, D Wu, X (Ed.), 2021 IEEE International Conference on Data Mining (ICDM): . Paper presented at 21st IEEE International Conference on Data Mining, ICDM 2021, Virtual, Online, 7 December 2021 through 10 December 2021 (pp. 200-209). Institute of Electrical and Electronics Engineers (IEEE)
Open this publication in new window or tab >>Flexible, Robust, Scalable Semi-supervised Learning via Reliability Propagation
Show others...
2021 (English)In: 2021 IEEE International Conference on Data Mining (ICDM) / [ed] Bailey, J Miettinen, P Koh, YS Tao, D Wu, X, Institute of Electrical and Electronics Engineers (IEEE) , 2021, p. 200-209Conference paper, Published paper (Refereed)
Abstract [en]

Semi-supervised learning aims to generate a model with a better performance using plenty of unlabeled data. However, most existing methods treat unlabeled data equally without considering whether it is safe or not, which may lead to the degradation of prediction performance. In this paper, towards reliable semi-supervised learning, we propose a data-driven algorithm, called Reliability Propagation (RP), to learn the reliability of each unlabeled instance. The basic idea is to take local label regularity as a prior, and then perform reliability propagation on an adaptive graph. As a result, the most reliable unlabeled instances could be selected to construct a safer classifier. Beyond, the distributed RP algorithm is introduced to scale up to large volumes of data. In contrast to existing approaches, RP exploits the structural information and shed light on the soft instance selection for unlabeled data in a classifier-independent way. Experiments on both synthetic and real-world data have demonstrated that RP allows extracting most reliable unlabeled instances and supports a gained prediction performance compared to other algorithms.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
IEEE International Conference on Data Mining, ISSN 1550-4786
Keywords
Semi-supervised learning, classification, reliability propagation
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-311677 (URN)10.1109/ICDM51629.2021.00030 (DOI)000780454100021 ()2-s2.0-85125192455 (Scopus ID)
Conference
21st IEEE International Conference on Data Mining, ICDM 2021, Virtual, Online, 7 December 2021 through 10 December 2021
Note

Part of proceedings: ISBN 978-1-6654-2398-4

QC 20220502

Available from: 2022-05-02 Created: 2022-05-02 Last updated: 2022-06-25Bibliographically approved
Organisations
Identifiers
ORCID iD: ORCID iD iconorcid.org/0009-0008-6463-392X

Search in DiVA

Show all publications