kth.sePublications KTH
Change search
Link to record
Permanent link

Direct link
Alternative names
Publications (10 of 71) Show all publications
Ciaperoni, M., Gionis, A. & Mannila, H. (2026). Sample and Expand: Discovering Low-Rank Submatrices With Quality Guarantees. In: Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2025, Proceedings: . Paper presented at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025 (pp. 78-95). Springer Nature
Open this publication in new window or tab >>Sample and Expand: Discovering Low-Rank Submatrices With Quality Guarantees
2026 (English)In: Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2025, Proceedings, Springer Nature , 2026, p. 78-95Conference paper, Published paper (Refereed)
Abstract [en]

The problem of approximating a matrix by a low-rank one has been extensively studied. This problem assumes, however, that the whole matrix has a low-rank structure. This assumption is often false for real-world matrices. We consider the problem of discovering submatrices from the given matrix with bounded deviations from their low-rank approximations. We introduce an effective two-phase method for this task: first, we use sampling to discover small nearly low-rank submatrices, and then they are expanded while preserving proximity to a low-rank approximation. An extensive experimental evaluation confirms that the method we introduce compares favorably to existing approaches.

Place, publisher, year, edition, pages
Springer Nature, 2026
Keywords
Low-rank approximation, submatrix detection
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-372514 (URN)10.1007/978-3-032-06096-9_5 (DOI)2-s2.0-105018666993 (Scopus ID)
Conference
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025
Note

Part of ISBN 9783032060952

QC 20251107

Available from: 2025-11-07 Created: 2025-11-07 Last updated: 2025-11-07Bibliographically approved
Calikus, E., De Francisci Morales, G. & Gionis, A. (2026). Who is at Risk? Analyzing the Risk of Radicalization Among Reddit Users. In: Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track and Demo Track - European Conference, ECML PKDD 2025, Proceedings: . Paper presented at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025 (pp. 375-392). Springer Nature
Open this publication in new window or tab >>Who is at Risk? Analyzing the Risk of Radicalization Among Reddit Users
2026 (English)In: Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track and Demo Track - European Conference, ECML PKDD 2025, Proceedings, Springer Nature , 2026, p. 375-392Conference paper, Published paper (Refereed)
Abstract [en]

Online radicalization is a growing societal concern. Extremist groups actively exploit online media to reach wide audiences, spreading ideologies that incite hate and violence. The lack of transparency and conscious use of social media worsens this issue, as users often remain unaware of being targeted by disinformation or radical propaganda. This work analyzes the risk of online radicalization and provides insights for individuals, platforms, and policymakers to mitigate its harmful effects. We conduct a data-driven study to analyze Reddit users’ radicalization risk. We build a temporal classification model using interpretable machine learning to predict the risk of radicalization with features based on recro, a recent social theory of Internet-mediated radicalization. Our findings reveal recro features are strong indicators, with features from later stages having greater influence. We also analyze risk distributions across communities, showing higher risk in controversial groups but also identifying extremists in generic and neutral communities. This result highlights the importance of critical thinking when engaging with online content.

Place, publisher, year, edition, pages
Springer Nature, 2026
Keywords
online safety, risk prediction, social media analysis
National Category
Information Systems, Social aspects Sociology (Excluding Social Work, Social Anthropology, Demography and Criminology)
Identifiers
urn:nbn:se:kth:diva-372800 (URN)10.1007/978-3-032-06129-4_22 (DOI)2-s2.0-105020023589 (Scopus ID)
Conference
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025, Porto, Portugal, September 15-19, 2025
Note

Part of ISBN 9783032061287

QC 20251113

Available from: 2025-11-13 Created: 2025-11-13 Last updated: 2025-11-13Bibliographically approved
Dalleiger, S. & Gionis, A. (2025). Creating Coherence in Federated Non-Negative Matrix Factorization. In: : . Paper presented at 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025, Philadelphia, United States of America, Feb 25 2025 - Mar 4 2025 (pp. 16135-16143). Association for the Advancement of Artificial Intelligence (AAAI)
Open this publication in new window or tab >>Creating Coherence in Federated Non-Negative Matrix Factorization
2025 (English)Conference paper, Published paper (Refereed)
Abstract [en]

In many real-world applications, data is inherently decentralized, necessitating data analysis methods that prioritize privacy while delivering interpretable results. Federated Non-Negative Matrix Factorization (FedNMF) meets this requirement by factorizing latent components from distributed data that cannot be freely shared among clients. A significant challenge in FedNMF arises when clients converge on different solutions due to prolonged independent optimization, leading to drift and incoherent models. While Federated Learning (FL) typically mitigates drift through frequent synchronizations and strong regularization, it often overlooks critical properties of Non-Negative Matrix Factorization, such as permutation invariance. As a result, solutions from FedNMF clients may be misidentified by FL drift as distinct, despite being equivalent. Using an alignment-aware drift, we create coherence through proximal optimization and barycenter aggregation for FedNMF. We analyze the computational complexity of our approach, provide efficient heuristics, and ensure the convergence of our algorithms. On a diverse set of real-world and synthetic datasets, we demonstrate the effectiveness of our methods.

Place, publisher, year, edition, pages
Association for the Advancement of Artificial Intelligence (AAAI), 2025
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-363467 (URN)10.1609/aaai.v39i15.33772 (DOI)2-s2.0-105004003405 (Scopus ID)
Conference
39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025, Philadelphia, United States of America, Feb 25 2025 - Mar 4 2025
Note

QC 20250519

Available from: 2025-05-15 Created: 2025-05-15 Last updated: 2025-05-19Bibliographically approved
Dalleiger, S. & Gionis, A. (2025). Creating Coherence in Federated Non-Negative Matrix Factorization. In: Walsh, T Shah, J Kolter, Z (Ed.), Thirty-Ninth Aaai Conference On Artificial Intelligence,  AAAI-25, VOL 39 NO 15: . Paper presented at 39th AAAI Conference on Artificial Intelligence, FEB 25-MAR 04, 2025, Philadelphia, PA (pp. 16135-16143). ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE
Open this publication in new window or tab >>Creating Coherence in Federated Non-Negative Matrix Factorization
2025 (English)In: Thirty-Ninth Aaai Conference On Artificial Intelligence,  AAAI-25, VOL 39 NO 15 / [ed] Walsh, T Shah, J Kolter, Z, ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE , 2025, p. 16135-16143Conference paper, Published paper (Refereed)
Abstract [en]

In many real-world applications, data is inherently decentralized, necessitating data analysis methods that prioritize privacy while delivering interpretable results. Federated Non-Negative Matrix Factorization (FedNMF) meets this requirement by factorizing latent components from distributed data that cannot be freely shared among clients. A significant challenge in FedNMF arises when clients converge on different solutions due to prolonged independent optimization, leading to drift and incoherent models. While Federated Learning (FL) typically mitigates drift through frequent synchronizations and strong regularization, it often overlooks critical properties of Non-Negative Matrix Factorization, such as permutation invariance. As a result, solutions from FedNMF clients may be misidentified by FL drift as distinct, despite being equivalent. Using an alignment-aware drift, we create coherence through proximal optimization and barycenter aggregation for FedNMF. We analyze the computational complexity of our approach, provide efficient heuristics, and ensure the convergence of our algorithms. On a diverse set of real-world and synthetic datasets, we demonstrate the effectiveness of our methods. Replication Material - doi.org/10.5281/zenodo.14501532

Place, publisher, year, edition, pages
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE, 2025
Series
AAAI Conference on Artificial Intelligence, ISSN 2159-5399
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-371923 (URN)001477532300097 ()
Conference
39th AAAI Conference on Artificial Intelligence, FEB 25-MAR 04, 2025, Philadelphia, PA
Note

QC 20251027

Available from: 2025-10-27 Created: 2025-10-27 Last updated: 2025-10-27Bibliographically approved
Zhang, G., Sarpe, I. & Gionis, A. (2025). Efficient and Practical Approximation Algorithms for Advertising in Content Feeds. In: WWW 2025 - Proceedings of the ACM Web Conference: . Paper presented at 34th ACM Web Conference, WWW 2025, Sydney, Australia, Apr 28 2025 - May 2 2025 (pp. 3193-3203). Association for Computing Machinery (ACM)
Open this publication in new window or tab >>Efficient and Practical Approximation Algorithms for Advertising in Content Feeds
2025 (English)In: WWW 2025 - Proceedings of the ACM Web Conference, Association for Computing Machinery (ACM) , 2025, p. 3193-3203Conference paper, Published paper (Refereed)
Abstract [en]

Content feeds provided by platforms such as X (formerly Twitter) and TikTok are consumed by users on a daily basis. In this paper, we revisit the native advertising problem in content feeds, initiated by Ieong et al. Given a sequence of organic items (e.g., videos or posts) relevant to a user’s interests or to an information search, the goal is to place ads within the organic content so as to maximize a reward function (e.g., number of clicks), while accounting for two considerations: (1) an ad can only be inserted after a relevant content item; (2) the users’ attention decays after consuming content or ads. These considerations provide a natural model for capturing both the advertisement effectiveness and the user experience. In this paper, we design fast and practical 2-approximation greedy algorithms for the associated optimization problem, improving over the best-known practical algorithm that only achieves an approximation factor of 4. Our algorithms exploit a counter-intuitive observation, namely, while top items are seemingly more important due to the decaying attention of the user, taking good care of the bottom items is key for obtaining improved approximation guarantees. We then provide the first comprehensive empirical evaluation on the problem, showing the strong empirical performance of our methods.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
Ad Allocation, Approximation Algorithms, Externalities, Matching, Newsfeed Advertising
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-363994 (URN)10.1145/3696410.3714902 (DOI)2-s2.0-105005143402 (Scopus ID)
Conference
34th ACM Web Conference, WWW 2025, Sydney, Australia, Apr 28 2025 - May 2 2025
Note

Part of ISBN 9798400712746

QC 20250603

Available from: 2025-06-02 Created: 2025-06-02 Last updated: 2025-06-03Bibliographically approved
Gadekar, A., Gionis, A. & Thejaswi, S. (2025). Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights. In: WWW 2025 - Proceedings of the ACM Web Conference: . Paper presented at 34th ACM Web Conference, WWW 2025, Sydney, Australia, April 28 - May 2, 2025 (pp. 4458-4469). Association for Computing Machinery (ACM)
Open this publication in new window or tab >>Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights
2025 (English)In: WWW 2025 - Proceedings of the ACM Web Conference, Association for Computing Machinery (ACM) , 2025, p. 4458-4469Conference paper, Published paper (Refereed)
Abstract [en]

Data summarization tasks are often modeled as k-clustering problems, where the goal is to choose k data points, called cluster centers, that best represent the dataset by minimizing a clustering objective. A popular objective is to minimize the maximum distance between any data point and its nearest center, which is formalized as the k-center problem. While in some applications all data points can be chosen as centers, in the general setting, centers must be chosen from a predefined subset of points, referred as facilities or suppliers; this is known as the k-supplier problem. In this work, we focus on fair data summarization modeled as the fair k-supplier problem, where data consists of several groups, and a minimum number of centers must be selected from each group while minimizing the k-supplier objective. The groups can be disjoint or overlapping, leading to two distinct problem variants each with different computational complexity. We present 3-approximation algorithms for both variants, improving the previously known factor of 5. For disjoint groups, our algorithm runs in polynomial time, while for overlapping groups, we present a fixed-parameter tractable algorithm, where the exponential runtime depends only on the number of groups and centers. We show that these approximation factors match the theoretical lower bounds, assuming standard complexity theory conjectures. Finally, using an open-source implementation, we demonstrate the scalability of our algorithms on large synthetic datasets and assess the price of fairness on real-world data, comparing solution quality with and without fairness constraints.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
Algorithmic bias, Algorithmic fairness, Fair clustering, Fair k-center, Fair k-supplier, Responsible computing
National Category
Computer Sciences Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-363997 (URN)10.1145/3696410.3714857 (DOI)2-s2.0-105005145530 (Scopus ID)
Conference
34th ACM Web Conference, WWW 2025, Sydney, Australia, April 28 - May 2, 2025
Note

Part of ISBN 9798400712746

QC 20250603

Available from: 2025-06-02 Created: 2025-06-02 Last updated: 2025-06-03Bibliographically approved
Vombatkere, K., Gionis, A. & Terzi, E. (2025). Forming coordinated teams that balance task coverage and expert workload. Data mining and knowledge discovery, 39(3), Article ID 19.
Open this publication in new window or tab >>Forming coordinated teams that balance task coverage and expert workload
2025 (English)In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 39, no 3, article id 19Article in journal (Refereed) Published
Abstract [en]

We study a new formulation of the team-formation problem, where the goal is to form teams to work on a given set of tasks requiring different skills. Deviating from the classic problem setting where one is asking to cover all skills of each given task, we aim to cover as many skills as possible while also trying to minimize the maximum workload among the experts. We do this by combining penalization terms for the coverage and load constraints into one objective. We call the corresponding assignment problem Balanced-Coverage, and show that it is NP-hard. We also consider a variant of this problem, where the experts are organized into a graph, which encodes how well they work together. Utilizing such a coordination graph, we aim to find teams to assign to tasks such that each team’s radius does not exceed a given threshold. We refer to this problem as Network-Balanced-Coverage. We develop a generic template algorithm for approximating both problems in polynomial time, and we show that our template algorithm for Balanced-Coverage has provable guarantees. We describe a set of computational speedups that we can apply to our algorithms and make them scale for reasonably large datasets. From the practical point of view, we demonstrate how to efficiently tune the two parts of the objective and tailor their importance to a particular application. Our experiments with a variety of real-world datasets demonstrate the utility of our problem formulation as well as the efficiency of our algorithms in practice.

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Data mining algorithms, Greedy, Social network, Submodular optimization, Team formation
National Category
Computer Sciences Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-361452 (URN)10.1007/s10618-025-01090-x (DOI)001439607800001 ()2-s2.0-86000301651 (Scopus ID)
Note

QC 20250319

Available from: 2025-03-19 Created: 2025-03-19 Last updated: 2025-03-19Bibliographically approved
Tu, S., Stankovic, A., Neumann, S. & Gionis, A. (2025). Optirefine: densest subgraphs and maximum cuts with k refinements. Data mining and knowledge discovery, 39(6), Article ID 82.
Open this publication in new window or tab >>Optirefine: densest subgraphs and maximum cuts with k refinements
2025 (English)In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 39, no 6, article id 82Article in journal (Refereed) Published
Abstract [en]

Data-analysis tasks often involve an iterative process, which requires refining previous solutions. For instance, when analyzing social networks, we may obtain initial communities based on noisy metadata, and we want to improve them by adding influential nodes and removing non-important ones, without making too many changes. However, classic optimization algorithms, which typically find solutions from scratch, potentially return communities that are very dissimilar to the initial one. To mitigate these issues, we introduce the OptiRefine framework. The framework optimizes initial solutions by making a small number of refinements, thereby ensuring that the new solution remains close to the initial solution and simultaneously achieving a near-optimal solution for the optimization problem. We apply the OptiRefine framework to two classic graph-optimization problems: densest subgraph and maximum cut. For the densest-subgraph problem, we optimize a given subgraph’s density by adding or removing k nodes. We show that this novel problem is a generalization of k-densest subgraph, and provide constant-factor approximation algorithms for refinements. We also study a version of maximum cut in which the goal is to improve a given cut. We provide connections to the maximum cut with cardinality constraints and provide an optimal approximation algorithm in most parameter regimes under the Unique Games Conjecture for refinements. We evaluate our theoretical methods and scalable heuristics on synthetic and real-world data and show that they are highly effective in practice.

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Densest subgraph, Discrete optimization, Maximum cut, Semidefinite programming, Sum-of-squares
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-370712 (URN)10.1007/s10618-025-01142-2 (DOI)001568937400001 ()2-s2.0-105015555730 (Scopus ID)
Note

Not duplicate with DiVA 1958027

QC 20250930

Available from: 2025-09-30 Created: 2025-09-30 Last updated: 2025-09-30Bibliographically approved
Preti, G., Riondato, M., Gionis, A. & Morales, G. D. (2025). POLARIS: Sampling from the Multigraph Configuration Model with Prescribed Color Assortativity. In: Proceedings Of The Eighteenth Acm International Conference On Web Search And Data Mining, Wsdm 2025: . Paper presented at 18th International Conference on Web Search and Data Mining-WSDM, MAR 10-14, 2025, Hannover, GERMANY (pp. 30-39). Association for Computing Machinery (ACM)
Open this publication in new window or tab >>POLARIS: Sampling from the Multigraph Configuration Model with Prescribed Color Assortativity
2025 (English)In: Proceedings Of The Eighteenth Acm International Conference On Web Search And Data Mining, Wsdm 2025, Association for Computing Machinery (ACM) , 2025, p. 30-39Conference paper, Published paper (Refereed)
Abstract [en]

We introduce POLARIS, a network null model for colored multigraphs that preserves the Joint Color Matrix. POLARIS is specifically designed for studying network polarization, where vertices belong to a side in a debate or a partisan group, represented by a vertex color, and relations have different strengths, represented by an integer-valued edge multiplicity. The key feature of POLARIS is preserving the Joint Color Matrix (JCM) of the multigraph, which specifies the number of edges connecting vertices of any two given colors. The JCM is the basic property that determines color assortativity, a fundamental aspect in studying homophily and segregation in polarized networks. By using POLARIS, network scientists can test whether a phenomenon is entirely explained by the JCM of the observed network or whether other phenomena might be at play. Technically, our null model is an extension of the configuration model: an ensemble of colored multigraphs characterized by the same degree sequence and the same JCM.To sample from this ensemble, we develop a suite of Markov Chain Monte Carlo algorithms, collectively named POLARIS-*. It includes POLARIS-B, an adaptation of a generic Metropolis-Hastings algorithm, and POLARIS.C, a faster, specialized algorithm with higher acceptance probabilities. This new null model and the associated algorithms provide a more nuanced toolset for examining polarization in social networks, thus enabling statistically sound conclusions.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
Hypothesis Testing, Null Model, Polarization
National Category
Mathematical sciences
Identifiers
urn:nbn:se:kth:diva-366054 (URN)10.1145/3701551.3703560 (DOI)001476971200004 ()2-s2.0-105001669828 (Scopus ID)
Conference
18th International Conference on Web Search and Data Mining-WSDM, MAR 10-14, 2025, Hannover, GERMANY
Note

Part of ISBN 979-8-4007-1329-3

QC 20250703

Available from: 2025-07-03 Created: 2025-07-03 Last updated: 2025-08-22Bibliographically approved
Thejaswi, S., Lauri, J. & Gionis, A. (2025). Restless reachability problems in temporal graphs. Knowledge and Information Systems
Open this publication in new window or tab >>Restless reachability problems in temporal graphs
2025 (English)In: Knowledge and Information Systems, ISSN 0219-1377, E-ISSN 0219-3116Article in journal (Refereed) Published
Abstract [en]

We study a family of reachability problems under waiting-time restrictions in temporal and vertex-colored temporal graphs. Given a temporal graph and a set of source vertices, we find the set of vertices that are reachable from a source via a time-respecting path, where the difference in timestamps between consecutive edges is at most a resting time. Given a vertex-colored temporal graph and a multiset query of colors, we find the set of vertices reachable from a source via a time-respecting path such that the vertex colors of the path agree with the multiset query and the difference in timestamps between consecutive edges is at most a resting time. These kinds of problems have applications in understanding the spread of a disease in a network, tracing contacts in epidemic outbreaks, finding signaling pathways in the brain network, and recommending tours for tourists, among others. We present an algebraic algorithmic framework based on constrained multilinear sieving for solving the restless reachability problems we propose. In particular, parameterized by the length k of a path sought, we show that the proposed problems can be solved in O(2(k)m Delta) time and O(n Delta) space, where n is the number of vertices, m the number of edges, and Delta the maximum resting time of an input temporal graph. The approach can be extended to extract paths and connected subgraphs in both static and temporal graphs, thus improving the work of Bj & ouml;rklund et al. (in Proceedings of the European symposium on algorithms, 2014) and Thejaswi et al. (Big Data 8:335-362, 2020). In addition, we prove that our algorithms for the restless reachability problems in vertex-colored temporal graphs are optimal under plausible complexity-theoretic assumptions. Finally, with an open-source implementation, we demonstrate that our algorithm scales to large graphs with up to one billion temporal edges, despite the problems being NP-hard. Specifically, we present extensive experiments to evaluate our scalability claims both on synthetic and on real-world graphs. Our implementation is efficiently engineered and highly optimized. For instance, we can solve the restless reachability problem by restricting the path length to 9 in a real-world graph dataset with over 36 million directed edges in less than one hour on a commodity desktop with a 4-core Haswell CPU.

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Algebraic fingerprinting, Multilinear sieving, Restless paths, Restless reachability, Temporal paths, Temporal reachability
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-362965 (URN)10.1007/s10115-025-02405-6 (DOI)001456994000001 ()2-s2.0-105001652273 (Scopus ID)
Note

QC 20250505

Available from: 2025-05-05 Created: 2025-05-05 Last updated: 2025-05-05Bibliographically approved
Organisations
Identifiers
ORCID iD: ORCID iD iconorcid.org/0000-0002-5211-112X

Search in DiVA

Show all publications