kth.sePublications KTH
7891011121310 of 17
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Fairness and Diversity-Aware Algorithms: Ranking, Streaming, and Graph Analysis
KTH, School of Electrical Engineering and Computer Science (EECS), Theoretical Computer Science.ORCID iD: 0009-0008-6463-392X
2026 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

As algorithmic systems increasingly shape human experiences, ensuring fairness and diversity has become a central challenge. This thesis studies fairness and diversity through the lens of algorithm design and optimization theory, providing formal frameworks and efficient algorithms across three domains: ranking-based recommendation, streaming recommendation, and graph analysis.

The first part of the thesis investigates diversity maximization in recommender systems with stochastic user engagement. We first study how to rank items in recommendation systems, where users engage with content sequentially and probabilistically. We introduce two novel diversity measures, sequential sum diversity and sequential coverage diversity, which account for uncertainty in user engagement. Our goal is to find a ranking of items that maximizes these sequential diversity measures. We show that sequential coverage diversity is ordered submodular, enabling a greedy 1/2-approximation. For sequential sum diversity, we provide polynomial-time constant-factor approximation algorithms. Separately, we study a streaming setting where items arrive continuously and users may visit the system multiple times at arbitrary moments. For this setting, we aim to design a streaming algorithm that maximizes a stochastic coverage diversity measure. We show that a classic greedy algorithm achieves a tight 1/2-competitive ratio but requires memory linear in the stream length. With sublinear memory and an upper bound T' on the number of user visits T, we propose STORM, which achieves a 1/4(T'-T+1)-competitive ratio. We further propose STORM++, improving the competitive ratio to 1/8delta, where the integer parameter delta controls the tradeoff between solution quality and computational cost.

The second part of the thesis studies diversity as a constraint in densest subgraph discovery and addresses the problem of finding dense communities in networks with heterogeneous relationship types. We model relationship types as edge colors and formulate the At Least h Colored Edges Densest Subgraph problem (ALHCEDGESDSP), which seeks subgraphs that are both dense and contain at least h_i edges of each color i. We prove that even the simplest variant of this problem is NP-hard and develop constant-factor approximation algorithms. Our key technical contribution links the edge-constrained and node-constrained versions of the densest subgraph problem. We first show that algorithms for the At Least k Nodes Densest Subgraph problem (DalkS) can approximate the At Least h Edges Densest Subgraph problem (ATLEASTHEDGESDSP), and then extend the algorithm for DalkS to handle colored edge constraints for solving ALHCEDGESDSP.

The third part of the thesis studies graph interventions for fairness in networks. We examine two fairness measures, PageRank fairness and hitting-time fairness, developing methods to balance influence and improve accessibility across groups. For each demographic group, the sum of PageRank scores within it quantifies the influence of that group. PageRank fairness measures how far the current group-wise influence deviates from a given target. We formulate the PageRank fairness problem as an optimization problem that adjusts edge weights such that the resulting graph achieves a group-wise influence distribution as close to the target as possible. The optimization problem involves a nonconvex objective over a convex feasible set under practical constraints, such as not adding new edges and limiting the magnitude of weight changes. We solve this PageRank fairness maximization problem using efficient projected gradient descent, proving convergence to a stationary point. For hitting-time fairness in bipartite graphs, we formulate two problems, minimizing the average (BMAH) and the maximum hitting time (BMMH) from one group to another via strategic edge additions. We provide a (2+epsilon)-approximation for BMAH by combining fast random walk simulation with greedy supermodular minimization. For the more challenging BMMH problem, we develop two approaches: the first leverages its connection to the BMAH problem, and the second employs a method based on the asymmetric k-center problem. Both approaches yield provable approximation guarantees for BMMH.

The algorithms and analysis techniques presented in this thesis contribute to the growing body of work on fairness and diversity in algorithmic systems. By formalizing new problem variants that capture realistic constraints in interactive and networked settings, and by providing approximation algorithms with provable guarantees, this work expands the toolkit available for addressing fairness and diversity challenges in computational systems.

Abstract [sv]

Eftersom algoritmiska system i allt högre grad formar mänskliga upplevelser har det blivit en central utmaning att säkerställa rättvisa och mångfald. Denna avhandling studerar rättvisa och mångfald genom algoritm design och optimerings teori, och tillhandahåller formella ramverk och effektiva algoritmer inom tre domäner: rankningsbaserad rekommendation, strömmande rekommendation och grafanalys.

Den första delen av avhandlingen undersöker mångfaldsmaximering i rekommendationssystem med stokastiskt användarengagemang. Vi studerar först hur man rangordnar objekt i rekommendationssystem, där användare engagerar sig med innehåll sekventiellt och probabilistiskt. Vi introducerar två nya mångfaldsmått, sekventiell summa-mångfald och sekventiell täcknings-mångfald, som tar hänsyn till osäkerhet i användarengagemang. Vårt mål är att hitta en rangordning av objekt som maximerar dessa sekventiella mångfaldsmått. Vi visar att sekventiell täcknings-mångfald är ordnad submodulär, vilket möjliggör en girig 1/2-approximation. För sekventiell summa-mångfald tillhandahåller vi polynomtids konstant-faktor approximationsalgoritmer. Separat studerar vi en strömmande miljö där objekt anländer kontinuerligt och användare kan besöka systemet flera gånger vid godtyckliga tidpunkter. För denna miljö strävar vi efter att designa en strömmande algoritm som maximerar ett stokastiskt täcknings-mångfaldsmått. Vi visar att en klassisk girig algoritm uppnår ett tight 1/2-konkurrensförhållande men kräver minne linjärt i strömlängden. Med sublinjärt minne och en övre gräns T' på antalet användarbesök T, föreslår vi STORM, som uppnår ett 1/4(T'-T+1)-konkurrensförhållande. Vi föreslår vidare STORM++ som förbättrar konkurrensförhållandet till 1/8delta, där heltalsparametern delta kontrollerar avvägningen mellan lösningskvalitet och beräkningskostnad.

Den andra delen av avhandlingen studerar mångfald som en bivillkor i upptäckt av tätaste delgrafer och behandlar problemet med att hitta täta gemenskaper i nätverk med heterogena relationstyper. Vi modellerar relationstyper som kantfärger och formulerar problemet At Least h Colored Edges Densest Subgraph (ALHCEDGESDSP), som söker delgrafer som är både täta och innehåller åtminstone hi kanter av varje färg i. Vi bevisar att även den enklaste varianten av detta problem är NP-svår och utvecklar konstant-faktor approximationsalgoritmer. Vårt viktigaste tekniska bidrag kopplar samman kant-begränsade och nod-begränsade versionerna av tätaste delgraf-problemet. Vi visar först att algoritmer för problemet At Least k Nodes Densest Subgraph (DalkS) kan approximera problemet At Least h edges Densest Subgraph (ATLEASTHEDGESDSP), och utökar sedan algoritmen för DalkS för att hantera färgade kantbegränsningar för att lösa ALHCEDGESDSP.

Den tredje delen av avhandlingen studerar grafinterventioner för rättvisa i nätverk. Vi undersöker två rättvisemått, PageRank-rättvisa och träfftids-rättvisa, och utvecklar metoder för att balansera inflytande och förbättra tillgänglighet mellan grupper. För varje demografisk grupp kvantifierar summan av PageRank-poäng inom den gruppens inflytande. PageRank-rättvisa mäter hur långt den nuvarande gruppvisa influensfördelningen avviker från ett givet mål. Vi formulerar PageRank-rättvisaproblemet som ett optimeringsproblem som justerar kantvikter så att den resulterande grafen uppnår en gruppvis influensfördelning så nära målet som möjligt. Optimeringsproblemet involverar ett icke-konvext mål över en konvex genomförbar mängd under praktiska begränsningar, såsom att inte lägga till nya kanter och begränsa kantviktningsskalan. Vi löser detta PageRank rättvisemaximerings-problem med hjälp av effektiv projicerad gradientnedstigning, och bevisar konvergens till en stationär punkt. För träfftids-rättvisa i bipartita grafer formulerar vi två problem, att minimera medelvärdet (BMAH) och det maximala träfftidsvärdet (BMMH) från en grupp till en annan via strategiska kanttillägg. Vi tillhandahåller en (2+epsilon)-approximation för BMAH genom att kombinera snabb slumpvandrings-simulering med girig supermodulär minimering. För det mer utmanande BMMH-problemet utvecklar vi två tillvägagångssätt, det första utnyttjar dess koppling till BMAH-problemet, och det andra använder en metod baserad på det asymmetriska k center-problemet. Båda tillvägagångssätten ger bevisliga approximationsgarantier för BMMH.

Algoritmerna och analystekniker som presenteras i denna avhandling bidrar till den växande mängden arbete om rättvisa och mångfald i algoritmiska system. Genom att formalisera nya problemvarianter som fångar realistiska begränsningar i interaktiva och nätverksbaserade miljöer, och genom att tillhandahåller approximationsalgoritmer med bevisliga garantier, utökar detta arbete verktygslådan som finns tillgänglig för att hantera utmaningar kring rättvisa och mångfald i beräkningssystem.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2026. , p. xii, 61
Series
TRITA-EECS-AVL ; 2026:34
Keywords [en]
Submodular optimization, Diversity maximization, Graph intervention, PageRank, Fairness, Diversity
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-379082ISBN: 978-91-8106-581-7 (print)OAI: oai:DiVA.org:kth-379082DiVA, id: diva2:2051542
Public defence
2026-05-05, F3 (Flodis), Lindstedtsvägen 26, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20260408

Available from: 2026-04-08 Created: 2026-04-08 Last updated: 2026-04-08Bibliographically approved
List of papers
1. Sequential Diversification with Provable Guarantees
Open this publication in new window or tab >>Sequential Diversification with Provable Guarantees
2025 (English)In: WSDM2025: Proceedings of the Eighteenth ACM International Conference onWeb Search and Data Mining, Association for Computing Machinery (ACM) , 2025, p. 345-353Conference paper, Published paper (Refereed)
Abstract [en]

Diversification is a useful tool for exploring large collections of information items. It has been used to reduce redundancy and cover multiple perspectives in information-search settings. Diversification finds applications in many different domains, including presenting search results of information-retrieval systems and selecting suggestions for recommender systems.

Interestingly, existing measures of diversity are defined over sets of items, rather than evaluating sequences of items. This design choice comes in contrast with commonly-used relevance measures, which are distinctly defined over sequences of items, taking into account the ranking of items. The importance of employing sequential measures is that information items are almost always presented in a sequential manner, and during their information-exploration activity users tend to prioritize items with higher ranking.

In this paper, we study the problem of maximizing sequential diversity. This is a new measure of diversity, which accounts for the ranking of the items, and incorporates item relevance and user behavior. The overarching framework can be instantiated with different diversity measures, and here we consider the measures of sum diversity and coverage diversity. The problem was recently proposed by Coppolillo et al. [11], where they introduce empirical methods that work well in practice. Our paper is a theoretical treatment of the problem: we establish the problem hardness and present algorithms with constant approximation guarantees for both diversity measures we consider. Experimentally, we demonstrate that our methods are competitive against strong baselines.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
Diversification, Ranking algorithms, Approximation algorithms
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-379074 (URN)10.1145/3701551.3703531 (DOI)001476971200037 ()2-s2.0-105001669833 (Scopus ID)
Conference
Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining (WSDM '25), March 10--14, 2025, Hannover, Germany
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP), 834862
Note

QC 20260408

Available from: 2026-04-08 Created: 2026-04-08 Last updated: 2026-04-08Bibliographically approved
2. Streaming Stochastic Submodular Maximization with On-Demand User Requests
Open this publication in new window or tab >>Streaming Stochastic Submodular Maximization with On-Demand User Requests
2025 (English)In: Neurips2025: 39th Conference on Neural Information Processing Systems, 2025, p. 1-31Conference paper, Published paper (Refereed)
Abstract [en]

We explore a novel problem in streaming submodular maximization, inspired by the dynamics of news-recommendation platforms. We consider a setting where users can visit a news website at any time, and upon each visit, the website must display up to k news items. User interactions are inherently stochastic: each news item presented to the user is consumed with a certain acceptance probability by the user, and each news item covers certain topics. Our goal is to design a streaming algorithm that maximizes the expected total topic coverage.

To address this problem, we establish a connection to submodular maximization subject to a matroid constraint. We show that we can effectively adapt previous methods to address our problem when the number of user visits is known in advance or linear-size memory in the stream length is available. However, in more realistic scenarios where only an upper bound on the visits and sublinear memory is available, the algorithms fail to guarantee any bounded performance. To overcome these limitations, we introduce a new online streaming algorithm that achieves a competitive ratio of 1/(8δ), where δ controls the approximation quality. Moreover, it requires only a single pass over the stream, and uses memory independent of the stream length. Empirically, our algorithms consistently outperform the baselines.

Keywords
Streaming Submodular Optimization
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-379077 (URN)
Conference
The Thirty-Ninth Annual Conference on Neural Information Processing Systems, San Diego, CA, Dec 2nd-7th, 2025
Note

QC 20260408

Available from: 2026-04-08 Created: 2026-04-08 Last updated: 2026-04-08Bibliographically approved
3. Finding Densest Subgraphs with Edge-Color Constraints
Open this publication in new window or tab >>Finding Densest Subgraphs with Edge-Color Constraints
2024 (English)In: WWW 2024 - Proceedings of the ACM Web Conference, Association for Computing Machinery (ACM) , 2024, p. 936-947Conference paper, Published paper (Refereed)
Abstract [en]

We consider a variant of the densest subgraph problem in networks with single or multiple edge attributes. For example, in a social network, the edge attributes may describe the type of relationship between users, such as friends, family, or acquaintances, or different types of communication. For conceptual simplicity, we view the attributes as edge colors. The new problem we address is to find a diverse densest subgraph that fulfills given requirements on the numbers of edges of specific colors. When searching for a dense social network community, our problem will enforce the requirement that the community is diverse according to criteria specified by the edge attributes. We show that the decision versions for finding exactly, at most, and at least h colored edges densest subgraph, where h is a vector of color requirements, are NP-complete, for already two colors. For the problem of finding a densest subgraph with at least h colored edges, we provide a linear-time constant-factor approximation algorithm when the input graph is sparse. On the way, we introduce the related at least h (non-colored) edges densest subgraph problem, show its hardness, and also provide a linear-time constant-factor approximation. In our experiments, we demonstrate the efficacy and efficiency of our new algorithms.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
densest subgraph, density, diversity, social networks
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-347330 (URN)10.1145/3589334.3645647 (DOI)2-s2.0-85194105710 (Scopus ID)
Conference
33rd ACM Web Conference, WWW 2024, Singapore, Singapore, May 13 2024 - May 17 2024
Note

Part of ISBN [9798400701719]QC 20240612

Available from: 2024-06-10 Created: 2024-06-10 Last updated: 2026-04-08Bibliographically approved
4. Fairness-aware PageRank via Edge Reweighting
Open this publication in new window or tab >>Fairness-aware PageRank via Edge Reweighting
2026 (English)In: WSDM 2026: Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining, Association for Computing Machinery (ACM) , 2026Conference paper, Published paper (Refereed)
Abstract [en]

Link-analysis algorithms, such as PageRank, are instrumental in understanding the structural dynamics of networks by evaluating the importance of individual vertices based on their connectivity. Recently, with the rising importance of responsible AI, the question of fairness in link-analysis algorithms has gained traction.

In this paper, we present a new approach for incorporating group fairness into the PageRank algorithm by reweighting the transition probabilities in the underlying transition matrix. We formulate the problem of achieving fair PageRank by seeking to minimize the fairness loss, which is the difference between the original group-wise PageRank distribution and a target PageRank distribution. We further define a group-adapted fairness notion, which accounts for group homophily by considering random walks with group-biased restart for each group. Since the fairness loss is non-convex, we propose an efficient projected gradient-descent method for computing locally-optimal edge weights. Unlike earlier approaches, we do not recommend adding new edges to the network, nor do we adjust the restart vector. Instead, we keep the topology of the underlying network unchanged and only modify the relative importance of existing edges. We empirically compare our approach with state-of-the-art baselines and demonstrate the efficacy of our method, where very small changes in the transition matrix lead to significant improvement in the fairness of the PageRank algorithm.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2026
Keywords
Link analysis, PageRank, Graph analysis, Fairness
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-379076 (URN)10.1145/3773966.3777991 (DOI)2-s2.0-105033148907 (Scopus ID)
Conference
Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining (WSDM '26), February 22-26, 2026, Boise, ID, USA
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20260408

Available from: 2026-04-08 Created: 2026-04-08 Last updated: 2026-04-08Bibliographically approved
5. Minimizing hitting time between disparate groups with shortcut edges
Open this publication in new window or tab >>Minimizing hitting time between disparate groups with shortcut edges
2023 (English)In: KDD 2023: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery (ACM) , 2023, p. 1-10Conference paper, Published paper (Refereed)
Abstract [en]

Structural bias or segregation of networks refers to situations where two or more disparate groups are present in the network, so that the groups are highly connected internally, but loosely connected to each other. Examples include polarized communities in social networks, antagonistic content in video-sharing or news-feed platforms, etc. In many cases it is of interest to increase the connectivity of disparate groups so as to, e.g., minimize social friction, or expose individuals to diverse viewpoints. A commonly-used mechanism for increasing the network connectivity is to add edge shortcuts between pairs of nodes. In many applications of interest, edge shortcuts typically translate to recommendations, e.g., what video to watch, or what news article to read next. The problem of reducing structural bias or segregation via edge shortcuts has recently been studied in the literature, and random walks have been an essential tool for modeling navigation and connectivity in the underlying networks. Existing methods, however, either do not offer approximation guarantees, or engineer the objective so that it satisfies certain desirable properties that simplify the optimization task. In this paper we address the problem of adding a given number of shortcut edges in the network so as to directly minimize the average hitting time and the maximum hitting time between two disparate groups. The objectives we study are more natural than objectives considered earlier in the literature (e.g., maximizing hitting-time reduction) and the optimization task is significantly more challenging. Our algorithm for minimizing average hitting time is a greedy bicriteria that relies on supermodularity. In contrast, maximum hitting time is not supermodular. Despite, we develop an approximation algorithm for that objective as well, by leveraging connections with average hitting time and the asymmetric k-center problem.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
edge augmentation, polarization, random walks, social networks
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-337891 (URN)10.1145/3580305.3599434 (DOI)001118896300001 ()2-s2.0-85171335636 (Scopus ID)
Conference
29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, United States of America, Aug 6 2023 - Aug 10 2023
Note

Part of ISBN 9798400701030

QC 20231010

Available from: 2023-10-10 Created: 2023-10-10 Last updated: 2026-04-08Bibliographically approved

Open Access in DiVA

Kappa summary(1422 kB)15 downloads
File information
File name FULLTEXT01.pdfFile size 1422 kBChecksum SHA-512
302fd3850db049024ced02ac4fb8686bf38d77893de5bcdd0528f3a0e64ddac71728ab935ce7dff7a81160cc9327cc537265714dabceea7198d501eee40f049b
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Wang, Honglian
By organisation
Theoretical Computer Science
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 229 hits
7891011121310 of 17
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf