kth.sePublications KTH
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Inference and Online Learning in Structured Stochastic Systems
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control).ORCID iD: 0000-0002-7106-3039
2023 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis contributes to the field of stochastic online learning problems, with a collection of six papers each addressing unique aspects of online learning and inference problems under specific structures. The first four papers focus on exploration and inference problems, uncovering fundamental information-theoretic limits and efficient algorithms under various structures. The last two papers focus on maximizing rewards by efficiently leveraging these structures.

The first paper addresses the complex problem of learning to cluster items based on binary user feedback for multiple questions. It establishes information-theoretical error lower bounds for both uniform and adaptive selection strategies under a fixed budget of rounds or users, and proposes an adaptive algorithm that efficiently allocates the budget.The second paper tackles the challenge of uncovering hidden communities in the Labeled Stochastic Block Model using single-shot observations of labels. It introduces a computationally efficient algorithm, Instance-Adaptive Clustering, which is the first to match instance-specific lower bounds on the expected number of misclassified items.The third paper delves into the best-arm identification or simple regret minimization problem within a Bayesian setting. It takes into consideration a prior distribution for the bandit problem and the expectation of simple regret with respect to that distribution, defining it as Bayesian simple regret.It characterizes the rate of Bayesian simple regret assuming certain continuity conditions on the prior, revealing that the leading term of Bayesian simple regret stems from parameters where the gap between optimal and suboptimal actions is less than . The fourth paper contributes to the fixed budget best-arm identification problem for two-arm bandits with Bernoulli rewards. It demonstrates the optimality of uniform sampling, which evenly samples the arms.It proves that no algorithm can outperform uniform sampling while being at least as good as uniform sampling for some bandit instances.The fifth paper revisits the regret minimization problem in sparse stochastic contextual linear bandits. It introduces a new algorithm, the Thresholded Lasso Bandit, which estimates the linear reward function and its sparse support, and then selects an arm based on these estimations. The algorithm achieves superior regret upper bounds compared to previous algorithms and numerically outperforms them.The sixth and final paper provides a theoretical analysis of recommendation systems in an online setting under unknown user-item preference probabilities and some structures. It derives regret lower bounds based on various structural assumptions and designs optimal algorithms that achieve these bounds. The analysis reveals the relative weights of the different components of regret, providing valuable insights into the efficient algorithms for online recommendation systems.

This thesis addresses the technical challenge of structured stochastic online learning problems, providing new insights into the power and limitations of adaptivity in these problems.

Abstract [sv]

Denna avhandling bidrar till området för stokastiska online inlärningsproblem, med en samling av sex papper som var och en behandlar unika aspekter av online inlärning och inferensproblem under specifika strukturer. De första fyra pappren fokuserar på utforskning och inferensproblem, avslöjar grundläggande informationsteoretiska gränser och effektiva algoritmer under olika strukturer. De två sista pappren fokuserar på att maximera belöningar genom att effektivt utnyttja dessa strukturer.

Det första pappret behandlar det komplexa problemet att lära sig att klustra objekt baserat på binär användarfeedback för flera frågor. Det fastställer informationsteoretiska fel nedre gränser för både uniform och adaptiv urvalsstrategier under en fast budget av rundor eller användare, och föreslår en adaptiv algoritm som effektivt allokerar budgeten.Det andra pappret tar sig an utmaningen att avslöja dolda samhällen i den märkta stokastiska blockmodellen med enstaka observationer av etiketter. Det introducerar en beräkningsmässigt effektiv algoritm, Instance-Adaptive Clustering, som är den första att matcha instansspecifika nedre gränser för det förväntade antalet felklassificerade objekt.Det tredje pappret gräver djupt i problemet med bästa armidentifiering eller enkel ångerminimering inom en Bayesiansk miljö. Det tar hänsyn till en fördelning för banditproblemet och förväntan om enkel ånger med avseende på den fördelningen, vilket definierar det som Bayesiansk enkel ånger. Det karakteriserar hastigheten för Bayesiansk enkel ånger under antagande av vissa kontinuitetsvillkor på det tidigare, vilket avslöjar att den ledande termen för Bayesiansk enkel ånger kommer från parametrar där gapet mellan optimala och suboptimala handlingar är mindre än . Det fjärde pappret bidrar till det fasta budget bästa arm identifieringsproblemet för två-arm banditer med Bernoulli belöningar. Det demonstrerar optimaliteten av uniform provtagning, som jämnt provtar armarna. Det bevisar att ingen algoritm kan överträffa uniform provtagning samtidigt som den är minst lika bra som uniform provtagning för vissa banditinstanser.Det femte pappret återbesöker ångerminimeringsproblemet i glesa stokastiska kontextuella linjära banditer. Det introducerar en ny algoritm, Thresholded Lasso Bandit, som uppskattar den linjära belöningsfunktionen och dess glesa stöd, och sedan väljer en arm baserat på dessa uppskattningar. Algoritmen uppnår överlägsna ånger övre gränser jämfört med tidigare algoritmer och överträffar dem numeriskt.Det sjätte och sista pappret ger en teoretisk analys av rekommendationssystem i en online miljö under okända användarobjekt preferens sannolikheter och vissa strukturer. Det härleder ånger nedre gränser baserat på olika strukturella antaganden och utformar optimala algoritmer som uppnår dessa gränser. Analysen avslöjar de relativa vikterna av de olika komponenterna i ånger, vilket ger värdefulla insikter i effektiva algoritmer för online rekommendationssystem.

Denna avhandling behandlar den tekniska utmaningen med strukturerade stokastiska onlineinlärningsproblem, och ger nya insikter i kraften och begränsningarna av anpassningsförmåga i dessa problem.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. , p. viii, 37
Series
TRITA-EECS-AVL ; 2023:71
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-338762ISBN: 978-91-8040-730-4 (print)OAI: oai:DiVA.org:kth-338762DiVA, id: diva2:1807286
Public defence
2023-11-16, F3, Lindstedtsvägen 26, Stockholm, 14:30 (English)
Opponent
Supervisors
Note

QC 20231025

Available from: 2023-10-25 Created: 2023-10-25 Last updated: 2025-12-03Bibliographically approved
List of papers
1. Optimal Clustering from Noisy Binary Feedback
Open this publication in new window or tab >>Optimal Clustering from Noisy Binary Feedback
(English)Manuscript (preprint) (Other academic)
Abstract [en]

We consider the problem of solving large-scale labeling tasks with minimal effort put on the users. Examples of such tasks include those in some of the recent CAPTCHA systems, where users clicks (binary answers) constitute the only data available to label images. Specifically, we study the generic problem of clustering a set of items from binary user feedback. Items are grouped into initially unknown non-overlapping clusters. To recover these clusters, the learner sequentially presents to users a finite list of items together with a question with a binary answer selected from a fixed finite set. For each of these items, the user provides a noisy answer whose expectation is determined by the item cluster and the question and by an item-specific parameter characterizing the {\it hardness} of classifying the item. The objective is to devise an algorithm with a minimal cluster recovery error rate. We derive problem-specific information-theoretical lower bounds on the error rate satisfied by any algorithm, for both uniform and adaptive (list, question) selection strategies. For uniform selection, we present a simple algorithm built upon the K-means algorithm and whose performance almost matches the fundamental limits. For adaptive selection, we develop an adaptive algorithm that is inspired by the derivation of the information-theoretical error lower bounds, and in turn allocates the budget in an efficient way. The algorithm learns to select items hard to cluster and relevant questions more often. We compare the performance of our algorithms with or without the adaptive selection strategy numerically and illustrate the gain achieved by being adaptive.

National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-290430 (URN)
Note

QC 20210222

Available from: 2021-02-18 Created: 2021-02-18 Last updated: 2023-10-25Bibliographically approved
2. Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model
Open this publication in new window or tab >>Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model
(English)Manuscript (preprint) (Other academic)
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-333936 (URN)
Note

QC 20230815

Available from: 2023-08-14 Created: 2023-08-14 Last updated: 2023-10-25Bibliographically approved
3. Rate-optimal Bayesian Simple Regret in Best Arm Identification
Open this publication in new window or tab >>Rate-optimal Bayesian Simple Regret in Best Arm Identification
(English)Manuscript (preprint) (Other academic)
Abstract [en]

We consider best arm identification in the multi-armed bandit problem. Assuming certain continuityconditions of the prior, we characterize the rate of the Bayesian simple regret. Differing from Bayesianregret minimization (Lai, 1987), the leading term in the Bayesian simple regret derives from the regionwhere the gap between optimal and suboptimal arms is smaller than √(log(T)/T). We propose a simple andeasy-to-compute algorithm with its leading term matching with the lower bound up to a constant factor;simulation results support our theoretical findings.

National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-333929 (URN)
Note

QC 20230815

Available from: 2023-08-14 Created: 2023-08-14 Last updated: 2023-10-25Bibliographically approved
4. On Uniformly Optimal Algorithms for Best Arm Identification in Two-Armed Bandits with Fixed Budget
Open this publication in new window or tab >>On Uniformly Optimal Algorithms for Best Arm Identification in Two-Armed Bandits with Fixed Budget
(English)Manuscript (preprint) (Other academic)
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-338761 (URN)
Note

QC 20231025

Available from: 2023-10-25 Created: 2023-10-25 Last updated: 2024-09-11Bibliographically approved
5. Thresholded Lasso Bandit
Open this publication in new window or tab >>Thresholded Lasso Bandit
2022 (English)In: International Conference On Machine Learning, Vol 162 / [ed] Chaudhuri, K Jegelka, S Song, L Szepesvari, C Niu, G Sabato, S, ML Research Press , 2022, p. 878-928Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension d, but where the reward function depends on a few, say s(0) << d, of these features only. We present Thresholded Lasso bandit, an algorithm that (i) estimates the vector defining the reward function as well as its sparse support, i.e., significant feature elements, using the Lasso framework with thresholding, and (ii) selects an arm greedily according to this estimate projected on its support. The algorithm does not require prior knowledge of the sparsity index s0 and can be parameter-free under some symmetric assumptions. For this simple algorithm, we establish non-asymptotic regret upper bounds scaling as O(log d+root T) in general, and as O(log d + log T) under the so-called margin condition (a probabilistic condition on the separation of the arm rewards). The regret of previous algorithms scales as O(log d+ root T log(dT)) and O(log T log d) in the two settings, respectively. Through numerical experiments, we confirm that our algorithm outperforms existing methods.

Place, publisher, year, edition, pages
ML Research Press, 2022
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-324879 (URN)000899944900039 ()2-s2.0-85139903196 (Scopus ID)
Conference
38th International Conference on Machine Learning (ICML), JUL 17-23, 2022, Baltimore, MD
Note

QC 20250922

Available from: 2023-03-20 Created: 2023-03-20 Last updated: 2025-09-22Bibliographically approved
6. Regret in Online Recommendation Systems
Open this publication in new window or tab >>Regret in Online Recommendation Systems
2020 (English)Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of $m$ users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of $n$ items. Importantly, an item cannot be recommended twice to the same user. The probabilities that a user likes each item are unknown. The performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities. We investigate various structural assumptions on these probabilities: we derive for each structure regret lower bounds, and devise algorithms achieving these limits. Interestingly, our analysis reveals the relative weights of the different components of regret: the component due to the constraint of not presenting the same item twice to the same user, that due to learning the chances users like items, and finally that arising when learning the underlying structure. 

National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-290379 (URN)2-s2.0-85100367193 (Scopus ID)
Conference
34th Conference on Neural Information Processing Systems (NeurIPS 2020)
Note

QC 20210222

Available from: 2021-02-18 Created: 2021-02-18 Last updated: 2023-10-25Bibliographically approved

Open Access in DiVA

summary(985 kB)240 downloads
File information
File name SUMMARY01.pdfFile size 985 kBChecksum SHA-512
56ca24f4226bece8bbf7db770df2949fd2e1ead1b9af9b18ddc7a8ddafaa20885168406bbc4865a347413b0d9f91c9c248f8b002c2e6a12fa742aa0bbb03657e
Type fulltextMimetype application/pdf

Authority records

Ariu, Kaito

Search in DiVA

By author/editor
Ariu, Kaito
By organisation
Decision and Control Systems (Automatic Control)
Electrical Engineering, Electronic Engineering, Information Engineering

Search outside of DiVA

GoogleGoogle Scholar
Total: 0 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 1163 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf