An Information-Theoretic Approach to Generalization Theory
Rodríguez Gálvez, Borja. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Information Science and Engineering. ORCID iD: 0000-0002-0862-1333
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, we investigate the in-distribution generalization of machine learning algorithms, focusing on establishing rigorous upper bounds on the generalization error. We depart from traditional complexity-based approaches by introducing and analyzing information-theoretic bounds that quantify the dependence between a learning algorithm and the training data.

We consider two categories of generalization guarantees:

  • Guarantees in expectation. These bounds measure performance in the average case. Here, the dependence between the algorithm and the data is often captured by the mutual information or other information measures based on f-divergences. While these measures offer an intuitive interpretation, they might overlook the geometry of the algorithm's hypothesis class. To address this limitation, we introduce bounds using the Wasserstein distance, which incorporates geometric considerations at the cost of being mathematically more involved. Furthermore, we propose a structured, systematic method to derive bounds capturing the dependence between the algorithm and an individual datum, and between the algorithm and subsets of the training data, conditioned on knowing the rest of the data. These types of bounds provide deeper insights, as we demonstrate by applying them to derive generalization error bounds for the stochastic gradient Langevin dynamics algorithm. A representative bound of this kind is sketched after this list.
  • PAC-Bayesian guarantees. These bounds measure the performance level with high probability. Here, the dependence between the algorithm and the data is often measured by the relative entropy. We establish connections between the Seeger–Langford and Catoni bounds, revealing that the former is optimized by the Gibbs posterior. Additionally, we introduce novel, tighter bounds for various types of loss functions, including those with a bounded range, cumulant generating function, moment, or variance. To achieve this, we introduce a new technique to optimize parameters in probabilistic statements. A standard statement of the Seeger–Langford bound is sketched at the end of this abstract.

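As context for the guarantees in expectation described above (a standard result from the literature the thesis builds on, not a claim specific to this record), the canonical mutual-information bound of Xu and Raginsky states that if the loss ℓ(w, Z) is σ-sub-Gaussian under the data distribution for every hypothesis w, then

\[
  \bigl|\mathbb{E}[\operatorname{gen}(W, S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^{2}\, I(W; S)}{n}},
\]

where S = (Z_1, …, Z_n) are the n training samples, W is the hypothesis returned by the learning algorithm, and I(W; S) is their mutual information. The bounds developed in the thesis refine this template, for instance by replacing I(W; S) with per-sample or conditional information measures, or with a Wasserstein distance that retains the geometry of the hypothesis class.
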
We also study the limitations of these approaches. We present a counterexample where most of the existing (relative entropy-based) information-theoretic bounds fail and where traditional approaches do not. Finally, we explore the relationship between privacy and generalization. We show that algorithms with bounded maximal leakage generalize. Moreover, for discrete data, we derive new bounds for differentially private algorithms that vanish as the number of samples increases, thus guaranteeing their generalization even with a constant privacy parameter. This is in contrast with previous bounds in the literature, which require the privacy parameter to decrease with the number of samples to ensure generalization.
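
For the PAC-Bayesian guarantees above, the Seeger–Langford bound is commonly stated as follows (a standard formulation from the literature, not necessarily the exact statement proved in the thesis): for a loss taking values in [0, 1], a prior π chosen independently of the data, and any δ ∈ (0, 1), with probability at least 1 − δ over the n training samples, simultaneously for all posteriors ρ,

\[
  \operatorname{kl}\!\bigl( \widehat{R}_S(\rho) \,\big\|\, R(\rho) \bigr) \;\le\; \frac{\operatorname{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{n},
\]

where R̂_S(ρ) and R(ρ) are the empirical and population risks averaged over ρ, kl(·‖·) is the binary relative entropy, and KL(ρ‖π) is the relative entropy between posterior and prior. The Gibbs posterior mentioned above takes the generic form ρ_γ(dw) ∝ exp(−γ R̂_S(w)) π(dw) for some inverse temperature γ > 0.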

Abstract [sv]

In this thesis, we investigate the generalization ability of machine learning algorithms on identically distributed data, focusing on establishing rigorous upper bounds on the generalization error. We depart from traditional complexity-based approaches by introducing and analyzing information-theoretic bounds that quantify the dependence between a learning algorithm and the training data.

We study two categories of generalization guarantees:

  • Guarantees in expectation. These bounds measure performance on average. Here, the dependence between the algorithm and the data is often captured by the mutual information or other information measures based on f-divergences. Although these measures offer an intuitive interpretation, they may overlook the geometry of the algorithm's hypothesis class. To address this limitation, we introduce bounds that use the Wasserstein distance, which incorporates geometric considerations at the cost of being mathematically more involved. Furthermore, we propose a structured, systematic method for deriving bounds that capture the dependence between the algorithm and an individual data point, as well as between the algorithm and subsets of the training data, given that the rest of the data is known. These types of bounds provide deeper insights, which we demonstrate by applying them to derive generalization guarantees for the stochastic gradient Langevin dynamics algorithm.
  • PAC-Bayesian guarantees. These bounds measure the performance level with high probability. Here, the dependence between the algorithm and the data is often captured by the relative entropy. We derive connections between the Seeger–Langford and Catoni bounds, revealing that the former is optimized by the Gibbs distribution. In addition, we introduce new, tighter bounds for various types of loss functions, including those with a bounded range, cumulant generating function, moment, or variance. To achieve this, we introduce a new technique for optimizing parameters in probabilistic statements.

We also study the limitations of these approaches. We present a counterexample where most of the existing (relative entropy-based) information-theoretic bounds fail and where traditional approaches work. Finally, we explore the relationship between privacy and generalization. We show that algorithms with bounded maximal leakage generalize. Moreover, for discrete data, we derive new bounds for differentially private algorithms that vanish as the number of samples increases, thereby guaranteeing their generalization even with a constant privacy parameter. This is in contrast with previous bounds in the literature, which require the privacy parameter to decrease with the number of samples to ensure generalization.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2024, p. 259
Series
TRITA-EECS-AVL ; 2024:31
Keywords [en]
Generalization, Information-Theoretic Bounds
National Category
Computer Sciences
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-344833; ISBN: 978-91-8040-878-3 (print); OAI: oai:DiVA.org:kth-344833; DiVA id: diva2:1848094
Public defence
2024-04-23, F3, Lindstedtsvägen 26, Stockholm, 13:00 (English)
Funder
Swedish Research Council, 2019-03606
Note

QC 20240402

Available from: 2024-04-02. Created: 2024-04-02. Last updated: 2024-04-15. Bibliographically approved.
List of papers
1. On Random Subset Generalization Error Bounds and the Stochastic Gradient Langevin Dynamics Algorithm
2021 (English). In: 2020 IEEE Information Theory Workshop (ITW), Institute of Electrical and Electronics Engineers (IEEE), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we unify several expected generalization error bounds based on random subsets using the framework developed by Hellström and Durisi [1]. First, we recover the bounds based on the individual sample mutual information from Bu et al. [2] and on a random subset of the dataset from Negrea et al. [3]. Then, we introduce new, analogous bounds in the randomized-subsample setting of Steinke and Zakynthinou [4], and we identify some limitations of the framework. Finally, we extend the bounds from Haghifam et al. [5] for Langevin dynamics to stochastic gradient Langevin dynamics, and we refine them for loss functions with potentially large gradient norms.
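
For orientation (a standard form from the cited literature, stated here as an assumption about the usual formulation rather than quoted from this paper), the individual-sample bound of Bu et al. [2] reads: if ℓ(w, Z_i) is σ-sub-Gaussian for every hypothesis w, then

\[
  \bigl|\mathbb{E}[\operatorname{gen}(W, S)]\bigr| \;\le\; \frac{1}{n} \sum_{i=1}^{n} \sqrt{2\sigma^{2}\, I(W; Z_i)},
\]

which the unified framework recovers alongside random-subset and randomized-subsample analogues, and which the paper then extends to the stochastic gradient Langevin dynamics algorithm.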

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:kth:diva-305533 (URN); 10.1109/ITW46852.2021.9457578 (DOI); 000713953900009; 2-s2.0-85112729675 (Scopus ID)
Conference
IEEE Information Theory Workshop (ITW), April 11-15, 2021, online
Note

QC 20211215 

conference ISBN 978-1-7281-5962-1

Available from: 2021-12-15. Created: 2021-12-15. Last updated: 2024-04-02. Bibliographically approved.
2. Upper Bounds on the Generalization Error of Private Algorithms for Discrete Data
2021 (English). In: IEEE Transactions on Information Theory, ISSN 0018-9448, E-ISSN 1557-9654, Vol. 67, no. 11, p. 7362-7379. Article in journal (Refereed), Published
Abstract [en]

In this work, we study the generalization capability of algorithms from an information-theoretic perspective. It has been shown that the expected generalization error of an algorithm is bounded from above by a function of the relative entropy between the conditional probability distribution of the algorithm's output hypothesis, given the dataset with which it was trained, and its marginal probability distribution. We build upon this fact and introduce a mathematical formulation to obtain upper bounds on this relative entropy. Assuming that the data is discrete, we then develop a strategy using this formulation, based on the method of types and typicality, to find explicit upper bounds on the generalization error of stable algorithms, i.e., algorithms that produce similar output hypotheses given similar input datasets. In particular, we show the bounds obtained with this strategy for the case of ε-DP and μ-GDP algorithms.
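
For reference (standard definitions, not this paper's specific formulation), a randomized algorithm A is ε-differentially private (ε-DP) if, for every pair of datasets s, s′ differing in a single entry and every measurable set of hypotheses O,

\[
  \Pr\bigl[ A(s) \in O \bigr] \;\le\; e^{\varepsilon}\, \Pr\bigl[ A(s') \in O \bigr],
\]

and μ-GDP (Gaussian differential privacy) is an analogous stability notion. Constraints of this kind limit how far the conditional distribution of the output hypothesis given the dataset can deviate from its marginal, which is the quantity the paper's relative-entropy formulation bounds for discrete data.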

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Generalization, differential privacy, mutual information, mixture approximation
National Category
Telecommunications; Control Engineering; Signal Processing
Identifiers
urn:nbn:se:kth:diva-304555 (URN); 10.1109/TIT.2021.3111480 (DOI); 000709078200028; 2-s2.0-85114716254 (Scopus ID)
Note

QC 20211108

Available from: 2021-11-08. Created: 2021-11-08. Last updated: 2024-04-02. Bibliographically approved.
3. Tighter Expected Generalization Error Bounds via Wasserstein Distance
2021 (English). In: Advances in Neural Information Processing Systems, Neural information processing systems foundation, 2021, p. 19109-19121. Conference paper, Published paper (Refereed)
Abstract [en]

This work presents several expected generalization error bounds based on the Wasserstein distance. More specifically, it introduces full-dataset, single-letter, and random-subset bounds, and their analogues in the randomized-subsample setting of Steinke and Zakynthinou [1]. Moreover, when the loss function is bounded and the geometry of the space is ignored by the choice of the metric in the Wasserstein distance, these bounds recover from below (and thus, are tighter than) current bounds based on the relative entropy. In particular, they generate new, non-vacuous bounds based on the relative entropy. Therefore, these results can be seen as a bridge between works that account for the geometry of the hypothesis space and those based on the relative entropy, which is agnostic to such geometry. Furthermore, it is shown how to produce various new bounds from these results based on different information measures (e.g., the lautum information or several f-divergences), and how to derive similar bounds with respect to the backward channel using the presented proof techniques.
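
The claim that these bounds "recover from below" the relative-entropy bounds rests on two standard facts, stated here for context with the usual definitions: with the discrete metric d(w, w′) = 1{w ≠ w′}, the order-1 Wasserstein distance reduces to the total variation distance, and Pinsker's inequality bounds the latter by the relative entropy,

\[
  \mathbb{W}_1(P, Q) \;=\; \operatorname{TV}(P, Q) \;\le\; \sqrt{\tfrac{1}{2}\, \operatorname{KL}(P \,\|\, Q)},
\]

so, for a bounded loss, a bound expressed through the Wasserstein distance with the discrete metric is never looser than the corresponding relative-entropy bound.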

Place, publisher, year, edition, pages
Neural information processing systems foundation, 2021
Keywords
Entropy, Error analysis, Generalization error bounds, Hypothesis space, Information measures, Loss functions, Random subsets, Relative entropy, Wasserstein distance, Geometry
National Category
Discrete Mathematics
Identifiers
urn:nbn:se:kth:diva-316284 (URN); 000925183302038; 2-s2.0-85125691301 (Scopus ID)
Conference
35th Conference on Neural Information Processing Systems, NeurIPS 2021, 6 December - 14 December, 2021, Virtual/Online
Note

Part of proceedings: ISBN 978-1-7138-4539-3

QC 20230921

Available from: 2022-08-15. Created: 2022-08-15. Last updated: 2024-04-02. Bibliographically approved.
4. Limitations of information-theoretic generalization bounds for gradient descent methods in stochastic convex optimization
2023 (English). In: Proceedings of ALT 2023 / [ed] Shipra Agrawal, Francesco Orabona, ML Research Press, 2023, p. 663-706. Conference paper, Published paper (Refereed)
Abstract [en]

To date, no “information-theoretic” frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy “surrogate” algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques. 
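
The "noisy surrogate" construction mentioned above is, in its usual form (sketched here for context), Gaussian smoothing of the final iterate,

\[
  \widetilde{W} \;=\; W_T + \xi, \qquad \xi \sim \mathcal{N}\bigl(0, \sigma^{2} I_d\bigr),
\]

where W_T is the last iterate of gradient descent; the added noise is typically what makes information measures such as I(W̃; S) finite, and the paper shows that even bounds on such surrogates cannot yield minimax rates in stochastic convex optimization.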

Place, publisher, year, edition, pages
ML Research Press, 2023
Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 201
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-328375 (URN); 001227262400022; 2-s2.0-85161238002 (Scopus ID)
Conference
34th International Conference on Algorithmic Learning Theory, ALT 2023, Singapore, 20 - 23 February 2023
Note

QC 20231204

Available from: 2023-06-08. Created: 2023-06-08. Last updated: 2024-07-16. Bibliographically approved.
5. More PAC-Bayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity
(English) Manuscript (preprint) (Other academic)
Abstract [en]

In this paper, we present new high-probability PAC-Bayes bounds for different types of losses. Firstly, for losses with a bounded range, we recover a strengthened version of Catoni's bound that holds uniformly for all parameter values. This leads to new fast-rate and mixed-rate bounds that are interpretable and tighter than previous bounds in the literature. In particular, the fast-rate bound is equivalent to the Seeger–Langford bound. Secondly, for losses with more general tail behaviors, we introduce two new parameter-free bounds: a PAC-Bayes Chernoff analogue when the loss's cumulant generating function is bounded, and a bound when the loss's second moment is bounded. These two bounds are obtained using a new technique based on a discretization of the space of possible events for the "in probability" parameter optimization problem. This technique is both simpler and more general than previous approaches optimizing over a grid on the parameter space. Finally, using a simple technique that is applicable to any existing bound, we extend all previous results to anytime-valid bounds.
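
As a generic illustration of how a fixed-sample-size bound can be made anytime valid (a standard construction; the manuscript's own "simple technique" may differ), one can spread the confidence budget over sample sizes with a summable schedule, e.g.

\[
  \delta_n \;=\; \frac{6\,\delta}{\pi^{2} n^{2}}, \qquad \sum_{n \ge 1} \delta_n \;=\; \delta,
\]

apply the fixed-n bound at each n with confidence parameter δ_n, and take a union bound over n. The resulting statement then holds simultaneously for all sample sizes with probability at least 1 − δ, at the price of an additional ln(π²n²/6) term inside each bound.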

Keywords
Generalization bounds, PAC-Bayes bounds, concentration inequalities, rate of convergence (fast, slow, mixed), tail behavior, parameter optimization
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-344903 (URN); 10.48550/arXiv.2306.12214 (DOI)
Funder
Swedish Research Council, 2019-03606
Note

QC 20240403

Available from: 2024-04-02. Created: 2024-04-02. Last updated: 2024-04-15. Bibliographically approved.
6. A note on generalization bounds for losses with finite moments
(English) Manuscript (preprint) (Other academic)
Abstract [en]

This paper studies the truncation method from [1] to derive high-probability PAC-Bayes bounds for unbounded losses with heavy tails. Assuming that the p-th moment is bounded, the resulting bounds interpolate between a slow rate 1/√n when p=2, and a fast rate 1/n when p→∞ and the loss is essentially bounded. Moreover, the paper derives a high-probability PAC-Bayes bound for losses with a bounded variance. This bound has an exponentially better dependence on the confidence parameter and the dependency measure than previous bounds in the literature. Finally, the paper extends all results to guarantees in expectation and single-draw PAC-Bayes. In order to do so, it obtains analogues of the PAC-Bayes fast-rate bound for bounded losses from [2] in these settings.
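
As a generic illustration of the truncation idea (not necessarily the exact decomposition used in [1]), a nonnegative loss ℓ with a bounded p-th moment can be split at a threshold t > 0 into a bounded part and a tail controlled by a Markov-type argument,

\[
  \mathbb{E}[\ell] \;=\; \mathbb{E}\bigl[\ell\, \mathbf{1}\{\ell \le t\}\bigr] + \mathbb{E}\bigl[\ell\, \mathbf{1}\{\ell > t\}\bigr], \qquad \mathbb{E}\bigl[\ell\, \mathbf{1}\{\ell > t\}\bigr] \;\le\; \frac{\mathbb{E}[\ell^{\,p}]}{t^{\,p-1}};
\]

the truncated part is then amenable to bounded-loss PAC-Bayes bounds, and tuning t to balance the two terms yields rates that improve with p, consistent with the interpolation between 1/√n and 1/n described above.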

National Category
Probability Theory and Statistics
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-344900 (URN); 10.48550/arXiv.2403.16681 (DOI)
Funder
Swedish Research Council, 2019-03606
Note

QC 20240403

Available from: 2024-04-02. Created: 2024-04-02. Last updated: 2024-04-15. Bibliographically approved.
