Deep Double Descent via Smooth Interpolation
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-0242-4419
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0003-4535-2520
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0003-0579-3372
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0001-5211-6388
2023 (English) In: Transactions on Machine Learning Research, E-ISSN 2835-8856, Vol. 2023, no. 4. Article in journal (Refereed). Published.
Abstract [en]

The ability of overparameterized deep networks to interpolate noisy data, while at the same time showing good generalization performance, has been recently characterized in terms of the double descent curve for the test error. Common intuition from polynomial regression suggests that overparameterized networks are able to sharply interpolate noisy data, without considerably deviating from the ground-truth signal, thus preserving generalization ability. At present, a precise characterization of the relationship between interpolation and generalization for deep networks is missing. In this work, we quantify sharpness of fit of the training data interpolated by neural network functions, by studying the loss landscape w.r.t. the input variable locally around each training point, over volumes around cleanly- and noisily-labelled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large interpolating models express a smooth loss landscape, where noisy targets are predicted over large volumes around training data points, in contrast to existing intuition.
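The measurement described above (loss sharpness with respect to the input, evaluated over volumes around individual training points) can be pictured with a short sketch. The snippet below is a minimal, hypothetical illustration assuming a PyTorch classifier; the perturbation radius, the number of random samples, and the max-deviation statistic are illustrative assumptions, not the paper's exact protocol.

import torch
import torch.nn.functional as F

def input_loss_sharpness(model, x, y, radius=0.1, n_samples=64):
    """Approximate loss sharpness w.r.t. the input over a ball of given radius
    around a single labelled training point (x, y)."""
    model.eval()
    with torch.no_grad():
        base_loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        deviations = []
        for _ in range(n_samples):
            # Random direction, scaled by a random radius within the ball.
            delta = torch.randn_like(x)
            delta = radius * torch.rand(()) * delta / delta.norm()
            perturbed_loss = F.cross_entropy(model((x + delta).unsqueeze(0)), y.unsqueeze(0))
            deviations.append((perturbed_loss - base_loss).abs())
        # Worst-case loss deviation over the sampled neighbourhood.
        return torch.stack(deviations).max()

Averaging such a statistic separately over cleanly- and noisily-labelled training points, while sweeping model size or training epochs, yields the kind of model- and epoch-wise curves the abstract refers to.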

Place, publisher, year, edition, pages
Transactions on Machine Learning Research (TMLR), 2023. Vol. 2023, no. 4
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-346450; Scopus ID: 2-s2.0-86000152632; OAI: oai:DiVA.org:kth-346450; DiVA id: diva2:1857952
Note

QC 20250320

Available from: 2024-05-15. Created: 2024-05-15. Last updated: 2025-03-20. Bibliographically approved.
In thesis
1. On Label Noise in Image Classification: An Aleatoric Uncertainty Perspective
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Deep neural networks and large-scale datasets have revolutionized the field of machine learning. However, these large networks are susceptible to overfitting to label noise, resulting in degraded generalization. In response, this thesis examines the problem closely from both an empirical and a theoretical perspective. We empirically analyse the input smoothness of networks as they overfit to label noise, and we theoretically explore the connection to aleatoric uncertainty. These analyses improve our understanding of the problem and have led to our novel methods aimed at enhancing robustness against label noise in classification.
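For concreteness, the controlled setting usually studied in this line of work injects synthetic label noise into a clean dataset and then measures how generalization degrades. The snippet below is a minimal sketch of symmetric (uniform) label flipping; the function name, noise model, and noise rate are illustrative assumptions rather than the thesis's specific experimental protocol.

import numpy as np

def flip_labels_symmetric(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` where a `noise_rate` fraction of entries is
    re-drawn uniformly from the other classes (symmetric label noise)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    n_flip = int(noise_rate * len(labels))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    for i in flip_idx:
        # Draw a replacement label uniformly from the remaining classes.
        other = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(other)
    return labels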

Abstract [sv]

Deep neural networks and large-scale datasets have revolutionized the field of machine learning. However, these large networks are prone to overfitting to mislabelled data, which leads to degraded generalization. In response, the thesis examines the problem closely from both an empirical and a theoretical point of view. We empirically analyse the networks' sensitivity to small changes in the input as they overfit to mislabelled data, and we theoretically explore the connection to aleatoric uncertainty. These analyses improve our understanding of the problem and have led to our new methods aimed at robustness against mislabelled data in classification.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. p. xi, 68
Series
TRITA-EECS-AVL ; 2024:45
Keywords
Label noise, aleatoric uncertainty, noisy labels, robustness, etikettbrus, osäkerhet, felmarkerade etiketter, robusthet
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-346453 (URN); 978-91-8040-925-4 (ISBN)
Public defence
2024-06-03, https://kth-se.zoom.us/w/61097277235, F3 (Flodis), Lindstedsvägen 26 & 28, Stockholm, 09:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20240516

Available from: 2024-05-16. Created: 2024-05-16. Last updated: 2025-12-03. Bibliographically approved.
2. On Implicit Smoothness Regularization in Deep Learning
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

State-of-the-art neural networks provide a rich class of function approximators, fueling the remarkable success of gradient-based deep learning on complex high-dimensional problems, ranging from natural language modeling to image and video generation and understanding. Modern deep networks enjoy sufficient expressive power to shatter common classification benchmarks, as well as to interpolate noisy regression targets. At the same time, the same models are able to generalize well whilst perfectly fitting noisy training data, even in the absence of external regularization constraining model expressivity. Efforts towards making sense of the observed benign overfitting behaviour uncovered its occurrence in overparameterized linear regression as well as kernel regression, extending classical empirical risk minimization to the study of minimum norm interpolators. Existing theoretical understanding of the phenomenon identifies two key factors affecting the generalization ability of interpolating models. First, overparameterization (the regime in which a model counts more parameters than the number of constraints imposed by the training sample) effectively reduces model variance in proximity of the training data. Second, the structure of the learner (which determines how patterns in the training data are encoded in the learned representation) controls the ability to separate signal from noise when attaining interpolation. Analyzing these factors for deep finite-width networks entails characterizing the mechanisms driving feature learning and norm-based capacity control in practical settings, thus posing a challenging open problem. The present thesis explores the problem of capturing the effective complexity of finite-width deep networks trained in practice, through the lens of model function geometry, focusing on factors implicitly restricting model complexity. First, model expressivity is contrasted with effective nonlinearity for models undergoing double descent, highlighting the constrained effective complexity afforded by overparameterization. Second, the geometry of interpolation is studied in the presence of noisy targets, observing robust interpolation over volumes of size controlled by model scale. Third, the observed behavior is formally tied to parameter-space curvature, connecting parameter-space geometry to that of the input space. Finally, the thesis concludes by investigating whether the findings translate to the context of self-supervised learning, relating the geometry of representations to downstream robustness, and highlighting trends in keeping with neural scaling laws. The present work isolates input-space smoothness as a key notion for characterizing the effective complexity of model functions expressed by overparameterized deep networks.
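Since the thesis centers on input-space smoothness as a handle on effective complexity, a simple proxy worth keeping in mind is the norm of the loss gradient with respect to the input, evaluated at training points; tracking it while varying width or training time probes how smoothness evolves across the double descent curve. The sketch below assumes a PyTorch classifier and uses this particular proxy purely for illustration; it is not the thesis's exact definition of smoothness.

import torch
import torch.nn.functional as F

def input_gradient_norm(model, x, y):
    """L2 norm of d(loss)/d(input) at a single labelled example (x, y)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    (grad,) = torch.autograd.grad(loss, x)
    return grad.norm().item()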

Abstract [sv]

State-of-the-art neural networks offer a rich class of function approximators, fueling the remarkable progress of gradient-based deep learning on complex high-dimensional problems, ranging from natural language modeling to image and video generation and understanding. Modern deep networks have sufficient expressive power to shatter common classification benchmarks, as well as to interpolate noisy regression targets. The same models can generalize well while fitting noisy training data perfectly, even in the absence of external regularization constraining model expressivity. Efforts to understand the observed so-called benign overfitting behaviour have demonstrated its occurrence in overparameterized linear regression as well as in kernel regression, extending classical empirical risk minimization to the study of minimum-norm interpolators. Existing theoretical understanding of the phenomenon identifies two key factors affecting the generalization ability of interpolating models. First, overparameterization (the regime in which a model has more parameters than the number of constraints imposed by the training sample) effectively reduces model variance in the vicinity of the training data. Second, the structure of the learner (which determines how patterns in the training data are encoded in the learned representation) controls the ability to separate signal from noise when interpolation is attained. Analyzing these factors for deep finite-width networks entails characterizing the mechanisms driving feature learning and norm-based capacity control in practical settings, which constitutes a challenging open problem. The present thesis explores the problem of capturing the effective complexity of finite-width deep networks trained in practice, seen through the lens of model function geometry, with a focus on factors that implicitly restrict model complexity. First, model expressivity is contrasted with effective nonlinearity for models undergoing so-called double descent, highlighting the constrained effective complexity afforded by overparameterization. Second, the geometry of interpolation is studied in the presence of noisy targets, observing robust interpolation over volumes of size determined by model scale. Third, the observed behaviour is formally tied to parameter-space curvature, connecting parameter-space geometry to that of the input space. Finally, the thesis concludes by investigating whether the findings carry over to the context of self-supervised learning, relating the geometry of representations to downstream robustness and highlighting trends in line with neural scaling laws. The present work isolates input-space smoothness as a key notion for characterizing the effective complexity of model functions expressed by overparameterized deep networks.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. p. 94
Series
TRITA-EECS-AVL ; 2024:80
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-354917 (URN); 978-91-8106-077-5 (ISBN)
Public defence
2024-11-07, https://kth-se.zoom.us/j/62717697317, Kollegiesalen, Brinellvägen 6, Stockholm, 15:00 (English)
Opponent
Supervisors
Note

QC 20241017

Available from: 2024-10-17. Created: 2024-10-17. Last updated: 2025-12-03. Bibliographically approved.

Open Access in DiVA

fulltext (11263 kB), 206 downloads
File information
File name: FULLTEXT01.pdf. File size: 11263 kB. Checksum: SHA-512
bc15c622dc504c0315f05887c87bdf9c58cc11f833b43b62a9e1072c1c9ee282a271d28e1a66b639a21a6d34c3491f984add4488a1f3256b0a5a76c7ba1d56af
Type: fulltext. Mimetype: application/pdf

Other links

Scopus, PaperCode

Authority records

Gamba, Matteo; Englesson, Erik; Björkman, Mårten; Azizpour, Hossein

Search in DiVA

By author/editor
Gamba, Matteo; Englesson, Erik; Björkman, Mårten; Azizpour, Hossein
By organisation
Robotics, Perception and Learning, RPL
In the same journal
Transactions on Machine Learning Research
Computer Sciences

Search outside of DiVA

Google, Google Scholar
Total: 206 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are now no longer available.

Total: 533 hits