Smaller generalization error derived for a deep residual neural network compared with shallow networks
King Abdullah Univ Sci & Technol KAUST, Comp Elect & Math Sci & Engn Div CEMSE, Thuwal 23955, Saudi Arabia.
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.). H-Ai AB, Box 5216, S-10245 Stockholm, Sweden. ORCID iD: 0000-0001-6061-3456
Univ Delaware, Dept Math Sci, Newark, DE 19717 USA.
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Numerical Analysis, NA. ORCID iD: 0000-0003-2669-359X
2022 (English). In: IMA Journal of Numerical Analysis, ISSN 0272-4979, E-ISSN 1464-3642, Vol. 43, no. 5, p. 2585-2632. Article in journal (Refereed). Published.
Abstract [en]

Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers
$$\bar z_{\ell+1} = \bar z_\ell + \operatorname{Re}\sum_{k=1}^{K} \bar b_{\ell k}\, e^{\mathrm{i}\,\omega_{\ell k}\bar z_\ell} + \operatorname{Re}\sum_{k=1}^{K} \bar c_{\ell k}\, e^{\mathrm{i}\,\omega'_{\ell k}\cdot x}.$$
An optimal distribution for the frequencies $(\omega_{\ell k}, \omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\,\omega_{\ell k}\bar z_\ell}$ and $e^{\mathrm{i}\,\omega'_{\ell k}\cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat f\|_{L^1(\mathbb{R}^d)}^2/(KL)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case where the $L^\infty$-norm of $f$ is much smaller than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed new algorithm is demonstrated in computational experiments.
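To make the layer recursion above concrete, the following is a minimal NumPy sketch of a single forward pass through such a network. The depth L, width K, input dimension d, the standard-normal frequency sampling, and the random complex amplitudes are placeholder assumptions chosen purely for illustration; the paper's contribution is precisely an optimal frequency distribution and a layer-by-layer training method, neither of which is implemented here.

import numpy as np

# Sketch of one forward pass through the residual random Fourier features
# network from the abstract:
#   z_{l+1} = z_l + Re sum_k b_{lk} exp(i w_{lk} z_l)
#                 + Re sum_k c_{lk} exp(i w'_{lk} . x)
# L, K, d and all parameter values below are illustrative assumptions.

rng = np.random.default_rng(0)
L, K, d = 4, 16, 2  # layers, features per layer, input dimension

# Frequencies drawn from a standard normal as a placeholder; the paper
# derives an optimal sampling distribution instead.
omega = rng.standard_normal((L, K))       # scalar frequencies acting on z
omega_x = rng.standard_normal((L, K, d))  # vector frequencies acting on x
b = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / K
c = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / K

def forward(x, z0=0.0):
    """Evaluate the scalar network output for one input point x of shape (d,)."""
    z = z0
    for l in range(L):
        z = (z
             + np.real(np.sum(b[l] * np.exp(1j * omega[l] * z)))
             + np.real(np.sum(c[l] * np.exp(1j * (omega_x[l] @ x)))))
    return z

print(forward(np.array([0.3, -0.7])))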

Place, publisher, year, edition, pages
Oxford University Press (OUP), 2022. Vol. 43, no. 5, p. 2585-2632
Keywords [en]
residual network, deep random feature networks, supervised learning, error estimates, layer-by-layer algorithm
National Category
Computational Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-336842
DOI: 10.1093/imanum/drac049
ISI: 000853541200001
Scopus ID: 2-s2.0-85174497733
OAI: oai:DiVA.org:kth-336842
DiVA, id: diva2:1799091
Note

QC 20250513

Available from: 2023-09-21. Created: 2023-09-21. Last updated: 2025-05-13. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Kiessling, Jonas; Sandberg, Mattias; Szepessy, Anders
