Deep Neural Network Weight Initialization from Hyperparameter Tuning Trials
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0001-7236-4637
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0003-0422-6560
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-2748-8929
Hopsworks AB. ORCID iD: 0000-0002-9484-6714
2024 (English). In: Neural Information Processing, Springer Nature, 2024. Conference paper, Published paper (Refereed).
Abstract [en]

Training of deep neural networks from scratch requires initialization of the neural network weights as a first step. Over the years, many policies and techniques for weight initialization have been proposed and widely used, including Kaiming initialization and different variants of random initialization. On the other hand, another requirement for starting the training stage is to choose and set suitable hyperparameter values, which are usually obtained by performing several hyperparameter tuning trials. In this paper, we study the suitability of weight initialization using weights obtained from different epochs of hyperparameter tuning trials and compare it to Kaiming uniform (random) weight initialization for image classification tasks. Based on an experimental evaluation using ResNet-18, ResNet-152, and InceptionV3 models, and CIFAR-10, CIFAR-100, Tiny ImageNet, and Food-101 datasets, we show that weight initialization from hyperparameter tuning trials can speed up the training of deep neural networks by up to 2x while maintaining or improving the best test accuracy of the trained models, when compared to random initialization.
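
As a rough illustration of the two schemes compared in the paper, the following PyTorch sketch sets up (a) Kaiming uniform initialization and (b) initialization from weights saved at some epoch of a hyperparameter tuning trial. The checkpoint path, its format, and the helper names are assumptions for the example, not code from the paper.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    def kaiming_uniform_init(model: nn.Module) -> nn.Module:
        """Baseline: re-initialize conv/linear weights with Kaiming uniform."""
        for m in model.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.kaiming_uniform_(m.weight, nonlinearity="relu")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
        return model

    def init_from_tuning_trial(model: nn.Module, checkpoint_path: str) -> nn.Module:
        """Warm start: load weights checkpointed during a tuning trial.

        Assumes the trial saved model.state_dict() with torch.save().
        strict=False tolerates layers (e.g. the head) that differ between
        the trial configuration and the final model.
        """
        state_dict = torch.load(checkpoint_path, map_location="cpu")
        model.load_state_dict(state_dict, strict=False)
        return model

    # Baseline (random) initialization
    baseline = kaiming_uniform_init(resnet18(num_classes=10))

    # Hypothetical checkpoint written during an earlier tuning trial
    warm_started = init_from_tuning_trial(resnet18(num_classes=10), "trial_03_epoch_2.pt")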

Place, publisher, year, edition, pages
Springer Nature, 2024.
Keywords [en]
weight initialization, deep neural network training, hyperparameter tuning, model training, deep learning
National Category
Computer Sciences; Artificial Intelligence
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-358848
DOI: 10.1007/978-981-96-6954-7_5
OAI: oai:DiVA.org:kth-358848
DiVA, id: diva2:1941547
Conference
ICONIP: International Conference on Neural Information Processing, December 2-6, 2024, Auckland, New Zealand
Note

QC 20250303

Available from: 2025-02-28. Created: 2025-02-28. Last updated: 2025-07-01. Bibliographically approved.
In thesis
1. Tools and Methods for Distributed and Large-Scale Training of Deep Neural Networks
2025 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Deep Neural Networks (DNNs) have been at the forefront of recent breakthroughs in Machine Learning (ML) and Deep Learning (DL). DNNs are increasingly used in a wide range of tasks, from Earth observation and the analysis of satellite images to medical diagnosis and smart chatbots. A major contributor to these advances has been the abundance of training data, computational resources, and frameworks that enable efficient training of ever-larger and more complex DNNs within a paradigm referred to as distributed DL, and in particular distributed training, which is the focus of this doctoral dissertation. In distributed training, the data and computation are spread across several workers, as opposed to single-host training, in which both the data and the computation reside on a single worker. In this setting, distributed training can help overcome the limitations of single-host training, such as memory constraints, computational bottlenecks, and limited data availability.
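
For illustration, a minimal data-parallel training sketch using PyTorch DistributedDataParallel; the model, dataset, and launcher setup are generic placeholders, not the tooling contributed by this dissertation.

    import os
    import torch
    import torch.distributed as dist
    import torch.nn.functional as F
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler

    def train_data_parallel(model, dataset, epochs=1, lr=0.1):
        # One process per worker; rank, world size, and local rank are
        # provided by the launcher (e.g. torchrun sets RANK, WORLD_SIZE, LOCAL_RANK).
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        device = torch.device(f"cuda:{local_rank}")

        # Every worker holds a full model replica; DDP synchronizes gradients.
        ddp_model = DDP(model.to(device), device_ids=[local_rank])
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=lr)

        # The sampler shards the dataset so each worker trains on a disjoint partition.
        sampler = DistributedSampler(dataset)
        loader = DataLoader(dataset, batch_size=128, sampler=sampler)

        for epoch in range(epochs):
            sampler.set_epoch(epoch)  # reshuffle the shards each epoch
            for x, y in loader:
                optimizer.zero_grad()
                loss = F.cross_entropy(ddp_model(x.to(device)), y.to(device))
                loss.backward()       # gradient all-reduce happens during backward
                optimizer.step()

        dist.destroy_process_group()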

However, distributed training comes with a number of challenges that must be carefully addressed for a system to make efficient use of it. These challenges include, but are not limited to, the efficient distribution of computation and data across the workers, the presence of straggler workers in a cluster (workers that fall significantly behind the others in their computation step), especially in synchronous execution settings, and communication and synchronization among the workers. Addressing them implies that the system should provide scalability in both the computation and the data dimensions.
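
To make the straggler issue concrete, here is a small sketch of the synchronization step in synchronous data-parallel training: each worker blocks on a collective all-reduce, so the step time is determined by the slowest worker. This is illustrative PyTorch code, not an artefact of the thesis.

    import torch.distributed as dist

    def synchronous_step(model, optimizer, loss):
        """One synchronous data-parallel step with explicit gradient averaging."""
        optimizer.zero_grad()
        loss.backward()
        world_size = dist.get_world_size()
        for param in model.parameters():
            if param.grad is not None:
                # Blocking collective: every worker waits here until all workers,
                # including the slowest straggler, have contributed their gradients,
                # so the duration of the step is set by the slowest worker.
                dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
                param.grad /= world_size
        optimizer.step()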

On the other hand, from a programming and usability point of view, using the distributed training paradigm typically requires knowledge of distributed computing principles, experience with distributed and data-intensive computing frameworks, and major changes to the code used for single-host training. Furthermore, as training a DNN involves several steps and stages (e.g., data preparation, hyperparameter tuning, and model training), it would be desirable to reuse the computational results of one step in another (e.g., reusing weights learned during hyperparameter tuning trials for weight initialization in the model training step) in order to reduce training time. Finally, when developing larger and more complex DNNs, we also need to understand the contribution of each design choice.
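
As a sketch of the kind of reuse described here, the snippet below picks the best-performing tuning trial and hands its checkpoint to the model training step. The trial-metadata layout and the checkpoint paths are hypothetical, introduced only for illustration.

    # Hypothetical summary of finished hyperparameter tuning trials: each entry
    # records the trial's best validation accuracy and the path to the weights
    # checkpointed at that epoch.
    trials = {
        "trial_01": {"val_acc": 0.61, "checkpoint": "trials/trial_01/epoch_3.pt"},
        "trial_02": {"val_acc": 0.67, "checkpoint": "trials/trial_02/epoch_2.pt"},
        "trial_03": {"val_acc": 0.64, "checkpoint": "trials/trial_03/epoch_3.pt"},
    }

    def best_checkpoint(trials: dict) -> str:
        """Return the checkpoint path of the trial with the highest validation accuracy."""
        best = max(trials.values(), key=lambda t: t["val_acc"])
        return best["checkpoint"]

    # The training step can then start from these weights instead of a random
    # initialization (see the earlier initialization sketch).
    print(best_checkpoint(trials))  # -> trials/trial_02/epoch_2.pt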

The contributions of this doctoral dissertation address these challenges and collectively optimize large-scale DNN training, making it more accessible, efficient, and computationally sustainable, while reducing redundancy in ML/DL workflows and providing usable tools for conducting ablation studies.

Abstract [sv]

Djupa neurala nätverk (DNNs) har varit i framkant av de senaste genombrotten inom maskininlärning (ML) och djupinlärning (DL). DNNs används i allt större utsträckning inom en rad olika områden, från jordobservation och analys av satellitbilder till medicinsk diagnostik och smarta chattbotar. En stor bidragande faktor till dessa framsteg är tillgången på stora mängder träningsdata, kraftfulla beräkningsresurser och ramverk som möjliggör effektiv träning av allt större och mer komplexa DNNs inom ett paradigm som kallas distribuerad DL. Inom detta område är distribuerad träning fokus för denna doktorsavhandling. I distribuerad träning fördelas data och beräkningar över flera arbetarnoder, till skillnad från träning på en enskild värd där både data och beräkningar hanteras av en enda nod. I denna kontext kan distribuerad träning bidra till att övervinna begränsningar såsom minnesbegränsningar, beräkningsflaskhalsar och begränsad datatillgång.

Distribuerad träning innebär dock flera utmaningar som måste hanteras noggrant för att säkerställa effektiv resursanvändning. Dessa utmaningar inkluderar, men är inte begränsade till, effektiv fördelning av beräkningar och data mellan noder, förekomsten av stragglers (arbetarnoder som hamnar efter i sina beräkningar jämfört med andra), särskilt i synkrona exekveringsmiljöer, samt kommunikation och synkronisering mellan noderna. För att systemet ska vara skalbart behöver det kunna hantera både ökande beräkningsbehov och större datamängder.

Ur ett programmerings- och användbarhetsperspektiv kräver distribuerad träning ofta djupgående kunskap om distribuerad beräkning och erfarenhet av dataintensiva ramverk. Dessutom innebär det ofta omfattande anpassningar av kod som används för träning på en enskild värd. Eftersom träning av en DNN innefattar flera steg och faser (t.ex. datapreparering, hyperparametertuning, modellträning etc.), vore det önskvärt att återanvända beräkningsresultat från olika steg (t.ex. vikter inlärda under hyperparametertuning för att initialisera modellträningen) för att förbättra träningseffektiviteten. Slutligen, vid utveckling av större och mer komplexa DNNs, är det också viktigt att förstå varje designvals inverkan.

Denna doktorsavhandling adresserar de ovan nämnda utmaningarna och optimerar storskalig DNN-träning genom att göra den mer tillgänglig, effektiv och beräkningsmässigt hållbar, samtidigt som redundansen i ML/DL-arbetsflöden minskas och användbara verktyg för ablationsstudier tillhandahålls.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. 47
Series
TRITA-EECS-AVL ; 2025:28
Keywords [en]
Distributed Deep Learning, Ablation Studies, Data-parallel Training, Deep Neural Networks, Systems for Machine Learning, Weight Initialization, Hyperparameter Optimization
Keywords [sv]
Distribuerad djupinlärning, Ablationsstudier, Dataparallell träning, Djupa neurala nätverk, System för maskininlärning, Viktinitialisering, Hyperparameteroptimering
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-360720
ISBN: 978-91-8106-214-4
Public defence
2025-03-27, Zoom: https://kth-se.zoom.us/j/69403203069, Sal-A, Electrum, Kistagången 16, Stockholm, 09:00 (English)
Funder
EU, Horizon 2020, 825258; Vinnova, 2016-05193
Note

QC 20250304

Available from: 2025-03-04. Created: 2025-03-04. Last updated: 2025-03-10. Bibliographically approved.

Open Access in DiVA

fulltext (549 kB)
File information
File name: FULLTEXT01.pdf
File size: 549 kB
Checksum (SHA-512): e55009230e672795f3fea0ea5fc637415b51c812f9439d9fbda1d927fa155364f2296d0262b6bda6115f4fdeea7ea339ac07d7c4baa7fc5837962a8ccc3e54f3
Type: fulltext
Mimetype: application/pdf


Authority records

Sheikholeslami, Sina; Wang, Tianze; Payberah, Amir H.; Dowling, Jim; Vlassov, Vladimir
