Neural Network Architecture Design: Towards Low-complexity and Scalable Solutions
Alireza M. Javid. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Information Science and Engineering. (Saikat Chatterjee's research group). ORCID iD: 0000-0002-8534-7622
2021 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

 Over the past few years, deep neural networks have been at the center of attention in the machine learning literature, thanks to advances in the computational capabilities of modern graphics processing units (GPUs). This progress has made it possible to train large-scale neural networks on thousands, and even millions, of training samples and to achieve outstanding estimation accuracy in various applications that were simply not possible before. At the same time, the lack of a coherent theoretical understanding of neural networks has shifted the focus of current machine learning research from theoretical analysis to experimental studies on GPU clusters. As a result, the current deep learning literature remains immature when it comes to real-world scenarios where the number of training samples is small or the computational resources are limited. In this thesis, we focus on developing new neural network architectures while taking such practical constraints into account.

 First, we propose a layer-wise training approach for multilayer neural networks that guarantees a reduction of the training loss as the network gets deeper. While being computationally efficient, this approach also provides an estimate of the appropriate size of the network, i.e., the number of neurons and layers. The proposed approach has a scalable training algorithm, making it attractive for distributed learning scenarios over a network of agents. Second, we focus on designing a deep neural network architecture for small-data learning regimes, where the number of training samples is limited. To this end, we combine kernel methods with densely connected networks and show the classification capabilities of the resulting architecture in few-shot learning scenarios. Because it uses a kernel representation, the proposed approach can handle high-dimensional samples and feature vectors, since the complexity of the training algorithm is mainly determined by the number of samples rather than their dimension. Third, we focus on designing a deep neural network architecture with very low computational requirements, making it suitable for power-limited applications such as learning on edge devices. In particular, we use a combination of random weights and ReLU activation functions to achieve an accurate estimation as the network gets deeper.

 In the next part of the thesis, we present some applications of the proposed architectures and show how they can contribute to the current machine learning literature. First, we give an example of how an incremental learning setup can be incorporated into an adaptive-size multilayer neural network by using our proposed network. Then, we bring new insight, from an information-theoretic point of view, into the signal flow of a multilayer neural network. We also show examples of how our techniques can be used to improve the performance of state-of-the-art deep networks. Finally, we briefly show the favorable characteristics of our training algorithms that make them suitable for a variety of distributed learning scenarios over a network.

Place, publisher, year, edition, pages
Sweden: KTH Royal Institute of Technology, 2021, p. 125
Series
TRITA-EECS-AVL ; 2021:10
National subject category
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-289462
ISBN: 978-91-7873-773-4 (print)
OAI: oai:DiVA.org:kth-289462
DiVA, id: diva2:1524368
Public defence
2021-02-22, F3, Lindstedtsvägen 26, Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20210209

Available from: 2021-02-09. Created: 2021-02-01. Last updated: 2022-06-25. Bibliographically approved.

Open Access in DiVA

Alireza M. Javid (17545 kB), 4115 downloads
File information
File: FULLTEXT01.pdf
File size: 17545 kB
Checksum (SHA-512): 9bf4b8006842f18d72ba04a30555adb5821cc9ce51284c4781b7519ecd5f77adc477e96ce6a16c410e8e27cbdda762a26a6ad7c1f279ec7ee95186c21285f10a
Type: fulltext
Mimetype: application/pdf

Total: 4116 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.
