kth.se | KTH Publications (DiVA)
Neural Network Architecture Design: Towards Low-complexity and Scalable Solutions
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Information Science and Engineering. (Saikat Chatterjee's research group). ORCID iD: 0000-0002-8534-7622
2021 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

 Over the past few years, deep neural networks have been at the center of attention in the machine learning literature, thanks to advances in the computational capabilities of modern graphical processing units (GPUs). This progress has made it possible to train large-scale neural networks on thousands, and even millions, of training samples, achieving outstanding estimation accuracy in applications that were simply not possible before. At the same time, the lack of a coherent theory of neural networks has shifted the focus of machine learning research from theoretical analysis to experimental studies on clusters of GPUs. As a result, the current deep learning literature remains ill-equipped for real-world scenarios where the number of training samples is small or computational resources are limited. In this thesis, we focus on developing new neural network architectures that take such practical constraints into account.

 First, we propose a layer-wise training approach for multilayer neural networks that guarantees a reduction of the training loss as the network gets deeper. While being computationally efficient, this approach also provides an estimate of the appropriate size of the network, i.e., the number of neurons and layers. The proposed approach further admits a scalable training algorithm, making it attractive for distributed learning scenarios over a network of agents. Second, we design a deep neural network architecture for small-data learning regimes, where the number of training samples is limited. To this end, we combine kernel methods with densely connected networks and demonstrate the resulting classifier in few-shot learning scenarios. Thanks to the kernel representation, the proposed approach can handle high-dimensional samples and feature vectors, since the complexity of the training algorithm is determined mainly by the number of samples rather than their dimension. Third, we focus on designing a deep neural network architecture with very low computational requirements, making it suitable for power-limited applications such as learning on edge devices. In particular, we use a combination of random weights and ReLU activation functions to achieve accurate estimation as the network gets deeper.
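The combination of fixed random weights, ReLU activations, and layer-wise training described above can be illustrated with a minimal sketch. All function names, layer widths, and data below are illustrative assumptions, not the thesis's actual algorithm: the thesis's construction guarantees a monotone loss reduction as layers are added, whereas this plain sketch merely shows the mechanics of growing the network one random ReLU layer at a time and fitting only a convex least-squares output layer at each stage.

```python
import numpy as np

def relu(x):
    # Elementwise rectified linear unit
    return np.maximum(x, 0.0)

def fit_output_layer(h, t, reg=1e-2):
    # Ridge-regression (regularized least-squares) fit of a linear
    # output layer mapping features h to targets t; this is the only
    # learned part of each stage, and it is a convex problem.
    d = h.shape[1]
    return np.linalg.solve(h.T @ h + reg * np.eye(d), h.T @ t)

def train_layerwise(x, t, n_layers=3, width=64, reg=1e-2, seed=0):
    # Greedy layer-wise growth: each new layer applies fixed random
    # weights followed by ReLU; a fresh output layer is then fit and
    # the training loss at that depth is recorded.
    rng = np.random.default_rng(seed)
    h = x
    losses = []
    for _ in range(n_layers):
        w = rng.standard_normal((h.shape[1], width)) / np.sqrt(h.shape[1])
        h = relu(h @ w)                    # fixed random layer + ReLU
        o = fit_output_layer(h, t, reg)    # convex least-squares fit
        losses.append(float(np.mean((h @ o - t) ** 2)))
    return losses

# Toy regression data (hypothetical example)
rng = np.random.default_rng(1)
x = rng.standard_normal((200, 5))
t = np.sin(x).sum(axis=1, keepdims=True)
losses = train_layerwise(x, t)
print(losses)  # training loss recorded at each depth
```

Because only the output layer is trained at each stage, adding a layer never requires revisiting earlier weights, which is what makes this style of training cheap and amenable to distributed implementations.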

 In the next part of the thesis, we present applications of the proposed architectures and show how they contribute to the current machine learning literature. First, we give an example of how our proposed network supports an incremental learning setup through its adaptive-size multilayer structure. Then, we bring new insight, from an information-theoretic point of view, into the signal flow of a multilayer neural network. We also show examples of how our techniques can improve the performance of state-of-the-art deep networks. Finally, we briefly show the favorable characteristics of our training algorithms that make them suitable for a variety of distributed learning scenarios over a network.

Place, publisher, year, edition, pages
Sweden: KTH Royal Institute of Technology, 2021, p. 125
Series
TRITA-EECS-AVL ; 2021:10
National subject category
Signal Processing
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-289462
ISBN: 978-91-7873-773-4 (print)
OAI: oai:DiVA.org:kth-289462
DiVA id: diva2:1524368
Public defence
2021-02-22, 13:00, F3, Lindstedtsvägen 26, Stockholm (English)
Note

QC 20210209

Available from: 2021-02-09. Created: 2021-02-01. Last updated: 2022-06-25. Bibliographically approved.

Open Access in DiVA

Alireza M. Javid (17545 kB), 4112 downloads
File information
File name: FULLTEXT01.pdf
File size: 17545 kB
Checksum (SHA-512): 9bf4b8006842f18d72ba04a30555adb5821cc9ce51284c4781b7519ecd5f77adc477e96ce6a16c410e8e27cbdda762a26a6ad7c1f279ec7ee95186c21285f10a
Type: fulltext
Mimetype: application/pdf

Other links

zoom link for online defense

Total: 4113 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.
