kth.se Publications
Improved highway network block for training very deep neural networks
Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg City, 1855, Luxembourg (Signal Processing). ORCID iD: 0000-0003-2298-6774
2020 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 8, p. 176758-176773. Article in journal (Refereed). Published.
Abstract [en]

Very deep networks are successful in various tasks, with reported results surpassing human performance. However, training such very deep networks is not trivial. Typically, the problems of learning the identity function and of feature reuse combine to plague the optimization of very deep networks. In this paper, we propose a highway network with gate constraints that addresses the aforementioned problems and thus alleviates the difficulty of training. Namely, we propose two variants of the highway network, HWGC and HWCC, employing feature summation and concatenation, respectively. The proposed highway networks, besides being more computationally efficient, are shown to have more interesting learning characteristics than the original highway network, such as natural learning of hierarchical and robust representations due to more effective use of model depth, fewer gates required for successful learning, better generalization capacity and faster convergence. Experimental results show that our models outperform the original highway network and many state-of-the-art models. Importantly, we observe that our second model, with feature concatenation and compression, consistently outperforms our model with feature summation of similar depth, the original highway network, many state-of-the-art models and even ResNets on five benchmark datasets: CIFAR-10, CIFAR-100, Fashion-MNIST, SVHN and ImageNet-2012 (ILSVRC). Furthermore, the second proposed model is more computationally efficient than the state of the art in terms of training time, inference time and GPU memory usage, which strongly supports real-time applications. Using a similar number of model parameters on the CIFAR-10, CIFAR-100, Fashion-MNIST and SVHN datasets, the significantly shallower proposed model surpasses the performance of ResNet-110 and ResNet-164, which are roughly 6 and 8 times deeper, respectively.
Similarly, on the ImageNet dataset, the proposed models surpass the performance of ResNet-101 and ResNet-152, which are roughly three times deeper.
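The abstract builds on the standard highway block, the baseline that the proposed HWGC and HWCC variants improve upon. Below is a minimal sketch of that baseline gating mechanism only, assuming the classic formulation y = T(x) * H(x) + (1 - T(x)) * x; the paper's gate constraints and the HWGC/HWCC variants are not reproduced here, and the weight shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_block(x, W_h, b_h, W_t, b_t):
    """Baseline highway block: y = T(x) * H(x) + (1 - T(x)) * x.

    H is the transform path and T the learned gate; when T -> 0 the
    block passes x through unchanged (identity mapping), which is
    what eases the optimization of very deep stacks.
    """
    h = np.tanh(x @ W_h + b_h)      # transform path H(x)
    t = sigmoid(x @ W_t + b_t)      # gate T(x) in (0, 1)
    return t * h + (1.0 - t) * x

# With a strongly negative gate bias, the gate stays near 0 and the
# block approximates the identity function.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 8)) * 0.1
y = highway_block(x, W, np.zeros(8), W, np.full(8, -10.0))
```

A strongly negative gate bias is the usual initialization trick for highway networks: it biases each block toward the identity at the start of training, so gradients can flow through very deep stacks.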

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2020. Vol. 8, p. 176758-176773
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:kth:diva-295054
DOI: 10.1109/ACCESS.2020.3026423
ISI: 000575080200001
Scopus ID: 2-s2.0-85102736729
OAI: oai:DiVA.org:kth-295054
DiVA, id: diva2:1555367
Note

QC 20210527

Available from: 2021-05-18. Created: 2021-05-18. Last updated: 2024-03-15. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Ottersten, Björn

