Highway network block with gates constraints for training very deep networks
Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg, L-1855, Luxembourg (Signal Processing). ORCID iD: 0000-0003-2298-6774
2018 (English). In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 1739-1748. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we propose to reformulate the learning of the highway network block to realize both early optimization and improved generalization of very deep networks while preserving the network depth. Gate constraints are employed to improve optimization, latent representations and parameterization usage, in order to efficiently learn the hierarchical feature transformations that are crucial for the success of any deep network. One of the earliest very deep models with over 30 layers that was successfully trained relied on highway network blocks. Although highway blocks suffice for alleviating the optimization problem via improved information flow, we show for the first time that, further into training, such highway blocks may result in learning mostly untransformed features and therefore a reduction in the effective depth of the model; this can negatively impact model generalization performance. Using the proposed approach, 15-layer and 20-layer models are successfully trained with one gate, and a 32-layer model with three gates. This leads to a drastic reduction in model parameters compared to the original highway network. Extensive experiments on the CIFAR-10, CIFAR-100, Fashion-MNIST and USPS datasets are performed to validate the effectiveness of the proposed approach. In particular, we outperform the original highway network and many state-of-the-art results. To the best of our knowledge, the results achieved on the Fashion-MNIST and USPS datasets are the best reported in the literature.
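The abstract builds on the standard highway network block of Srivastava et al. (2015), where a learned transform gate T mixes a transformed path H(x) with an untransformed carry path x. The paper's specific gate constraints are not detailed in the abstract; the sketch below shows only the baseline block it reformulates, with the tanh transform, layer sizes, and variable names as illustrative assumptions. It also demonstrates the failure mode the abstract describes: when the gate saturates toward zero, the block passes features through mostly untransformed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_block(x, W_H, b_H, W_T, b_T):
    """Standard highway block: y = T(x) * H(x) + (1 - T(x)) * x.

    H is the candidate transformation (a tanh layer here, an assumption),
    T is the transform gate in (0, 1); 1 - T is the carry gate.
    """
    H = np.tanh(x @ W_H + b_H)      # transformed path
    T = sigmoid(x @ W_T + b_T)      # transform gate
    return T * H + (1.0 - T) * x    # gated mix of transform and carry

# Toy usage (hypothetical shapes): a strongly negative gate bias drives
# T toward 0, so the block learns mostly untransformed features (y ~ x),
# reducing the model's effective depth as the abstract warns.
rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((4, d))
W_H = rng.standard_normal((d, d)) * 0.1
W_T = rng.standard_normal((d, d)) * 0.1
b_H = np.zeros(d)
b_T = np.full(d, -10.0)  # saturated carry: gate ~ 0
y = highway_block(x, W_H, b_H, W_T, b_T)
```

With `b_T = -10`, the output is numerically close to the input, i.e. the block is effectively an identity layer; the paper's gate constraints are motivated by preventing this collapse during training.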

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018. p. 1739-1748
Series
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, ISSN 2160-7508
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:kth:diva-287029
DOI: 10.1109/CVPRW.2018.00217
ISI: 000457636800210
Scopus ID: 2-s2.0-85060840774
OAI: oai:DiVA.org:kth-287029
DiVA, id: diva2:1555324
Conference
31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2018, 18 June 2018 - 22 June 2018
Note

QC 20210603

Available from: 2021-05-18. Created: 2021-05-18. Last updated: 2024-03-15. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Ottersten, Björn
