A Novel CNN–ViT Model with Cascade Upsampling for Efficient Crack Segmentation
LESIA Laboratory, Department of Computer Science, University of Biskra, Biskra 07000, Algeria.
Key Laboratory of Basic Pharmacology and Joint International Laboratory of Ethnic Medicine, Ministry of Education, Zunyi Medical University, Zunyi, China.
VSC Laboratory, University of Biskra, Biskra 07000, Algeria.
RLP Laboratory, Department of Computer Science, University of Biskra, Biskra, 07000, Algeria.
2026 (English). In: Sensors, E-ISSN 1424-8220, Vol. 26, no 5, article id 1667. Article in journal (Refereed). Published
Abstract [en]

Accurate crack segmentation in civil infrastructure imagery remains challenging because of the prevalence of thin, low-contrast, and spatially discontinuous defects that often appear amid textured surfaces, shadows, and acquisition noise. Although Transformer-based models improve global context modeling, many existing solutions incur substantial computational and memory overhead, which limits their use in practical, resource-constrained inspection settings. In this work, we introduce an efficient hybrid segmentation architecture that combines a convolutional encoder for high-fidelity local representation with a lightweight Transformer bottleneck for global dependency modeling, followed by a progressive decoder that restores spatial resolution through multi-level skip-feature fusion. To better accommodate severe foreground sparsity and preserve fine crack structures, the framework is trained with a composite Dice–Binary Cross-Entropy objective and employs a tokenization strategy designed to preserve fine spatial details while enabling efficient global context modeling. We validate the proposed approach on four public benchmarks, demonstrating consistent improvements over representative convolutional, Transformer-based, and hybrid baselines, while ablation studies confirm the contribution of each design component. Finally, runtime profiling shows favorable latency and memory characteristics, supporting real-time or near real-time deployment on embedded and edge inspection platforms.
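The composite objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the equal weighting (`bce_weight=0.5`), the smoothing constant (`smooth=1.0`), and the function name `dice_bce_loss` are assumptions made here for illustration, not the authors' settings. The sketch works on flat lists of per-pixel foreground probabilities and binary labels.

```python
import math


def dice_bce_loss(probs, targets, bce_weight=0.5, smooth=1.0, eps=1e-7):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    probs   -- predicted foreground probabilities in [0, 1], one per pixel
    targets -- ground-truth labels (0 or 1), same length as probs
    bce_weight, smooth, eps -- illustrative defaults (assumptions)
    """
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equal in length")

    # Clamp probabilities so that log() never receives 0.
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]

    # Mean binary cross-entropy over all pixels.
    bce = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(clipped, targets)
    ) / len(probs)

    # Soft Dice coefficient: overlap relative to total predicted and true mass.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)

    return bce_weight * bce + (1.0 - bce_weight) * (1.0 - dice)


if __name__ == "__main__":
    mask = [1, 0, 0, 0]  # one crack pixel among background pixels
    print(dice_bce_loss([0.9, 0.1, 0.1, 0.1], mask))
    print(dice_bce_loss([0.0, 0.0, 0.0, 0.0], mask))  # all-background guess
```

The Dice term is computed from overlap rather than per-pixel averages, so a prediction that labels everything as background is still penalised even when crack pixels are a small fraction of the image. This is the usual motivation for pairing it with cross-entropy under foreground sparsity.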

Place, publisher, year, edition, pages
MDPI AG, 2026. Vol. 26, no 5, article id 1667
Keywords [en]
convolutional neural networks, crack segmentation, deep learning, edge computing, optical digital imaging, structural health monitoring, vision transformer
National Category
Computer graphics and computer vision; Computer Engineering
Identifiers
URN: urn:nbn:se:kth:diva-378789
DOI: 10.3390/s26051667
ISI: 001713879700001
PubMedID: 41829631
Scopus ID: 2-s2.0-105032639910
OAI: oai:DiVA.org:kth-378789
DiVA, id: diva2:2049435
Note

QC 20260330

Available from: 2026-03-30. Created: 2026-03-30. Last updated: 2026-03-30. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Habib, Mustapha

By organisation
Building Technology and Design