A Novel CNN–ViT Model with Cascade Upsampling for Efficient Crack Segmentation
2026 (English) In: Sensors, E-ISSN 1424-8220, Vol. 26, no 5, article id 1667
Article in journal (Refereed) Published
Abstract [en]
Accurate crack segmentation in civil infrastructure imagery remains challenging because of the prevalence of thin, low-contrast, and spatially discontinuous defects that often appear amid textured surfaces, shadows, and acquisition noise. Although Transformer-based models improve global context modeling, many existing solutions incur substantial computational and memory overhead, which limits their use in practical, resource-constrained inspection settings. In this work, we introduce an efficient hybrid segmentation architecture that combines a convolutional encoder for high-fidelity local representation with a lightweight Transformer bottleneck for global dependency modeling, followed by a progressive decoder that restores spatial resolution through multi-level skip-feature fusion. To better accommodate severe foreground sparsity and preserve fine crack structures, the framework is trained with a composite Dice–Binary Cross-Entropy objective and employs a tokenization strategy designed to preserve fine spatial details while enabling efficient global context modeling. We validate the proposed approach on four public benchmarks, demonstrating consistent improvements over representative convolutional, Transformer-based, and hybrid baselines, while ablation studies confirm the contribution of each design component. Finally, runtime profiling shows favorable latency and memory characteristics, supporting real-time or near real-time deployment on embedded and edge inspection platforms.
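As an illustration of the composite Dice–Binary Cross-Entropy objective mentioned in the abstract, the following is a minimal sketch in plain Python. The abstract does not give the exact formulation, so the function name `dice_bce_loss`, the equal weighting (`bce_weight=0.5`), and the smoothing constant are assumptions made for this example, not the authors' settings.

```python
import math


def dice_bce_loss(probs, targets, bce_weight=0.5, smooth=1.0, eps=1e-7):
    """Hypothetical composite loss: weighted sum of BCE and (1 - Dice).

    probs   -- flat list of predicted crack probabilities in [0, 1]
    targets -- flat list of ground-truth labels (0 = background, 1 = crack)
    The weighting and smoothing values are illustrative assumptions.
    """
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equal length")

    # Binary cross-entropy, averaged over pixels; clamp to avoid log(0).
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        bce -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    bce /= len(probs)

    # Soft Dice coefficient; the smooth term keeps it defined on empty masks.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)

    return bce_weight * bce + (1.0 - bce_weight) * (1.0 - dice)


# Sparse foreground: one crack pixel out of 100.
targets = [1.0] + [0.0] * 99
perfect = dice_bce_loss(targets, targets)
all_background = dice_bce_loss([0.0] * 100, targets)
```

The Dice term is computed over the overlap between prediction and mask rather than per pixel, so an all-background prediction on a sparse crack mask is still penalized noticeably, which is the usual motivation for pairing it with cross-entropy.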
Place, publisher, year, edition, pages
MDPI AG, 2026. Vol. 26, no 5, article id 1667
Keywords [en]
convolutional neural networks, crack segmentation, deep learning, edge computing, optical digital imaging, structural health monitoring, vision transformer
National Category
Computer graphics and computer vision; Computer Engineering
Identifiers
URN: urn:nbn:se:kth:diva-378789
DOI: 10.3390/s26051667
ISI: 001713879700001
PubMedID: 41829631
Scopus ID: 2-s2.0-105032639910
OAI: oai:DiVA.org:kth-378789
DiVA, id: diva2:2049435
Note
QC 20260330
2026-03-30 2026-03-30 2026-03-30 Bibliographically approved