EMSNet: Efficient Multimodal Symmetric Network for Semantic Segmentation of Urban Scene From Remote Sensing Imagery
Zhejiang Univ Technol, Coll Informat Engn, Hangzhou 310023, Peoples R China.
Zhejiang Univ Technol, Coll Informat Engn, Hangzhou 310023, Peoples R China.
Zhejiang Univ Technol, Inst Cyberspace Secur, Hangzhou 310023, Peoples R China.
Zhejiang Univ Technol, Inst Cyberspace Secur, Hangzhou 310023, Peoples R China.
2025 (English). In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, ISSN 1939-1404, E-ISSN 2151-1535, Vol. 18, p. 5878-5892. Article in journal (Refereed), Published
Abstract [en]

High-resolution remote sensing imagery (RSI) plays a pivotal role in the semantic segmentation (SS) of urban scenes, particularly for urban management tasks such as building planning and traffic flow analysis. However, the dense distribution of objects and the prevalent background noise in RSI make it difficult to obtain stable and accurate results from a single view. Integrating digital surface models (DSM) can yield high-precision SS, but it often requires extensive computational resources. It is therefore essential to address the tradeoff between accuracy and computational cost and to optimize the method for deployment on edge devices. In this article, we introduce an efficient multimodal symmetric network (EMSNet) that performs SS by leveraging both optical and DSM images. Unlike other multimodal methods, EMSNet adopts a dual encoder-decoder structure that builds a direct connection between the DSM data and the final result, making full use of the DSM. Between branches, we propose a continuous feature interaction in which RGB features guide the DSM branch. Within each branch, multilevel feature fusion captures low-level spatial and high-level semantic information, improving the model's scene perception. Knowledge distillation (KD) further improves the performance and generalization of EMSNet. Experiments on the Potsdam and Vaihingen datasets demonstrate the superiority of our method over baseline models, and ablation experiments validate the effectiveness of each component. The KD strategy is also validated by comparison with the segment anything model (SAM): it enables the proposed multimodal SS network to match SAM's performance with only one-fifth of the parameters, computation, and latency.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025. Vol. 18, p. 5878-5892
Keywords [en]
Optical sensors, Optical imaging, Remote sensing, Feature extraction, Accuracy, Buildings, Biomedical optical imaging, Decoding, Computational modeling, Remote sensing image interpretation, segment anything, semantic segmentation, symmetric multimodal
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:kth:diva-361297
DOI: 10.1109/JSTARS.2025.3531422
ISI: 001432389100006
Scopus ID: 2-s2.0-85216074232
OAI: oai:DiVA.org:kth-361297
DiVA, id: diva2:1944880
Note

QC 20250317

Available from: 2025-03-17. Created: 2025-03-17. Last updated: 2025-03-17. Bibliographically approved

Open Access in DiVA

No full text in DiVA


Authority records

Zhang, Puzhao

