Multi-Sensor Remote Sensing for Urban Mapping and Change Detection Using Deep Learning
KTH, School of Architecture and the Built Environment (ABE), Urban Planning and Environment, Geoinformatics. ORCID iD: 0000-0003-3560-638x
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Driven by the rapid growth in population, urbanization is progressing at an unprecedented rate in many places around the world. Earth observation (EO) has become a vital tool for monitoring urbanization on a global scale. Modern satellite missions, in particular, provide new opportunities for urban mapping and change detection (CD) through high-resolution imagery and frequent revisits. These missions have enabled multi-modal approaches by integrating data from different satellites, such as Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 MultiSpectral Instrument (MSI). Concurrently, EO data analysis has evolved from traditional machine learning methods to deep learning (DL) models, particularly Convolutional Neural Networks (ConvNets). However, current DL methods for urban mapping and CD face several challenges, such as reliance on large labeled datasets for supervised training, the limited transferability of DL models across geographic regions, the effective integration of multi-modal EO data, and using satellite image time series (SITS) for CD. To address these challenges, this thesis aims to develop novel DL methods for robust urban mapping and CD using multi-source EO data.

First, a semi-supervised learning (SSL) method is introduced, leveraging multi-modal Sentinel-1 SAR and Sentinel-2 MSI data to improve the geographic transferability of urban mapping models. This method employs a dual stream ConvNet architecture to map built-up areas separately from SAR and optical images. By assuming consistent maps should be produced for both modalities, an unsupervised loss for unlabeled data is introduced to penalize discrepancies between them. Extensive evaluation using annotations from the SpaceNet 7 multi-temporal building monitoring dataset demonstrated that this SSL approach (F1 score 0.694) outperforms several supervised approaches (F1 scores ranging from 0.574 to 0.651). Furthermore, it produces built-up area maps that rival or surpass global human settlement maps like GHS-BUILT-S2 and WSF 2019.
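
The cross-modal consistency idea described above can be illustrated with a minimal sketch. The abstract does not state the exact form of the unsupervised loss, so the mean squared difference used here is an assumption, and the function name `consistency_loss` and the flat list representation of a probability map are hypothetical choices made for illustration only.

```python
from typing import Sequence


def consistency_loss(p_sar: Sequence[float], p_opt: Sequence[float]) -> float:
    """Mean squared difference between per-pixel built-up probabilities.

    p_sar and p_opt are flattened probability maps predicted from the SAR
    and optical streams for the same unlabeled image. The squared-error
    form is an assumption; the source only says discrepancies are penalized.
    """
    if len(p_sar) != len(p_opt):
        raise ValueError("probability maps must have the same number of pixels")
    if not p_sar:
        raise ValueError("probability maps must not be empty")
    return sum((a - b) ** 2 for a, b in zip(p_sar, p_opt)) / len(p_sar)


# Identical predictions incur no penalty; disagreement on one of two pixels does.
agree = consistency_loss([0.5, 0.5], [0.5, 0.5])
disagree = consistency_loss([1.0, 0.0], [0.0, 0.0])
```

Because the loss needs no reference labels, it can be evaluated on any co-registered SAR and optical pair, which is what allows unlabeled regions to contribute to training.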

For urban CD, a new network architecture is proposed for fusing bi-temporal Sentinel-1 SAR and Sentinel-2 MSI image pairs. This architecture uses a dual stream design to process each modality through separate ConvNets before combining the extracted features at a later stage. The proposed strategy outperforms other ConvNet-based approaches, both with uni-modal and multi-modal data. Additionally, it achieves state-of-the-art (SOTA) performance on the Onera Satellite CD dataset (F1 score 0.600).
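
As a rough illustration of the dual stream idea (separate processing per modality, fusion of the extracted features at a later stage), the toy sketch below replaces each ConvNet stream with a hand-written feature extractor. Everything in it is hypothetical: `abs_diff_stream`, `fuse_and_classify`, the weights, and the bias are stand-ins, not the thesis architecture.

```python
import math
from typing import List, Sequence


def abs_diff_stream(t1: Sequence[float], t2: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for one stream: per-band absolute change
    between the two acquisition dates of a single pixel."""
    return [abs(b - a) for a, b in zip(t1, t2)]


def fuse_and_classify(
    sar_feats: Sequence[float],
    opt_feats: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Concatenate the per-modality features, then apply a linear layer
    and a sigmoid to obtain a change probability."""
    fused = list(sar_feats) + list(opt_feats)  # fusion happens only here
    if len(fused) != len(weights):
        raise ValueError("one weight is required per fused feature")
    logit = sum(w * f for w, f in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# One pixel with one SAR band and one optical band, both changing strongly.
sar = abs_diff_stream([0.0], [1.0])
opt = abs_diff_stream([0.0], [1.0])
p_change = fuse_and_classify(sar, opt, weights=[2.0, 2.0], bias=-1.0)
```

The point of the design is that each modality is processed without seeing the other, so the streams can learn modality-specific features before the fusion step combines them.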

Building on this, a second network architecture was developed to adapt the transferability improvement approach for urban CD. This approach uses bi-temporal Sentinel-1 SAR and Sentinel-2 MSI image pairs and outputs urban changes using a difference decoder while mapping built-up areas with a semantic decoder. Similar to the urban mapping method, inconsistencies in built-up area maps across modalities are penalized on unlabeled data. Evaluation on the SpaceNet 7 dataset, enhanced with Sentinel-1 SAR and Sentinel-2 MSI data, shows that the method performs well under limited label conditions, achieving an F1 score of 0.555 with all available labels, and delivering reasonable CD results (F1 score of 0.491) even with only 10% of the labeled data. In contrast, supervised multi-modal methods and SSL methods using optical data failed to exceed an F1 score of 0.402 under this condition.
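
Since results throughout this abstract are reported as F1 scores, a short sketch of the metric for binary masks may help. It uses the standard definition F1 = 2TP / (2TP + FP + FN); the function name `f1_score` and the flat 0/1 list representation are assumptions for illustration, and the source does not say how scores were aggregated across sites.

```python
from typing import Sequence


def f1_score(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Pixel-wise F1 score for binary masks given as flat sequences of 0/1."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, r in zip(pred, ref) if p and r)
    fp = sum(1 for p, r in zip(pred, ref) if p and not r)
    fn = sum(1 for p, r in zip(pred, ref) if not p and r)
    denom = 2 * tp + fp + fn
    # Return 0.0 when neither mask contains a positive pixel (a convention).
    return 2 * tp / denom if denom else 0.0


# One true positive, one false positive, one false negative.
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

Because true negatives do not enter the formula, the score is not inflated by the large unchanged background that dominates most change detection scenes.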

A third urban CD method focuses on detecting changes in consecutive images of SITS (i.e., continuous urban CD). This method introduces a temporal feature refinement module that uses self-attention to enhance ConvNet-based multi-temporal representations of buildings. Additionally, a multi-task integration module employing Markov networks is proposed to generate optimal building map time series based on segmentation and dense change outputs. The proposed method effectively identifies urban changes in high-resolution SITS from PlanetScope (F1 score 0.551) and Gaofen-2 (F1 score 0.440), demonstrating superior performance compared to bi-temporal and multi-temporal urban CD and segmentation methods on two challenging datasets.

Finally, the thesis develops a baseline network for multi-hazard building damage detection using the xBD dataset, which contains bi-temporal images captured before and after natural disasters. The study examines model transferability across disaster types by employing a comprehensive dataset split and proposes incorporating disaster-specific information into the baseline model to account for disaster-specific damage characteristics. The disaster-adaptive model demonstrates improved generalization to unseen events compared to several competing methods.

This thesis addresses key challenges in urban mapping and urban CD, including multi-hazard building damage detection. By advancing methods that leverage multi-sensor EO data and DL techniques, this thesis makes major contributions to timely and reliable urban data production, thereby supporting sustainable urban planning and urban Sustainable Development Goal (SDG) indicators monitoring.

Abstract [sv]

Urbaniseringen drivs på av den snabba befolkningstillväxten och går framåt i en aldrig tidigare skådad takt på många platser runt om i världen. Jordobservation (EO) har blivit ett viktigt verktyg för att övervaka urbaniseringen på global nivå. I synnerhet moderna satellituppdrag ger nya möjligheter till stadskartläggning och upptäckt av förändringar (CD) genom högupplösta bilder och frekventa återbesök. Dessa uppdrag har möjliggjort multimodala tillvägagångssätt genom att integrera data från olika satelliter, t.ex. Sentinel-1 Synthetic Aperture Radar (SAR) och Sentinel-2 MultiSpectral Instrument (MSI). Samtidigt har analysen av EO-data utvecklats från traditionella maskininlärningsmetoder till modeller för djupinlärning (DL), i synnerhet Convolutional Neural Networks (ConvNets). Nuvarande DL-metoder för stadskartläggning och CD står dock inför flera utmaningar, till exempel beroende av stora märkta dataset för övervakad träning, den begränsade överförbarheten av DL-modeller över geografiska regioner, effektiv integration av multimodala EO-data och användning av satellitbildstidsserier (SITS) för CD. För att ta itu med dessa utmaningar syftar denna avhandling till att utveckla nya djupinlärningsmetoder för robust stadskartläggning och förändringsdetektering med hjälp av EO-data från flera källor.

Först introduceras en SSL-metod (semi-supervised learning) som utnyttjar multimodala Sentinel-1 SAR- och Sentinel-2 MSI-data för att förbättra den geografiska överförbarheten av stadskartläggningsmodeller. Metoden använder en ConvNet-arkitektur med dubbla flöden för att kartlägga bebyggda områden separat från SAR- och optiska bilder. Genom att anta att konsekventa kartor ska produceras för båda modaliteterna införs en oövervakad förlust för omärkta data för att straffa avvikelser mellan dem. En omfattande utvärdering med hjälp av annoteringar från SpaceNet 7 multi-temporala dataset för byggnadsövervakning visade att denna SSL-metod (F1-poäng 0,694) överträffar flera övervakade metoder (F1-poäng från 0,574 till 0,651). Dessutom producerar den kartor över uppbyggda områden som konkurrerar med eller överträffar globala kartor över mänskliga bosättningar som GHS-BUILT-S2 och WSF 2019.

För CD i städer föreslås en ny nätverksarkitektur för sammanslagning av bi-temporala Sentinel-1 SAR- och Sentinel-2 MSI-bildpar. Denna arkitektur använder en dubbel strömdesign för att bearbeta varje modalitet genom separata ConvNets innan de extraherade funktionerna kombineras i ett senare skede. Den föreslagna strategin överträffar andra ConvNet-baserade metoder, både med uni-modal och multimodal data. Dessutom uppnår den toppmodern (SOTA) prestanda på Onera Satellite CD-dataset (F1-poäng 0,600).

På grundval av detta utvecklades en andra nätverksarkitektur för att anpassa metoden för förbättring av överförbarheten för CD i städer. Denna metod använder bi-temporala Sentinel-1 SAR- och Sentinel-2 MSI-bildpar och matar ut stadsförändringar med hjälp av en differensavkodare samtidigt som bebyggda områden kartläggs med en semantisk avkodare. I likhet med metoden för stadskartläggning straffas inkonsekvenser i kartor över bebyggda områden över modaliteter på omärkta data. Utvärdering på SpaceNet 7-datasetet, förbättrat med Sentinel-1 SAR och Sentinel-2 MSI-data, visar att metoden fungerar bra under begränsade etikettförhållanden, uppnår en F1-poäng på 0,555 med alla tillgängliga etiketter och levererar rimliga CD-resultat (F1-poäng på 0,491) även med endast 10 % av de märkta data. Däremot lyckades inte övervakade multimodala metoder och SSL-metoder som använder optiska data överstiga en F1-poäng på 0,402 under detta villkor.

En tredje urban CD-metod fokuserar på att upptäcka förändringar i på varandra följande bilder av SITS (dvs. kontinuerlig urban CD). Denna metod introducerar en temporal funktionsförfiningsmodul som använder självuppmärksamhet för att förbättra ConvNet-baserade multitemporala representationer av byggnader. Dessutom föreslås en integrationsmodul med flera uppgifter som använder Markov-nätverk för att generera optimala tidsserier för byggnadskartor baserat på segmentering och täta förändringsutgångar. Den föreslagna metoden identifierar effektivt stadsförändringar i högupplösta SITS från PlanetScope (F1-poäng 0,551) och Gaofen-2 (F1-poäng 0,440), vilket visar överlägsen prestanda jämfört med bi-temporala och multi-temporala urbana CD- och segmenteringsmetoder på två utmanande dataset.

Slutligen utvecklar avhandlingen ett baslinjenätverk för detektering av byggnadsskador med flera faror med hjälp av xBD-datasetet, som innehåller bi-temporala bilder tagna före och efter naturkatastrofer. Studien undersöker modellens överförbarhet mellan olika katastroftyper genom att använda en omfattande datasetdelning och föreslår att katastrofspecifik information införlivas i baslinjemodellen för att ta hänsyn till katastrofspecifika skadeegenskaper. Den katastrofadaptiva modellen visar förbättrad generalisering till osedda händelser jämfört med flera konkurrerande metoder.

Denna avhandling behandlar viktiga utmaningar inom stadskartläggning och urban CD, inklusive detektering av byggnadsskador med flera faror. Genom att utveckla metoder som utnyttjar EO-data från flera sensorer och DL-tekniker ger den här avhandlingen viktiga bidrag till snabb och tillförlitlig produktion av stadsdata, vilket stöder hållbar stadsplanering och indikatorer för hållbara utvecklingsmål (SDG) i städer.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024, p. 86
Series
TRITA-ABE-DLT ; 2440
Keywords [en]
Remote Sensing, Semantic Segmentation, Domain Adaptation, Urban Mapping, Change Detection, Synthetic Aperture Radar, Optical, Data Fusion
National Category
Earth Observation
Research subject
Geodesy and Geoinformatics, Geoinformatics
Identifiers
URN: urn:nbn:se:kth:diva-356875
ISBN: 978-91-8106-157-4 (print)
OAI: oai:DiVA.org:kth-356875
DiVA, id: diva2:1916164
Public defence
2024-12-13, D37, Lindstedtsvägen 5, KTH Campus, https://kth-se.zoom.us/j/65114181594, Stockholm, 09:00 (English)
Note

QC241126

Available from: 2024-11-26 Created: 2024-11-26 Last updated: 2025-03-24. Bibliographically approved
List of papers
1. Unsupervised domain adaptation for global urban extraction using Sentinel-1 SAR and Sentinel-2 MSI data
2022 (English). In: Remote Sensing of Environment, ISSN 0034-4257, E-ISSN 1879-0704, Vol. 280, article id 113192. Article in journal (Refereed), Published
Abstract [en]

Accurate and up-to-date maps of built-up areas are crucial to support sustainable urban development. Earth Observation (EO) is a valuable data source to cover this demand. In particular, Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 MultiSpectral Instrument (MSI) missions offer new opportunities to map built-up areas on a global scale. Using Sentinel-2 images, recent urban mapping efforts achieved promising results by training Convolutional Neural Networks (CNNs) on available built-up data. However, these results strongly depend on the availability of local reference data for fully supervised training or assume that the application of CNNs to unseen areas (i.e. across-region generalization) produces satisfactory results. To alleviate these shortcomings, it is desirable to leverage Semi-Supervised Learning (SSL) algorithms that can take advantage of unlabeled data, especially because satellite data is plentiful. In this paper, we propose a novel Domain Adaptation (DA) approach using SSL that jointly exploits Sentinel-1 SAR and Sentinel-2 MSI to improve across-region generalization for built-up area mapping. Specifically, two identical sub-networks are incorporated into the proposed model to perform built-up area segmentation from SAR and optical images separately. Assuming that consistent built-up area segmentation should be obtained across data modality, we design an unsupervised loss for unlabeled data that penalizes inconsistent segmentation from the two sub-networks. Therefore, we propose to use complementary data modalities as real-world perturbations for consistency regularization. For the final prediction, the model takes both data modalities into account.
Experiments conducted on a test set comprised of sixty representative sites across the world showed that the proposed DA approach achieves strong improvements (F1 score 0.694) over fully supervised learning from Sentinel-1 SAR data (F1 score 0.574), Sentinel-2 MSI data (F1 score 0.580) and their input-level fusion (F1 score 0.651). To demonstrate the effectiveness of DA, we also performed a comparison with two state-of-the-art products, namely GHS-BUILT-S2 and WSF 2019, on the test set. The comparison showed that our model is capable of producing built-up area maps with comparable or even better quality than the state-of-the-art global human settlement maps. Therefore, the multi-modal DA offers great potential to be adapted to produce easily updateable human settlements maps at a global scale.

Place, publisher, year, edition, pages
Elsevier BV, 2022
Keywords
Built-up area mapping, Deep learning, Data fusion, Semi-supervised learning, Domain adaptation, Semantic segmentation
National Category
Computer graphics and computer vision Human Geography
Identifiers
urn:nbn:se:kth:diva-320678 (URN)
10.1016/j.rse.2022.113192 (DOI)
000863232800001 ()
2-s2.0-85135516585 (Scopus ID)
Note

QC 20221031

Available from: 2022-10-31 Created: 2022-10-31 Last updated: 2025-02-01. Bibliographically approved
2. Sentinel-1 and Sentinel-2 Data Fusion for Urban Change Detection Using a Dual Stream U-Net
2022 (English). In: IEEE Geoscience and Remote Sensing Letters, ISSN 1545-598X, E-ISSN 1558-0571, Vol. 19, article id 4019805. Article in journal (Refereed), Published
Abstract [en]

Urbanization is progressing rapidly around the world. With sub-weekly revisits at global scale, Sentinel-1 synthetic aperture radar (SAR) and Sentinel-2 multispectral imager (MSI) data can play an important role for monitoring urban sprawl to support sustainable development. In this letter, we proposed an urban change detection (CD) approach featuring a new network architecture for the fusion of SAR and optical data. Specifically, a dual stream concept was introduced to process different data modalities separately, before combining extracted features at a later decision stage. The individual streams are based on U-Net architecture that is one of the most popular fully convolutional networks used for semantic segmentation. The effectiveness of the proposed approach was demonstrated using the Onera Satellite CD (OSCD) dataset. The proposed strategy outperformed other U-Net-based approaches in combination with unimodal data and multimodal data with feature level fusion. Furthermore, our approach achieved state-of-the-art performance on the urban CD problem posed by the OSCD dataset. Our Sentinel-1 SAR data and code are available on https://github.com/SebastianHafner/DS_UNet.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Synthetic aperture radar, Optical imaging, Optical sensors, Training, Streaming media, Feature extraction, Convolution, Data fusion, deep learning, remote sensing, urban change detection (CD)
National Category
Earth Observation
Identifiers
urn:nbn:se:kth:diva-307314 (URN)
10.1109/LGRS.2021.3119856 (DOI)
000740006800040 ()
2-s2.0-85117749240 (Scopus ID)
Note

QC 20220120

Available from: 2022-01-20 Created: 2022-01-20 Last updated: 2025-02-10. Bibliographically approved
3. Semi-Supervised Urban Change Detection Using Multi-Modal Sentinel-1 SAR and Sentinel-2 MSI Data
2023 (English). In: Remote Sensing, E-ISSN 2072-4292, Vol. 15, no 21, article id 5135. Article in journal (Refereed), Published
Abstract [en]

Urbanization is progressing at an unprecedented rate in many places around the world. The Sentinel-1 synthetic aperture radar (SAR) and Sentinel-2 MultiSpectral Instrument (MSI) missions, combined with deep learning, offer new opportunities to accurately monitor urbanization at a global scale. Although the joint use of SAR and optical data has recently been investigated for urban change detection, existing data fusion methods rely heavily on the availability of sufficient training labels. Meanwhile, change detection methods addressing label scarcity are typically designed for single-sensor optical data. To overcome these limitations, we propose a semi-supervised urban change detection method that exploits unlabeled Sentinel-1 SAR and Sentinel-2 MSI data. Using bitemporal SAR and optical image pairs as inputs, the proposed multi-modal Siamese network predicts urban changes and performs built-up area segmentation for both timestamps. Additionally, we introduce a consistency loss, which penalizes inconsistent built-up area segmentation across sensor modalities on unlabeled data, leading to more robust features. To demonstrate the effectiveness of the proposed method, the SpaceNet 7 dataset, comprising multi-temporal building annotations from rapidly urbanizing areas across the globe, was enriched with Sentinel-1 SAR and Sentinel-2 MSI data. Subsequently, network performance was analyzed under label-scarce conditions by training the network on different fractions of the labeled training set. The proposed method achieved an F1 score of 0.555 when using all available training labels, and produced reasonable change detection results (F1 score of 0.491) even with as little as 10% of the labeled training data. In contrast, multi-modal supervised methods and semi-supervised methods using optical data failed to exceed an F1 score of 0.402 under this condition. Code and data are made publicly available.

Place, publisher, year, edition, pages
MDPI AG, 2023
Keywords
consistency regularization, data fusion, deep learning, remote sensing, urbanization monitoring
National Category
Earth Observation Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-340117 (URN)
10.3390/rs15215135 (DOI)
001099595200001 ()
2-s2.0-85176292961 (Scopus ID)
Note

QC 20231128

Available from: 2023-11-28 Created: 2023-11-28 Last updated: 2025-02-10. Bibliographically approved
4. Continuous Urban Change Detection from Satellite Image Time Series with Temporal Feature Refinement and Multi-Task Integration
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Urbanization advances at unprecedented rates, resulting in negative effects on the environment and human well-being. Remote sensing has the potential to mitigate these effects by supporting sustainable development strategies with accurate information on urban growth. Deep learning-based methods have achieved promising urban change detection results from optical satellite image pairs using convolutional neural networks (ConvNets), transformers, and a multi-task learning setup. However, transformers have not been leveraged for urban change detection with multi-temporal data, i.e., >2 images, and multi-task learning methods lack integration approaches that combine change and segmentation outputs. To fill this research gap, we propose a continuous urban change detection method that identifies changes in each consecutive image pair of a satellite image time series (SITS). Specifically, we propose a temporal feature refinement (TFR) module that utilizes self-attention to improve ConvNet-based multi-temporal building representations. Furthermore, we propose a multi-task integration (MTI) module that utilizes Markov networks to find an optimal building map time series based on segmentation and dense change outputs. The proposed method effectively identifies urban changes based on high-resolution SITS acquired by the PlanetScope constellation (F1 score 0.551) and Gaofen-2 (F1 score 0.440). Moreover, our experiments on two challenging datasets demonstrate the effectiveness of the proposed method compared to bi-temporal and multi-temporal urban change detection and segmentation methods.

Keywords
Earth observation, Remote sensing, Multi-temporal, Multi-task learning, Transformers
National Category
Earth Observation
Research subject
Geodesy and Geoinformatics, Geoinformatics
Identifiers
urn:nbn:se:kth:diva-356873 (URN)
10.48550/arXiv.2406.17458 (DOI)
Note

QC 20241202

Available from: 2024-11-25 Created: 2024-11-25 Last updated: 2025-02-10. Bibliographically approved
5. DisasterAdaptiveNet: A Robust Network for Multi-Hazard Building Damage Detection from Very-High-Resolution Satellite Imagery
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Earth observation satellites play a crucial role in disaster response and management, offering timely and large-scale data for damage assessment. Recent studies have demonstrated the potential of deep learning techniques for automated building damage detection from satellite imagery. Notably, several studies have proposed new network architectures and demonstrated their effectiveness on the xBD dataset featuring bi-temporal very-high-resolution image pairs of several disaster events. Although achieving promising results, many of these methods are highly engineered to achieve high performance. The highly complex architectures may limit their practicality in emergency scenarios where simplicity and generalization are essential. To address this, we construct a strong yet simplified baseline method for multi-hazard building damage detection, based on the key components of the 1st place solution of the xView2 Challenge, which also uses the xBD dataset. Our approach achieves comparable performance to more complex methods on the xView2 split of the xBD dataset, while significantly reducing complexity. Furthermore, we propose incorporating readily available disaster-type information into the strong baseline model to account for disaster-specific damage characteristics. We evaluate our disaster-adaptive model on a new event-based split of the xBD dataset and demonstrate its improved ability to generalize to unseen events compared to several competing methods. These results highlight the potential of our approach for practical and robust building damage assessment in real-world disaster scenarios.

Keywords
Earth observation, Deep learning, Multi-task learning, Model conditioning
National Category
Earth Observation
Research subject
Geodesy and Geoinformatics, Geoinformatics
Identifiers
urn:nbn:se:kth:diva-356874 (URN)
Note

QC 20241129

Available from: 2024-11-25 Created: 2024-11-25 Last updated: 2025-02-10. Bibliographically approved

Open Access in DiVA

summary (39765 kB), 139 downloads
File information
File name: SUMMARY01.pdf
File size: 39765 kB
Checksum (SHA-512): 0cb932739c8c4410fe21db2f213190d68acb57ea87f9280be9c2c3d8c98ded5012a82b0f13e5f8a721f3e305c1c5601cafc20e051880d139bfd2fc5f447d002a
Type: summary
Mimetype: application/pdf

Authority records

Hafner, Sebastian
