Multi-Modal Deep Learning for 2D/3D Mapping with Satellite Time Series images: From Floods to Forests to Cities
Yadav, Ritu
KTH, School of Architecture and the Built Environment (ABE), Urban Planning and Environment, Geoinformatics. ORCID iD: 0000-0003-3599-3164
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Driven by climate change and rapid urbanization, there is an urgent need for reliable large-scale Earth observation (EO) products that capture both two-dimensional (2D) and three-dimensional (3D) characteristics of the Earth’s surface. Modern satellite missions, particularly Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 MultiSpectral Instrument (MSI), provide freely accessible global-scale imagery with frequent revisits, offering new opportunities for large-scale mapping such as floods, urban growth, and forest dynamics. Concurrently, deep learning (DL) has become state-of-the-art for EO analysis. However, challenges remain in ensuring generalization across regions, reducing reliance on labeled data, extracting 3D features from mid-resolution imagery, and enhancing reliability through uncertainty estimation. This thesis addresses these challenges by proposing novel DL models for 2D and 3D applications, improving model generalizability, curating benchmark datasets, and integrating uncertainty estimation into EO tasks.

For 2D mapping, this thesis focuses on flood mapping as the primary application. Two supervised segmentation networks were developed for the task. The first, Attentive U-Net, enhances Sentinel-1 VV, VH, and VV/VH ratio inputs using spatial and channel-wise self-attention. The second, a dual-stream Fusion Network, integrates Sentinel-1 data with a digital elevation model (DEM) and permanent water masks for improved contextual learning. Both outperformed supervised baselines on the Sen1Floods11 dataset, achieving 3–5% higher IoU. To further improve model generalizability and reduce dependency on labels, an unsupervised model (CLVAE) was developed that learns spatiotemporal features from Sentinel-1 SAR time series using reconstruction and contrastive learning. Flood maps are derived by detecting changes in the latent feature distributions of pre- and post-flood time series images. CLVAE achieved 70% IoU, surpassing unsupervised baselines by a minimum margin of 15% IoU and outperforming supervised models when tested on unseen flood sites, which indicates better generalizability.
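
The IoU scores quoted above compare a predicted flood mask with a reference mask. As a minimal sketch of that metric (not the thesis code), the following assumes binary masks stored as nested lists of 0/1; the function name `iou` and the empty-mask convention are assumptions made for illustration.

```python
def iou(pred, ref):
    """Intersection over Union for two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            intersection += 1 if (p and r) else 0
            union += 1 if (p or r) else 0
    # Convention assumed here: two completely empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical 2 x 3 masks: 1 marks a flooded pixel.
pred = [[1, 1, 0],
        [0, 1, 0]]
ref = [[1, 0, 0],
       [0, 1, 1]]
print(iou(pred, ref))  # 2 shared pixels out of 4 in the union, i.e. 0.5
```

A gain of "3–5% higher IoU" then corresponds to a shift of 0.03 to 0.05 in this ratio, assuming the percentages are absolute IoU points.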

For 3D mapping, multiple advances were made. A hybrid CNN-transformer architecture (T-SwinUNet) was proposed for large-scale building height estimation from 12-month Sentinel-1 and Sentinel-2 time series. Leveraging multi-modal spatio-temporal features and multitask learning, it achieved 1.89 m RMSE at 10 m resolution and generalized across diverse European cities. The model outperformed the existing global building height product GHSL-Built-H. To further improve building height estimation accuracy, the M4Heights benchmark dataset was released, covering sites in Estonia, Switzerland, and the Netherlands. Combining 10 m Sentinel-1&2 time series with 1 m aerial orthophotos enables multi-scale and multitask learning for super-resolution building height estimation. Baseline evaluations confirmed its benefits, and the open dataset supports fair model comparisons and encourages further innovation in the field.

Extending 3D mapping from the built environment to natural ecosystems, the BioMassters benchmark dataset for above-ground forest biomass estimation was curated and released. It covers 8.5 million hectares of Finnish forests, with labels derived from high-resolution LiDAR data and inputs from Sentinel-1&2 time series. The dataset was released alongside a global challenge that received over 1000 model submissions. The results demonstrated the superiority of DL methods over the coarse 100 m ESA CCI Biomass product, enabling biomass mapping at 10 m resolution and underscoring the importance of open, DL-ready datasets.

The thesis further advances 3D mapping by integrating uncertainty quantification into large-scale regression tasks for building height, canopy height, and biomass estimation at 10 m resolution. Two uncertainty quantification approaches were investigated: (i) a Gaussian uncertainty model, which assumes symmetric error distributions, and (ii) a quantile uncertainty model, which provides asymmetric intervals and captures the direction of uncertainty. Both methods achieved accuracy comparable to deterministic baselines while additionally providing calibrated confidence intervals. Importantly, they outperformed existing global canopy and biomass products that include uncertainty information. The Gaussian model performed best for canopy height and biomass, while the quantile model proved more robust for building height, where the data follow strictly non-Gaussian and skewed distributions. Together, these advances establish uncertainty-aware regression as a critical step toward making EO-derived 3D products more trustworthy for real-world applications.
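
The contrast between the two uncertainty models can be illustrated with the loss functions commonly used for each. This is a sketch under assumptions, not the thesis implementation: the abstract does not state the training losses, so the Gaussian negative log-likelihood and the pinball (quantile) loss below are the standard textbook choices, and all names and numbers are hypothetical.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2); requires sigma > 0.

    The penalty is symmetric: over- and under-estimates of equal size cost the same.
    """
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)


def pinball_loss(y, q_pred, tau):
    """Pinball loss for a predicted tau-quantile q_pred, with 0 < tau < 1.

    Under-prediction is weighted by tau and over-prediction by (1 - tau),
    which lets the lower and upper bounds sit at different distances from the median.
    """
    diff = y - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


# Hypothetical building-height pixel (illustrative values, not from the thesis).
y_true = 12.0
print(gaussian_nll(y_true, mu=10.0, sigma=2.0))

# An asymmetric 90% interval: the upper bound lies further from the truth than the lower.
lower, upper = 8.0, 18.0
print(pinball_loss(y_true, lower, tau=0.05), pinball_loss(y_true, upper, tau=0.95))
```

A Gaussian model outputs one mean and one standard deviation per pixel, so its interval is always centred on the mean. A quantile model outputs each bound separately, which is one plausible reason it copes better with skewed targets such as building height.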

In conclusion, this thesis addresses key challenges in large-scale 2D and 3D EO tasks, spanning flood detection, building height estimation, biomass estimation, and canopy height estimation. By advancing DL models that leverage time series of Sentinel-1&2 imagery, integrating uncertainty quantification into the model and releasing benchmark datasets, this thesis makes major contributions to scalable, reliable and reproducible EO data products. These advances enhance the trustworthiness of EO-derived products for real-world applications, supporting sustainable urban planning, climate resilience, and the monitoring of Sustainable Development Goals.

Abstract [sv] (English translation)

Driven by climate change and rapid urbanization, there is an urgent need for reliable large-scale Earth observation (EO) products that capture both two-dimensional (2D) and three-dimensional (3D) characteristics of the Earth's surface. Modern satellite missions, in particular Sentinel-1 Synthetic Aperture Radar (SAR) and the Sentinel-2 MultiSpectral Instrument (MSI), provide freely available global-scale imagery with frequent revisits, which offers new opportunities for large-scale mapping of, for example, floods, urban growth, and forest dynamics. At the same time, deep learning (DL) has become the leading approach for EO analysis. However, challenges remain in ensuring generalization across regions, reducing the dependence on labeled data, extracting 3D information from medium-resolution imagery, and increasing reliability through uncertainty estimation. This thesis addresses these challenges by proposing new DL models for 2D and 3D applications, improving model generalizability, curating reference datasets, and integrating uncertainty estimation into EO tasks.

For 2D mapping, the thesis focuses on flood mapping as the main application. Two supervised segmentation networks were developed for the task. The first, Attentive U-Net, uses Sentinel-1 inputs (VV, VH, and the VV/VH ratio) and enhances them with spatial and channel-wise self-attention. The second, a dual-stream fusion network, integrates Sentinel-1 data with a digital elevation model (DEM) and permanent water masks for improved contextual learning. Both outperformed supervised baseline models on the Sen1Floods11 dataset and achieved 3-5% higher IoU. To further improve model generalizability and reduce the dependence on labeled data, an unsupervised model (CLVAE) was developed that learns spatiotemporal features from Sentinel-1 SAR time series via reconstruction and contrastive learning. Flood maps are derived by detecting changes in the latent representation distributions between pre- and post-flood time series. CLVAE achieved 70% IoU, outperformed unsupervised baselines by at least 15% IoU, and performed better than supervised models when tested on previously unseen flood areas, which indicates higher model generalizability.

For 3D mapping, several advances were made. A hybrid CNN and transformer architecture (T-SwinUNet) was proposed for large-scale estimation of building heights from 12-month Sentinel-1 and Sentinel-2 time series. By exploiting multi-modal spatiotemporal features and multitask learning, it achieved an RMSE of 1.89 m at 10 m resolution, and the model generalized well across different European cities. It outperformed the existing global building height product GHSL-Built-H. To further improve the accuracy of building height estimation, the reference dataset M4Heights was released, covering areas in Estonia, Switzerland, and the Netherlands. The combination of 10 m Sentinel-1&2 time series with 1 m aerial orthophotos enables multi-scale and multitask learning for super-resolved building height estimation. Baseline evaluations confirmed its benefits, and the open dataset supports fair model comparisons and encourages further innovation in the field.

Extending 3D mapping from the built environment to natural ecosystems, the reference dataset BioMassters for estimation of above-ground forest biomass was curated and released. It covers 8.5 million hectares of Finnish forests, with labels derived from high-resolution LiDAR data and inputs from Sentinel-1&2 time series. The dataset was released together with a global competition that received over 1000 model submissions. The results showed the superiority of DL methods over the coarse 100 m ESA CCI Biomass product, which enables biomass mapping at 10 m resolution and underlines the importance of open, deep-learning-ready datasets.

The thesis takes 3D mapping further by integrating uncertainty quantification into large-scale regression tasks for building height, tree height, and biomass at 10 m resolution. Two methods for uncertainty quantification were investigated: (i) a Gaussian uncertainty model, which assumes symmetric errors, and (ii) a quantile model, which gives asymmetric intervals and captures the direction of the uncertainty. Both methods achieved accuracy comparable to deterministic models while also providing calibrated confidence intervals. Importantly, they performed better than existing global products for tree height and biomass that include uncertainty information. The Gaussian model performed best for tree height and biomass, while the quantile model proved more robust for building height, where the data follow non-Gaussian and skewed patterns. Together, these advances establish uncertainty-aware regression as a decisive step toward making EO-derived 3D products more reliable for real-world applications.

In summary, this thesis addresses central challenges in large-scale 2D and 3D EO tasks, from flood detection to estimation of building height, biomass, and tree height. By developing DL models that exploit Sentinel-1&2 time series, integrating uncertainty quantification into the models, and releasing reference datasets, the thesis contributes scalable, reliable, and reproducible EO data products. These advances increase trust in EO-derived products in practical applications and support sustainable urban planning, climate adaptation, and follow-up of the Sustainable Development Goals (SDGs).

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025, p. 106
Series
TRITA-ABE-DLT ; 2540
Keywords [en]
2D mapping, 3D mapping, Floods, Building Height, Biomass, Canopy Height, Uncertainty Estimation, Segmentation, Change Detection, Regression, Gaussian, Quantile, Unsupervised Learning, Contrastive Learning, Multi-task Learning, Self-Attention, Convolutional LSTM, VAE, CNN, transformer, SWIN, Remote Sensing, Sentinel-1 SAR, Sentinel-2 MSI, Aerial Orthophotos, DEM, Data Fusion, Time Series, Deep Learning, Generalization, OOD
National Category
Computer Sciences; Earth Observation
Research subject
Geodesy and Geoinformatics, Geoinformatics
Identifiers
URN: urn:nbn:se:kth:diva-371709; ISBN: 978-91-8106-444-5 (print); OAI: oai:DiVA.org:kth-371709; DiVA, id: diva2:2007005
Public defence
2025-11-04, Kollegiesalen, Brinellvägen 8, KTH Campus, public video conference link https://kth-se.zoom.us/j/68698558153, Stockholm, 14:00 (English)
Projects
AI4EO, Digital Future
Note

QC 20251017

Available from: 2025-10-17. Created: 2025-10-16. Last updated: 2025-11-03. Bibliographically approved
List of papers
1. Deep attentive fusion network for flood detection on uni-temporal Sentinel-1 data
2022 (English). In: Frontiers in Remote Sensing, E-ISSN 2673-6187, Vol. 3. Article in journal (Refereed), Published
Abstract [en]

Floods occur across the globe, and climate change is expected to make them more frequent in the coming years. This calls for efficient flood monitoring and detection of impacted areas. In this study, we propose two segmentation networks for flood detection on uni-temporal Sentinel-1 Synthetic Aperture Radar data. The first network, “Attentive U-Net”, takes VV, VH, and the ratio VV/VH as input and uses spatial and channel-wise attention to enhance feature maps, which helps it learn a better segmentation. “Attentive U-Net” yields 67% Intersection over Union (IoU) on the Sen1Floods11 dataset, which is 3% better than the benchmark IoU. The second network is a dual-stream “Fusion network” that fuses global low-resolution elevation data and permanent water masks with Sentinel-1 (VV, VH) data. Compared to the previous benchmark on the Sen1Floods11 dataset, the fusion network gives a 4.5% better IoU score. This quantitative comparison with the benchmark method demonstrates the potential of the proposed flood detection networks. The results are further validated by qualitative analysis, which shows that adding low-resolution elevation data and a permanent water mask improves the flood detection results. Ablation experiments and analysis also demonstrate the effectiveness of the design choices in the proposed networks.

Place, publisher, year, edition, pages
Frontiers Media SA, 2022
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-338766 (URN); 10.3389/frsen.2022.1060144 (DOI); 001063268700001 (); 2-s2.0-85163985571 (Scopus ID)
Note

QC 20231025

Available from: 2023-10-25. Created: 2023-10-25. Last updated: 2025-10-16. Bibliographically approved
2. Unsupervised Flood Detection on SAR Time Series using Variational Autoencoder
2024 (English). In: International Journal of Applied Earth Observation and Geoinformation, ISSN 1569-8432, E-ISSN 1872-826X, Vol. 126, article id 103635. Article in journal (Other academic), Published
Abstract [en]

In this study, we propose a novel unsupervised Change Detection (CD) model to detect flood extent using Synthetic Aperture Radar (SAR) time series data. The proposed model is based on a spatiotemporal variational autoencoder trained with reconstruction and contrastive learning techniques. The change maps are generated with a proposed novel algorithm that utilizes differences in latent feature distributions between pre-flood and post-flood data. The model is evaluated on nine different flood events by comparing the results with reference flood maps collected from the Copernicus Emergency Management Services (CEMS) and the Sen1Floods11 dataset. We conducted a range of experiments and ablation studies to investigate the performance of our model and compared the results with existing unsupervised models. The model achieved an average Intersection over Union (IoU) score of 70%, which is at least 7% better than the IoU of existing unsupervised CD models. In the generalizability test, the proposed model outperformed the supervised models ADS-Net (by 10% IoU) and DAUSAR (by 8% IoU), both trained on Sen1Floods11 and tested on CEMS sites.
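
The abstract does not describe how the differences in latent feature distributions are measured. As one possible illustration only, the sketch below assumes each pixel has a diagonal Gaussian latent code (a mean and a standard deviation per dimension, as a variational autoencoder typically produces) and flags a change where the Kullback-Leibler divergence between the pre-flood and post-flood codes exceeds a fixed threshold. The divergence choice, the threshold, and the names `kl_diag_gaussians` and `change_mask` are all assumptions, not the paper's algorithm.

```python
import math


def kl_diag_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for two diagonal Gaussians given per-dimension means and std devs."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total


def change_mask(pre, post, threshold):
    """Return 1 for pixels whose pre/post latent distributions diverge beyond threshold.

    pre and post are lists with one (mu, sigma) pair per pixel.
    """
    return [
        1 if kl_diag_gaussians(mu0, s0, mu1, s1) > threshold else 0
        for (mu0, s0), (mu1, s1) in zip(pre, post)
    ]


# Two hypothetical pixels with 2-D latent codes: the first is unchanged, the second shifts.
pre = [([0.0, 0.0], [1.0, 1.0]), ([0.0, 0.0], [1.0, 1.0])]
post = [([0.0, 0.0], [1.0, 1.0]), ([3.0, -2.0], [1.0, 1.0])]
print(change_mask(pre, post, threshold=1.0))  # [0, 1]
```

Comparing distributions rather than raw backscatter values is what makes such an approach label-free: no reference flood map is needed to produce the mask, only to evaluate it.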

Place, publisher, year, edition, pages
Elsevier BV, 2024
National Category
Earth Observation
Identifiers
urn:nbn:se:kth:diva-338773 (URN); 10.1016/j.jag.2023.103635 (DOI); 001143611500001 (); 2-s2.0-85181026128 (Scopus ID)
Note

QC 20251029

Available from: 2023-10-25. Created: 2023-10-25. Last updated: 2025-10-29. Bibliographically approved
3. How high are we? Large-scale building height estimation at 10 m using Sentinel-1 SAR and Sentinel-2 MSI time series
2025 (English). In: Remote Sensing of Environment, ISSN 0034-4257, E-ISSN 1879-0704, Vol. 318, article id 114556. Article in journal (Refereed), Published
Abstract [en]

Accurate building height estimation is essential to support urbanization monitoring, environmental impact analysis and sustainable urban planning. However, conducting large-scale building height estimation remains a significant challenge. While deep learning (DL) has proven effective for large-scale mapping tasks, there is a lack of advanced DL models specifically tailored for height estimation, particularly when using open-source Earth observation data. In this study, we propose T-SwinUNet, an advanced DL model for large-scale building height estimation leveraging Sentinel-1 SAR and Sentinel-2 multispectral time series. The T-SwinUNet model contains a feature extractor with local/global feature comprehension capabilities, a temporal attention module to learn the correlation between constant and variable features of building objects over time, and an efficient multitask decoder to predict building height at 10 m spatial resolution. The model is trained and evaluated on data from the Netherlands, Switzerland, Estonia, and Germany, and its generalizability is evaluated on an out-of-distribution (OOD) test set of ten additional cities from other European countries. Our study incorporates extensive model evaluations, ablation experiments, and comparisons with established models. T-SwinUNet predicts building height with a Root Mean Square Error (RMSE) of 1.89 m, outperforming state-of-the-art models at 10 m spatial resolution. Its strong generalization to the OOD test set (RMSE of 3.2 m) underscores its potential for low-cost building height estimation across Europe, with future scalability to other regions. Furthermore, the assessment at 100 m resolution reveals that T-SwinUNet (0.29 m RMSE, 0.75 R²) also outperformed the global building height product GHSL-Built-H R2023A (0.56 m RMSE, 0.37 R²). Our implementation is available at: https://github.com/RituYadav92/Building-Height-Estimation.
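
The RMSE and R² figures above can be reproduced in principle with the standard definitions sketched below. This is not taken from the linked repository. The abstract does not say how the 10 m predictions were brought to 100 m, so the `block_mean` helper, which averages non-overlapping blocks (a factor of 10 would turn a 10 m grid into a 100 m grid), is an assumption made for illustration, as are all names and values.

```python
import math


def rmse(pred, ref):
    """Root mean square error between two equally long sequences of heights in metres."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def r2(pred, ref):
    """Coefficient of determination; undefined when all reference values are equal."""
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((r - p) ** 2 for p, r in zip(pred, ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - ss_res / ss_tot


def block_mean(grid, factor):
    """Average non-overlapping factor x factor blocks; grid sides must be divisible by factor."""
    out = []
    for i in range(0, len(grid), factor):
        out_row = []
        for j in range(0, len(grid[0]), factor):
            block = [grid[a][b] for a in range(i, i + factor) for b in range(j, j + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Hypothetical predicted and reference heights in metres.
pred = [3.0, 6.0, 9.0]
ref = [4.0, 6.0, 8.0]
print(rmse(pred, ref), r2(pred, ref))

# A 2 x 2 grid averaged with factor 2 collapses to a single cell.
print(block_mean([[1.0, 3.0], [5.0, 7.0]], 2))  # [[4.0]]
```

Averaging over larger cells smooths out per-pixel errors, which would be consistent with the much lower RMSE reported at 100 m than at 10 m.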

Place, publisher, year, edition, pages
Elsevier BV, 2025
Keywords
Building height estimation, Multitask learning, Out-of-distribution generalization, Regression, Sentinel, Time series
National Category
Earth Observation
Identifiers
urn:nbn:se:kth:diva-358166 (URN); 10.1016/j.rse.2024.114556 (DOI); 001413894800001 (); 2-s2.0-85212150378 (Scopus ID)
Note

QC 20250217

Available from: 2025-01-07. Created: 2025-01-07. Last updated: 2025-10-16. Bibliographically approved
4. A Multi-Modal, Multi-Temporal, Multi-Resolution Benchmark Dataset for Building Height Estimation
2025 (English). Article in journal (Other academic), Accepted
National Category
Earth Observation; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-371708 (URN)
Note

Accepted by Scientific Data (Nature Publishing Group), ISSN 2052-4463

QC 20251020

Available from: 2025-10-16. Created: 2025-10-16. Last updated: 2025-10-20. Bibliographically approved
5. BioMassters: A Benchmark Dataset for Forest Biomass Estimation using Multi-modal Satellite Time Series
2023 (English). In: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023, Neural Information Processing Systems Foundation, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Above-ground biomass is an important variable because forests play a crucial role in mitigating climate change by acting as an efficient, natural and cost-effective carbon sink. Traditional field and airborne LiDAR measurements have been proven to provide reliable estimates of forest biomass. Nevertheless, using these techniques at a large scale can be challenging and expensive. Satellite data have been widely used as a valuable tool for estimating biomass on a global scale. However, the full potential of dense multi-modal satellite time series data, in combination with modern Deep Learning (DL) approaches, has yet to be explored. The aim of the "BioMassters" data challenge and benchmark dataset is to investigate the potential of multi-modal satellite data (Sentinel-1 SAR and Sentinel-2 MSI) to estimate forest biomass at a large scale, using the Finnish Forest Centre's open forest and nature airborne LiDAR data as a reference. The performance of the top three baseline models shows the potential of DL to produce accurate and higher-resolution biomass maps. The dataset and the code are available on the project website: https://nascetti-a.github.io/BioMasster/.

Place, publisher, year, edition, pages
Neural Information Processing Systems Foundation, 2023
National Category
Forest Science; Earth Observation
Identifiers
urn:nbn:se:kth:diva-346139 (URN); 2-s2.0-85191176383 (Scopus ID)
Conference
37th Conference on Neural Information Processing Systems, NeurIPS 2023, Dec 10-16 2023, New Orleans, United States of America
Note

QC 20240506

Available from: 2024-05-03. Created: 2024-05-03. Last updated: 2025-10-16. Bibliographically approved
6. Uncertainty Quantification for Building Height, Tree canopy Height and Above-ground Biomass Estimation
(English). Article in journal (Other academic), Submitted
National Category
Earth Observation; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-371706 (URN)
Note

Submitted to Remote Sensing of Environment, ISSN 0034-4257, E-ISSN 1879-0704

QC 20251020

Available from: 2025-10-16. Created: 2025-10-16. Last updated: 2025-10-20. Bibliographically approved

Open Access in DiVA

summary (31011 kB), 207 downloads
File information
File name: FULLTEXT04.pdf
File size: 31011 kB
Checksum (SHA-512): f2a287705454a511f1f197bc95944e9c6c66b9eb18ee28947c588255fc9de5cf0eff47556a90946ee3afdd5decfeb47ddaf1fd4e3fde6c469dcef40d92298fe4
Type: summary
Mimetype: application/pdf

Authority records

Yadav, Ritu
