Diffusion-Based Learning and Foundation Model Adaptation for Robust Dense Prediction in Earth Observation
KTH, School of Architecture and the Built Environment (ABE), Urban Planning and Environment, Geoinformatics. ORCID iD: 0009-0001-0794-6443
2026 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Dense prediction tasks such as semantic segmentation, change detection, and wildfire burned-area mapping are central to Earth observation, yet deep learning models trained for these tasks frequently degrade under the geographic, temporal, and spatiotemporal distribution shifts encountered in real-world deployment. This thesis investigates how diffusion-based learning and parameter-efficient adaptation can improve the robustness and generalization of dense prediction models for Earth observation, with a particular focus on wildfire monitoring using Sentinel-2 imagery.

Three complementary studies are presented in this thesis. The first introduces Noise2Map, a discriminative diffusion model that repurposes structured noise as a supervisory signal for semantic segmentation and change detection. Unlike prior diffusion approaches that require iterative sampling, Noise2Map performs single-pass inference while achieving rank-1 performance across three benchmarks and being 13.5× faster than the closest diffusion baseline. The second study proposes a diffusion-based decoder that operates in the representation space of frozen geospatial foundation models (GFMs) to improve zero-shot generalization for wildfire burned-area mapping. The diffusion decoder improves performance in 14 out of 16 backbone–protocol–region combinations, with gains of up to +4.8 F1, and extends to out-of-distribution European wildfires not seen during training. The third study systematically evaluates adaptation strategies for GFMs (full fine-tuning, decoder-only fine-tuning, and Low-Rank Adaptation (LoRA)) for large-scale wildfire mapping across North America. LoRA consistently outperforms all alternatives, improving IoU by up to +9.35 over full fine-tuning for Prithvi-v2, while keeping more than 99% of backbone parameters frozen.

Together, these studies show that constraining how models learn, through structured noise, frozen encoders, or low-rank updates, generalizes better than training more parameters. Diffusion-based learning and parameter-efficient adaptation offer practical, complementary paths toward robust Earth observation.
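
The gains quoted in this abstract are expressed in F1 and IoU. As a minimal sketch of how these two scores are commonly computed for a binary burned-area mask, the example below uses plain Python lists; the function names, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration and are not taken from the thesis.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives over flat 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn


def f1_score(pred, target):
    """F1 = 2TP / (2TP + FP + FN); returns 1.0 if both masks are empty (assumed convention)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def iou_score(pred, target):
    """IoU = TP / (TP + FP + FN); returns 1.0 if both masks are empty (assumed convention)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


# Hypothetical flattened masks: 1 = burned, 0 = unburned.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
target = [1, 1, 0, 0, 0, 1, 1, 0]

f1 = f1_score(pred, target)    # 3 TP, 1 FP, 1 FN
iou = iou_score(pred, target)
print(f"F1 = {f1:.2f}, IoU = {iou:.2f}")
```

For a single binary mask the two scores are tied by IoU = F1 / (2 - F1), so a gain in one implies a gain in the other, although the size of the gain differs.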

Abstract [sv] (English translation)

Dense prediction tasks such as semantic segmentation, change detection, and mapping of wildfire burned areas are central to Earth observation. At the same time, the performance of deep learning models trained for these tasks often degrades when they encounter the geographic, temporal, or spatiotemporal distribution shifts that arise in real-world applications. This thesis investigates how diffusion-based learning and parameter-efficient adaptation can improve the robustness and generalization ability of dense prediction models in Earth observation, with a particular focus on wildfire monitoring using Sentinel-2 satellite imagery.

Three complementary studies are presented in this thesis. The first introduces Noise2Map, a discriminative diffusion model that repurposes structured noise as a supervisory signal for semantic segmentation and change detection. Unlike earlier diffusion methods that require iterative sampling, Noise2Map performs inference in a single step, while achieving the best results (rank-1) on three benchmark datasets and running 13.5 times faster than the closest diffusion-based baseline. The second study proposes a diffusion-based decoder that operates in the representation space of frozen geospatial foundation models (GFMs) to improve zero-shot generalization in mapping burned areas after wildfires. The diffusion decoder improves results in 14 of 16 combinations of backbone, protocol, and region, with gains of up to +4.8 in F1 score, and also generalizes to European wildfires outside the training distribution. The third study systematically evaluates different adaptation strategies for GFMs (full fine-tuning, decoder-only fine-tuning, and Low-Rank Adaptation (LoRA)) for large-scale wildfire mapping in North America. LoRA consistently outperforms all alternative methods and improves IoU by up to +9.35 compared with full fine-tuning for Prithvi-v2, while more than 99% of the backbone parameters remain frozen.

Together, these studies show that constraints on how models learn, whether through structured noise, frozen encoders, or low-rank updates, can yield better generalization than training more parameters. Diffusion-based learning and parameter-efficient adaptation thereby offer practical and complementary paths toward more robust models for Earth observation.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2026, p. 66
Series
TRITA-ABE-DLT ; 267
Keywords [en]
Remote sensing, Earth observation, deep learning, semantic segmentation, change detection, building damage detection, wildfire burned-area mapping, foundation models, diffusion models, domain shift
Keywords [sv] (English translation)
Remote sensing, Earth observation, deep learning, semantic segmentation, change detection, building damage detection, mapping of burned areas after wildfires, foundation models, diffusion models, distribution shift
National Category
Computer Vision and Learning Systems; Computer and Information Sciences; Earth Observation
Research subject
Geodesy and Geoinformatics, Geoinformatics
Identifiers
URN: urn:nbn:se:kth:diva-380247
ISBN: 978-91-8106-594-7 (print)
OAI: oai:DiVA.org:kth-380247
DiVA, id: diva2:2055926
Presentation
2026-05-11, D37, Lindstedtsvägen 5, KTH Campus, public video conference link https://kth-se.zoom.us/j/66145987135, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20260427

Available from: 2026-04-27 Created: 2026-04-27 Last updated: 2026-04-28 Bibliographically approved
List of papers
1. Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change Detection
(English) In: IEEE Transactions on Geoscience and Remote Sensing, ISSN 0196-2892, E-ISSN 1558-0644. Article in journal (Refereed) Accepted
Abstract [en]

Semantic segmentation and change detection are two fundamental challenges in remote sensing, requiring models to capture either spatial semantics or temporal differences from satellite imagery. Existing deep learning models often struggle with temporal inconsistencies and fine-grained spatial structures, require extensive pretraining, and offer limited interpretability, especially in real-world remote sensing scenarios. Recent advances in diffusion models show that Gaussian noise can be systematically leveraged to learn expressive data representations through denoising. Motivated by this, we investigate whether the noise process in diffusion models can be effectively utilized for discriminative tasks. We propose Noise2Map, a unified diffusion-based framework that repurposes the denoising process for fast, end-to-end discriminative learning. Unlike prior work that uses diffusion only for generation or feature extraction, Noise2Map directly predicts semantic or change maps using task-specific noise schedules and timestep conditioning, avoiding the costly sampling procedures of traditional diffusion models. The model is pretrained via self-supervised denoising and fine-tuned with supervision, enabling both interpretability and robustness. Our architecture supports both tasks, semantic segmentation (SS) and change detection (CD), through a shared backbone and task-specific noise schedulers. Extensive evaluations on the SpaceNet7, WHU, and xView2 (buildings damaged by wildfires) datasets demonstrate that Noise2Map ranks 1st on average among seven models on semantic segmentation and 1st on change detection by a cross-dataset rank metric (average F1 as the primary criterion, IoU as the tie-break), while being 13× faster and 3× smaller than the generative diffusion baseline (DDPM-CD) owing to its single-step discriminative inference. Ablation studies highlight the robustness of our model to different training noise schedulers and timestep control in the diffusion process, as well as its ability to perform multi-task learning.
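
The abstract refers to noise schedules and timestep conditioning without giving their form. As a rough illustration only, the sketch below shows the standard closed-form forward noising step used in DDPM-style models, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with a linear beta schedule. The schedule endpoints (1e-4, 0.02), the 1000 steps, and all function names are common defaults chosen here as assumptions; they are not Noise2Map's task-specific schedules.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed DDPM-style defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0 in one step; returns (x_t, noise)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [0.2, -0.5, 0.9, 0.0]                          # hypothetical pixel values
x_early, _ = add_noise(x0, 10, alpha_bars, rng)     # still close to x0
x_late, _ = add_noise(x0, 999, alpha_bars, rng)     # almost pure noise
print(alpha_bars[10], alpha_bars[999])
```

Because x_t can be sampled for any timestep without iterating, a discriminative model can be trained and evaluated at chosen timesteps in a single forward pass, which is the general property that single-step inference relies on.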

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE)
Keywords
Diffusion Models, Change Detection, Semantic Segmentation, Remote Sensing, Deep Learning
National Category
Computer graphics and computer vision
Research subject
Geodesy and Geoinformatics, Geoinformatics; Computer Science
Identifiers
urn:nbn:se:kth:diva-380244 (URN); 10.1109/TGRS.2026.3687393 (DOI)
Projects
EO-AI4GlobalChange
Note

QC 20260428

Available from: 2026-04-27 Created: 2026-04-27 Last updated: 2026-04-28
2. Zero-Shot Generalization of Geospatial Foundation Models for Wildfire Mapping Using Diffusion Decoder
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Accurate and timely burned-area mapping is essential for emergency response, damage assessment, and post-fire recovery. However, strong spatial and temporal domain shifts, driven by variations in fire behavior, land cover, climate, and observation conditions, pose major challenges to generalization. Consequently, models trained on historical wildfire data often fail to transfer reliably to new regions or future fire seasons. Recently, geospatial foundation models (GFMs) trained on large-scale datasets have emerged as a promising direction for generalization in remote sensing. However, despite their scale and strong in-distribution performance, GFMs can still suffer from performance degradation under domain shifts. To this end, we propose a diffusion decoder that can be integrated with GFM encoders to enhance their cross-domain generalization. The decoder leverages diffusion noise as an implicit regularizer by training over multiple intermediate timesteps along the diffusion trajectory between pre- and post-fire imagery, encouraging the learning of representations that are robust to perturbations and transferable to unseen domains. We design three zero-shot protocols capturing geographic, temporal, and spatiotemporal shifts for wildfire Sentinel-2 imagery across the United States and Canada from 2017 to 2023. When integrated with three leading GFMs (TerraMind, Clay-v1.5, and Prithvi-v2), the decoder improves performance by up to +4.8 F1 and +6.7 IoU points. Prithvi-v2 achieves the strongest absolute performance, while Clay-v1.5 exhibits the largest relative gains. Overall, our findings demonstrate that our diffusion decoder strengthens the generalization and zero-shot ability of GFMs under diverse environmental conditions.
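
The three zero-shot protocols are named in the abstract but not specified. One plausible reading, sketched below under stated assumptions, is that each protocol holds out events by region, by year, or by both. The `FireEvent` record, the region codes, the choice of training and test regions, and the cutoff year are all hypothetical and are not taken from the manuscript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FireEvent:
    event_id: str
    region: str  # hypothetical region code, e.g. "US" or "CA"
    year: int


def split_events(events, protocol, train_region="US", test_region="CA", cutoff_year=2021):
    """Return (train, test) event lists for one assumed zero-shot protocol."""
    if protocol == "geographic":
        # Train on one region, test on another region.
        train = [e for e in events if e.region == train_region]
        test = [e for e in events if e.region == test_region]
    elif protocol == "temporal":
        # Train on earlier fire seasons, test on later ones.
        train = [e for e in events if e.year < cutoff_year]
        test = [e for e in events if e.year >= cutoff_year]
    elif protocol == "spatiotemporal":
        # Test events differ from training events in both region and period.
        train = [e for e in events if e.region == train_region and e.year < cutoff_year]
        test = [e for e in events if e.region == test_region and e.year >= cutoff_year]
    else:
        raise ValueError(f"unknown protocol: {protocol}")
    return train, test


# Toy events for illustration only.
events = [
    FireEvent("a", "US", 2018),
    FireEvent("b", "US", 2022),
    FireEvent("c", "CA", 2019),
    FireEvent("d", "CA", 2023),
    FireEvent("e", "US", 2020),
]

for name in ("geographic", "temporal", "spatiotemporal"):
    train, test = split_events(events, name)
    print(name, [e.event_id for e in train], [e.event_id for e in test])
```

In each case the training and test sets share no events, so any score on the test set reflects transfer to conditions the model has not seen.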

Keywords
Diffusion Models, Change Detection, Semantic Segmentation, Remote Sensing, Deep Learning
National Category
Computer and Information Sciences; Computer Vision and Learning Systems
Identifiers
urn:nbn:se:kth:diva-380246 (URN)
Note

Submitted to IEEE Transactions on Geoscience and Remote Sensing, ISSN 0196-2892, EISSN 1558-0644

QC 20260427

Available from: 2026-04-27 Created: 2026-04-27 Last updated: 2026-04-27 Bibliographically approved
3. Low-rank adaptation of geospatial foundation models for wildfire mapping using Sentinel-2 data
2026 (English) Manuscript (preprint) (Other academic)
Abstract [en]

Wildfire burned-area mapping is essential for damage assessment, emissions modeling, and understanding fire–climate interactions across diverse ecological regions. Recent geospatial foundation models provide strong general-purpose representations for satellite imagery, yet there is still no clear understanding of how to efficiently adapt these models for downstream Earth observation tasks, particularly under geographic and temporal domain shift. This study evaluates three state-of-the-art Geospatial Foundation Models (GFMs), TerraMind, DINOv3, and Prithvi-v2, for burned-area mapping across the United States and Canada using Sentinel-2 data. Leveraging 3,820 wildfire events from 2017–2023, we conduct spatial and temporal generalization tests across diverse biomes. We systematically compare full fine-tuning, decoder-only fine-tuning, and Low-Rank Adaptation (LoRA) for adapting each model. Across all experiments, LoRA provides the strongest cross-domain generalization while updating less than 1% of parameters, demonstrating a favorable trade-off between accuracy and efficiency. Prithvi-v2 with LoRA achieves the highest overall accuracy and the largest improvement compared to full fine-tuning. These findings indicate that geospatial foundation models, when adapted using lightweight parameter-efficient methods such as LoRA, offer a robust and scalable solution for large-scale burned-area mapping. Code is available at https://github.com/alishibli97/wildfire-lora-gfm.
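
To make the "less than 1% of parameters" figure concrete, the sketch below illustrates the general LoRA idea on a single linear layer: the pretrained weight W stays frozen and only a low-rank update (alpha / r)·B·A is trained, with B initialized to zero so the adapted layer starts out identical to the original. This is a generic pure-Python illustration, not the code from the linked repository; the class name, layer sizes, rank, and alpha are assumptions.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha, rng):
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                    # frozen pretrained weight (d_out x d_in)
        self.scale = alpha / rank
        # A: small random values; B: zeros, so the update is zero at initialization.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]


def lora_trainable_fraction(d_out, d_in, rank):
    """Share of parameters that are trainable when only A and B are updated."""
    frozen = d_out * d_in
    trainable = rank * (d_in + d_out)
    return trainable / (frozen + trainable)


rng = random.Random(0)
weight = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]  # toy 3 x 4 layer
layer = LoRALinear(weight, rank=2, alpha=4.0, rng=rng)
x = [1.0, -2.0, 0.5, 3.0]

before = layer.forward(x)       # equals the frozen layer's output, since B is zero
layer.b[0][0] = 1.0             # stand-in for one training update to B
after = layer.forward(x)

# Hypothetical 1024 x 1024 projection adapted with rank 4.
fraction = lora_trainable_fraction(1024, 1024, 4)
print(f"trainable share: {fraction:.2%}")
```

The trainable share shrinks further when, as is common, adapters are attached to only some of a model's weight matrices.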

National Category
Earth Observation; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-380245 (URN)
Conference
IEEE International Geoscience and Remote Sensing Symposium
Note

Accepted by IEEE International Geoscience and Remote Sensing Symposium, Washington, D.C., USA, 9-14 August 2026

QC 20260427

Available from: 2026-04-27 Created: 2026-04-27 Last updated: 2026-04-27 Bibliographically approved

Open Access in DiVA

fulltext (36288 kB), 51 downloads
File information
File name: FULLTEXT01.pdf
File size: 36288 kB
Checksum (SHA-512): 4a83824ee54f96b6d953193a5c62b6da44eff5fbdfba5e5277e8d3d64ddf2418f9cebf163ae97db8a173a8dcde1fe8f3f86518bd93fb565d7b49104ed6ccdbf2
Type: fulltext
Mimetype: application/pdf

Authority records

Shibli, Ali
