Advancing Geospatial Foundation Models: Generative Representations and Global Benchmarking
Jia, Yuru
KTH, School of Architecture and the Built Environment (ABE), Urban Planning and Environment, Geoinformatics. KU Leuven. ORCID iD: 0009-0008-1174-1054
2026 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The explosion of Earth Observation (EO) data has driven the rapid development of Geospatial Foundation Models (GFMs) trained via self-supervised learning (SSL). While current discriminative SSL paradigms, such as contrastive learning and masked image modeling, have achieved notable success, they often struggle to capture the fine-grained spatial details and multi-scale complexities inherent in satellite imagery. Furthermore, the rapid architectural advancement of these models has outpaced the methodology used to evaluate them. Existing benchmarks frequently suffer from geographic biases, lack multi-modal and multi-temporal diversity, and rely on overly simplistic image-level classification tasks, thereby obscuring the true real-world capabilities and vulnerabilities of modern GFMs.

To address these representational limitations, this thesis first investigates the untapped potential of generative diffusion models for discriminative representation learning. We introduce SatDiFuser, a novel framework that repurposes a large-scale, pre-trained latent diffusion model for dense remote sensing tasks. By extracting multi-scale, multi-timestep features from the iterative denoising process and systematically aggregating them through advanced fusion strategies—including Global Weighting, Localized Weighting, and a Mixture of Experts (MoE) mechanism—SatDiFuser successfully transforms generative spatial priors into robust discriminative features, demonstrating superior performance on standard geospatial benchmarks.
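As an illustration of the two weighting strategies named above, the following is a minimal NumPy sketch, not the thesis implementation. It assumes the features from T denoising timesteps at one scale have already been extracted and resized to a common shape `(T, C, H, W)`. The function names, the softmax normalization, and the score arrays `logits` and `logits_map` are hypothetical stand-ins for parameters that would be learned during training.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def global_weighted_fusion(features, logits):
    """Fuse (T, C, H, W) features with one scalar weight per timestep.

    `logits` has shape (T,) and stands in for learnable scores (assumption).
    """
    weights = softmax(logits)                             # (T,)
    return np.tensordot(weights, features, axes=(0, 0))   # (C, H, W)

def localized_weighted_fusion(features, logits_map):
    """Fuse (T, C, H, W) features with one weight per timestep and pixel.

    `logits_map` has shape (T, H, W) and stands in for the output of a
    small learned gating network (assumption).
    """
    weights = softmax(logits_map, axis=0)                 # (T, H, W)
    return (weights[:, None, :, :] * features).sum(axis=0)  # (C, H, W)

# Toy input: two timesteps, 3 channels, 4x4 spatial grid.
feats = np.stack([np.zeros((3, 4, 4)), np.full((3, 4, 4), 2.0)])
fused_global = global_weighted_fusion(feats, np.zeros(2))
fused_local = localized_weighted_fusion(feats, np.zeros((2, 4, 4)))
```

With equal scores, both functions reduce to a plain average over timesteps; the localized variant differs only once the scores vary across pixels.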

While exploring novel representation architectures is crucial, accurately assessing the rapidly expanding GFM landscape requires overcoming a second fundamental challenge: the inadequacy of current evaluation protocols. To address this critical evaluation gap, this thesis subsequently introduces PANGAEA: a globally inclusive, standardized benchmark. Encompassing 11 diverse datasets across five critical domains (Urban, Agriculture, Disaster, Marine, and Forestry), PANGAEA evaluates models exclusively on complex, dense pixel-wise tasks while accounting for varying spatial resolutions, multi-modality (Optical and SAR), and multi-temporal dynamics. Extensive benchmarking of representative GFMs reveals critical insights into their generalization capabilities, robustness under data scarcity, and current limitations in multi-sensor fusion. Ultimately, this thesis bridges the gap between generative representation learning and rigorous global evaluation, laying a robust foundation for the development and assessment of the next generation of Geospatial Foundation Models.

Abstract [sv]

Explosionen av jordobservationsdata (Earth Observation, EO) har drivit på en snabb utveckling av geospatiala grundmodeller (Geospatial Foundation Models, GFMs) som tränats via självövervakad inlärning (Self-Supervised Learning, SSL). Även om nuvarande diskriminativa SSL-paradigm, såsom kontrastiv inlärning och maskerad bildmodellering (masked image modeling), har nått betydande framgångar, har de ofta svårt att fånga de finkorniga rumsliga detaljerna och den flerskaliga komplexiteten som är inneboende i satellitbilder. Dessutom har den snabba arkitektoniska utvecklingen av dessa modeller sprungit ifrån den metodik som används för att utvärdera dem. Existerande utvärderingsramverk (benchmarks) lider ofta av geografiska skevheter, saknar multimodal och multitemporal mångfald, och förlitar sig på alltför förenklade klassificeringsuppgifter på bildnivå. Detta döljer de moderna grundmodellernas sanna förmågor och sårbarheter i verkliga tillämpningar.

För att hantera dessa representationsbegränsningar undersöker denna avhandling först den outnyttjade potentialen hos generativa diffusionsmodeller för diskriminativ representationsinlärning. Vi introducerar SatDiFuser, ett nyskapande ramverk som återanvänder en storskalig, förtränad latent diffusionsmodell för täta fjärranalysuppgifter. Genom att extrahera särdrag över flera skalor och tidssteg från den iterativa brusreduceringsprocessen och systematiskt aggregera dem genom avancerade fusionsstrategier – inklusive global viktning, lokaliserad viktning och en Mixture of Experts (MoE)-mekanism – transformerar SatDiFuser framgångsrikt generativa rumsliga förkunskaper till robusta diskriminativa särdrag. Detta demonstrerar överlägsen prestanda på standardiserade geospatiala riktmärken.

Även om utforskandet av nya representationsarkitekturer är avgörande, kräver en noggrann bedömning av det snabbt växande GFM-landskapet att en andra fundamental utmaning övervinns: otillräckligheten i nuvarande utvärderingsprotokoll. För att överbrygga denna kritiska utvärderingsklyfta introducerar denna avhandling därefter PANGAEA: ett globalt inkluderande, standardiserat riktmärke (benchmark). PANGAEA omfattar 11 mångsidiga dataset tvärs över fem kritiska domäner (stadsmiljö, jordbruk, katastrofhantering, marin miljö och skogsbruk) och utvärderar uteslutande modeller på komplexa, täta pixelbaserade uppgifter, samtidigt som hänsyn tas till varierande rumslig upplösning, multimodalitet (optisk och SAR) samt multitemporal dynamik. Omfattande prestandamätningar av representativa GFM:er avslöjar kritiska insikter i deras generaliseringsförmåga, robusthet vid databrist och nuvarande begränsningar i multisensorfusion. I slutändan överbryggar denna avhandling klyftan mellan generativ representationsinlärning och rigorös global utvärdering, vilket lägger en robust grund för utveckling och bedömning av nästa generation av geospatiala grundmodeller.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2026, p. 70
Series
TRITA-ABE-DLT ; 2623
Keywords [en]
Earth Observation, Geospatial Foundation Models, Self-Supervised Learning, Generative Diffusion Models, Representation Learning, Deep Learning, Remote Sensing Benchmark, Multi-modal Data, Feature Fusion
Keywords [sv]
Jordobservation, Geospatiala grundmodeller, Självövervakad inlärning, Generativa diffusionsmodeller, Representationsinlärning, Djupinlärning, Riktmärke för fjärranalys, Multimodala data, Särdragsfusion
National Category
Earth and Related Environmental Sciences; Computer and Information Sciences
Research subject
Geodesy and Geoinformatics, Geoinformatics
Identifiers
URN: urn:nbn:se:kth:diva-382116; ISBN: 978-91-8106-657-9 (print); OAI: oai:DiVA.org:kth-382116; DiVA, id: diva2:2061653
Presentation
2026-06-09, E32, Lindstedtvägen 3, KTH Campus, public video conference link https://kth-se.zoom.us/j/67811409576, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

QC 20260528

Available from: 2026-05-28 Created: 2026-05-21 Last updated: 2026-06-04 Bibliographically approved
List of papers
1. Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?
2025 (English) Conference paper, Published paper (Refereed)
National Category
Natural Sciences; Computer and Information Sciences; Earth and Related Environmental Sciences
Identifiers
urn:nbn:se:kth:diva-382112 (URN)
Conference
International Conference on Computer Vision (ICCV), Hawai'i Convention Center, Honolulu, Hawai'i, Oct 19–23, 2025
Note

QC 20260522

Available from: 2026-05-21 Created: 2026-05-21 Last updated: 2026-05-22 Bibliographically approved
2. PANGAEA: Assessing Geospatial Foundation Models Capabilities through a Global and Inclusive Benchmark
2025 (English) In: IEEE Geoscience and Remote Sensing Magazine, ISSN 2473-2397, p. 2-43. Article in journal (Refereed) Published
Abstract [en]

Geospatial foundation models (GFMs) have emerged as powerful tools for extracting representations from Earth observation (EO) data, but their evaluation remains inconsistent and narrow. Existing works often evaluate on suboptimal downstream datasets and tasks that are too easy or too narrow, which limits how well the evaluations reflect the real-world applicability of GFMs. Current evaluation protocols also lack diversity: they fail to account for the multiplicity of image resolutions, sensor types, and temporalities, which further complicates the assessment of GFM performance. In particular, most existing benchmarks are geographically biased toward North America and Europe, which calls the global applicability of GFMs into question. To overcome these challenges, we introduce PANGAEA, a standardized evaluation protocol that covers a diverse set of datasets, tasks, resolutions, sensor modalities, and temporalities. It establishes a robust and widely applicable benchmark for GFMs. We evaluate the most popular openly available GFMs on this benchmark and analyze their performance across several domains. In particular, we compare these models to supervised baselines [e.g., UNet and vanilla vision Transformer (ViT)] and assess their effectiveness when labeled data are limited. Our findings highlight the limitations of GFMs under different scenarios, showing that they do not consistently outperform supervised models. PANGAEA is designed to be highly extensible, allowing new datasets, models, and tasks to be added in future research. By releasing the evaluation code and benchmark, we aim to enable other researchers to replicate our experiments and build upon our work, fostering a more principled evaluation protocol for large pretrained geospatial models. The code is available at https://github.com/VMarsocci/pangaea-bench.
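To make pixel-wise evaluation concrete, here is a minimal sketch of mean intersection-over-union (mIoU) for semantic segmentation. The abstract does not name a metric, so the choice of mIoU is an assumption, and the function below is illustrative rather than taken from the pangaea-bench code. Classes absent from both prediction and ground truth are skipped, which is one common convention among several.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes present in `pred` or `target`.

    `pred` and `target` are integer label maps of identical shape with
    values in [0, num_classes). Hypothetical helper, not a library API.
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    idx = target * num_classes + pred
    cm = np.bincount(idx, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0  # ignore classes that never occur
    if not valid.any():
        return float("nan")
    return float((intersection[valid] / union[valid]).mean())

# Toy 2x2 label maps with two classes.
pred = [[0, 1], [1, 1]]
target = [[0, 1], [0, 1]]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their mean, 7/12.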

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Benchmark testing, Protocols, Geospatial analysis, Spatial resolution, Semantic segmentation, Object detection, Data models, Crops, Computational modeling, Time series analysis
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-376687 (URN); 10.1109/MGRS.2025.3628194 (DOI); 001632047700001 (); 2-s2.0-105023868408 (Scopus ID)
Note

QC 20260218

Available from: 2026-02-18 Created: 2026-02-18 Last updated: 2026-05-28 Bibliographically approved

Open Access in DiVA

fulltext (9914 kB), 96 downloads
File information
File name: FULLTEXT02.pdf; File size: 9914 kB; Checksum: SHA-512
1a3c24f6ab1a471f6634469886122a85a97dde0040a171439c24149488ad7231b68bc0c1381b1e6045ee0ccdf2324c55da0cb7e5d590d27d9cf22289ad23b540
Type: fulltext; Mimetype: application/pdf
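
A downloaded copy of the full text can be checked against the SHA-512 checksum listed above. The sketch below uses only Python's standard `hashlib` module; the helper name and the local file path in the usage note are hypothetical.

```python
import hashlib

def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large PDFs need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha512_of_file("FULLTEXT02.pdf")` (assuming the file was saved under that name in the working directory) should equal the hexadecimal string in the record if the download is intact.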

Authority records

Jia, Yuru
