Foundation Models Applied to Earth Observation
2025 (English) Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Foundation Models tillämpade på jordobservation (Swedish)
Abstract [en]
Satellite image analysis plays a crucial role in various fields, including environmental monitoring, disaster management, and urban planning. However, training deep learning models for specific tasks such as land cover classification, semantic segmentation, and object detection typically requires substantial amounts of annotated data, which are costly and laborious to acquire in the remote sensing domain. Foundation models, pre-trained on massive datasets using self-supervised learning, offer a promising solution by learning general-purpose visual representations that can be efficiently adapted to diverse downstream tasks with reduced annotation requirements. This thesis evaluates the performance of vision foundation models on satellite imagery by comparing generalist models pre-trained on natural images (e.g., DINOv2) with domain-specific models specialized in satellite data (e.g., DOFA) and traditional task-specific architectures (e.g., ResNet, U-Net). The evaluation spans three key tasks: land cover classification, semantic segmentation, and object detection. For each task, performance was assessed using standard metrics such as accuracy and mean Intersection over Union (mIoU). The results show that generalist foundation models often match or surpass traditional models on RGB datasets, while domain-specific models excel on multispectral data thanks to specialized architectures. The findings highlight the potential of foundation models for satellite imagery analysis, while underscoring certain limitations, such as the detection of small objects and generalization across varied multispectral contexts. This work opens avenues for improving the adaptation of foundation models to the specific characteristics of satellite imagery, with potential applications in fields such as natural hazard management and environmental change detection.
Abstract [sv] (English translation)
Satellite images are used in areas such as environmental monitoring, disaster management, and urban planning. Traditional models often require large amounts of labeled data and extensive training for each specific task, which is resource-intensive. This study investigates the use of foundation models for satellite image analysis. Foundation models are large pre-trained neural networks that can extract general features from images and can thereby be adapted to multiple tasks with minimal fine-tuning. The study shows that foundation models, despite being trained on ordinary images, can perform on par with or better than traditional methods in classification, segmentation, and object detection. Specialized foundation models for remote sensing, in particular, show good results on multispectral data. The results show that foundation models have great potential to make satellite image analysis more efficient and to compete with traditional, fully supervised methods.
Place, publisher, year, edition, pages
2025. p. 55
Series
TRITA-EECS-EX ; 2025:74
Keywords [en]
Foundation Models, Deep Learning, Computer Vision, Earth Observation, Multispectral Data, Vision Transformer
Keywords [sv]
Foundation Models, Djupinlärning, Datorseende, Jordobservation, Multispektrala data, Vision Transformer
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-362116
OAI: oai:DiVA.org:kth-362116
DiVA, id: diva2:1950547
External cooperation
Thales Services Numeriques
Supervisors
Examiners
2025-04-23 2025-04-08 2025-04-23 Bibliographically approved