Publications (10 of 63)
Hafner, S., Fang, H., Azizpour, H. & Ban, Y. (2025). Continuous Urban Change Detection from Satellite Image Time Series with Temporal Feature Refinement and Multi-Task Integration. IEEE Transactions on Geoscience and Remote Sensing, 63, 1-18
Continuous Urban Change Detection from Satellite Image Time Series with Temporal Feature Refinement and Multi-Task Integration
2025 (English) In: IEEE Transactions on Geoscience and Remote Sensing, ISSN 0196-2892, E-ISSN 1558-0644, Vol. 63, p. 1-18. Article in journal (Refereed). Published
Abstract [en]

Urbanization advances at unprecedented rates, leading to negative environmental and societal impacts. Remote sensing can help mitigate these effects by supporting sustainable development strategies with accurate information on urban growth. Deep learning-based methods have achieved promising urban change detection results from optical satellite image pairs using convolutional neural networks (ConvNets), transformers, and a multi-task learning setup. However, bi-temporal methods are limited for continuous urban change detection, i.e., the detection of changes in consecutive image pairs of satellite image time series (SITS), as they fail to fully exploit multi-temporal data (> 2 images). Existing multi-temporal change detection methods, on the other hand, collapse the temporal dimension, restricting their ability to capture continuous urban changes. Additionally, multi-task learning methods lack integration approaches that combine change and segmentation outputs. To address these challenges, we propose a continuous urban change detection framework incorporating two key modules. The temporal feature refinement (TFR) module employs self-attention to improve ConvNet-based multi-temporal building representations. The temporal dimension is preserved in the TFR module, enabling the detection of continuous changes. The multi-task integration (MTI) module utilizes Markov networks to find an optimal building map time series based on segmentation and dense change outputs. The proposed framework effectively identifies urban changes based on high-resolution SITS acquired by the PlanetScope constellation (F1 score 0.551), Gaofen-2 (F1 score 0.440), and WorldView-2 (F1 score 0.543). Moreover, our experiments on three challenging datasets demonstrate the effectiveness of the proposed framework compared to bi-temporal and multi-temporal urban change detection and segmentation methods.

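For readers who want a concrete picture of the temporal feature refinement (TFR) idea, the following is a minimal PyTorch sketch, not the authors' released code: self-attention is applied along the temporal axis of per-pixel ConvNet features so that the time dimension is preserved. The module name, head count, and normalization placement are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    """Self-attention along the temporal axis of multi-temporal ConvNet
    features; the time dimension is kept intact, so changes between any
    pair of consecutive dates can still be read off downstream."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        # treat every spatial location as an independent length-t sequence
        seq = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        refined, _ = self.attn(seq, seq, seq)
        seq = self.norm(seq + refined)  # residual connection
        return seq.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)

# toy usage: a 5-date series of 32-channel feature maps
feats = torch.randn(2, 5, 32, 16, 16)
print(TemporalSelfAttention(32)(feats).shape)  # torch.Size([2, 5, 32, 16, 16])
```
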
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Earth observation, Multi-task learning, Multi-temporal, Remote sensing, Transformers
National Category
Earth Observation
Identifiers
urn:nbn:se:kth:diva-366565 (URN); 10.1109/TGRS.2025.3578866 (DOI); 001512531900009 (ISI); 2-s2.0-105007921391 (Scopus ID)
Note

QC 20250710

Available from: 2025-07-10 Created: 2025-07-10 Last updated: 2025-11-03. Bibliographically approved
Guastoni, L., Geetha Balasubramanian, A., Foroozan, F., Güemes, A., Ianiro, A., Discetti, S., . . . Vinuesa, R. (2025). Fully convolutional networks for velocity-field predictions based on the wall heat flux in turbulent boundary layers. Theoretical and Computational Fluid Dynamics, 39(1), Article ID 13.
Fully convolutional networks for velocity-field predictions based on the wall heat flux in turbulent boundary layers
2025 (English) In: Theoretical and Computational Fluid Dynamics, ISSN 0935-4964, E-ISSN 1432-2250, Vol. 39, no 1, article id 13. Article in journal (Refereed). Published
Abstract [en]

Fully-convolutional neural networks (FCN) were proven to be effective for predicting the instantaneous state of a fully-developed turbulent flow at different wall-normal locations using quantities measured at the wall. In Guastoni et al. (J Fluid Mech 928:A27, 2021. https://doi.org/10.1017/jfm.2021.812), we focused on wall-shear-stress distributions as input, which are difficult to measure in experiments. In order to overcome this limitation, we introduce a model that can take as input the heat-flux field at the wall from a passive scalar. Four different Prandtl numbers Pr = ν/α = (1, 2, 4, 6) are considered (where ν is the kinematic viscosity and α is the thermal diffusivity of the scalar quantity). A turbulent boundary layer is simulated, since accurate heat-flux measurements can be performed in experimental settings: first we train the network on aptly modified DNS data and then fine-tune it on the experimental data. Finally, we test our network on experimental data sampled in a water tunnel. These predictions represent the first application of transfer learning to experimental data for neural networks trained on simulations. This paves the way for the implementation of a non-intrusive sensing approach for the flow in practical applications.

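As a rough illustration of the input/output mapping described above (not the architecture from the paper; layer counts, widths, and the optimizer are made-up assumptions), a fully convolutional network takes the 2-D wall heat-flux field and returns the three velocity-fluctuation components at a target wall-normal plane, and the same weights can later be fine-tuned on experimental samples:

```python
import torch
import torch.nn as nn

def conv_block(cin: int, cout: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class HeatFluxToVelocityFCN(nn.Module):
    """Maps a wall heat-flux field to velocity fluctuations (u', v', w')
    at a chosen wall-normal plane. Fully convolutional, so the output
    keeps the wall-parallel resolution of the input."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(1, hidden),
            conv_block(hidden, hidden),
            conv_block(hidden, hidden),
            nn.Conv2d(hidden, 3, kernel_size=3, padding=1),
        )

    def forward(self, q_wall: torch.Tensor) -> torch.Tensor:
        return self.net(q_wall)

model = HeatFluxToVelocityFCN()
vel = model(torch.randn(1, 1, 96, 96))  # -> (1, 3, 96, 96)
# transfer-learning step sketched in the abstract: pre-train on DNS
# fields, then fine-tune the same weights on experimental samples,
# typically with a smaller learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```
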
Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Machine learning, Turbulence simulation, Turbulent boundary layers
National Category
Fluid Mechanics
Identifiers
urn:nbn:se:kth:diva-358176 (URN); 10.1007/s00162-024-00732-y (DOI); 001378464000001 (ISI); 2-s2.0-85212435435 (Scopus ID)
Note

Not duplicate with DiVA 1756843

QC 20250114

Available from: 2025-01-07 Created: 2025-01-07 Last updated: 2025-02-09. Bibliographically approved
Gutha, S. B., Vinuesa, R. & Azizpour, H. (2025). Inverse Problems with Diffusion Models: A MAP Estimation Perspective. In: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025. Paper presented at 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, Tucson, United States of America, Feb 28 2025 - Mar 4 2025 (pp. 4153-4162). Institute of Electrical and Electronics Engineers (IEEE)
Inverse Problems with Diffusion Models: A MAP Estimation Perspective
2025 (English) In: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 4153-4162. Conference paper, Published paper (Refereed)
Abstract [en]

Inverse problems have many applications in science and engineering. In computer vision, several image restoration tasks, such as inpainting, deblurring, and super-resolution, can be formally modeled as inverse problems. Recently, methods have been developed for solving inverse problems that only leverage a pre-trained unconditional diffusion model and do not require additional task-specific training. In such methods, however, the inherent intractability of determining the conditional score function during the reverse diffusion process poses a real challenge, forcing the methods to settle for an approximation instead, which affects their performance in practice. Here, we propose a MAP estimation framework that models the reverse conditional generation process of a continuous-time diffusion model as an optimization process of the underlying MAP objective, whose gradient term is tractable. In theory, the proposed framework can be applied to solve general inverse problems using gradient-based optimization methods. However, given the highly non-convex nature of the loss objective, finding a perfect gradient-based optimization algorithm can be quite challenging; nevertheless, our framework offers several potential research directions. We use our proposed formulation to develop empirically effective algorithms for image restoration. We validate our proposed algorithms with extensive experiments over multiple datasets across several restoration tasks.

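The following hypothetical fragment conveys the MAP framing in its simplest form (all names, shapes, and the optimizer choice are assumptions, and the paper's actual objective is defined over the continuous-time reverse process): given a frozen, differentiable decoder from a Gaussian latent to an image, restoring y = A(x) + noise becomes gradient-based optimization of a data-fidelity term plus the standard Gaussian prior on the latent.

```python
import torch

def map_restore(y, forward_op, decode, sigma=0.05, steps=200, lr=0.05):
    """Hypothetical MAP sketch for an inverse problem y = A(x) + noise.
    `decode` is a frozen, differentiable map from a Gaussian latent z to
    an image (e.g., a consistency-model-style sampler); the objective is
    data fidelity plus -log N(z; 0, I) up to a constant."""
    z = torch.randn(1, 3, 64, 64, requires_grad=True)  # latent shape assumed
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x = decode(z)
        data_fit = ((y - forward_op(x)) ** 2).sum() / (2 * sigma ** 2)
        prior = 0.5 * (z ** 2).sum()  # Gaussian prior on the latent
        loss = data_fit + prior
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decode(z).detach()
```
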
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
conditional generation, consistency models, diffusion models, inverse problems, map estimation, optimization
National Category
Computational Mathematics; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-363209 (URN); 10.1109/WACV61041.2025.00408 (DOI); 2-s2.0-105003630084 (Scopus ID)
Conference
2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, Tucson, United States of America, Feb 28 2025 - Mar 4 2025
Note

Part of ISBN 9798331510831

QC 20250509

Available from: 2025-05-07 Created: 2025-05-07 Last updated: 2025-05-09. Bibliographically approved
Mehrpanah, A., Englesson, E. & Azizpour, H. (2025). On Spectral Properties of Gradient-Based Explanation Methods. In: Computer Vision – ECCV 2024 - 18th European Conference, Proceedings. Paper presented at 18th European Conference on Computer Vision, ECCV 2024, Milan, Italy, Sep 29 2024 - Oct 4 2024 (pp. 282-299). Springer Nature
On Spectral Properties of Gradient-Based Explanation Methods
2025 (English) In: Computer Vision – ECCV 2024 - 18th European Conference, Proceedings, Springer Nature, 2025, p. 282-299. Conference paper, Published paper (Refereed)
Abstract [en]

Understanding the behavior of deep networks is crucial to increase our confidence in their results. Despite an extensive body of work for explaining their predictions, researchers have faced reliability issues, which can be attributed to insufficient formalism. In our research, we adopt novel probabilistic and spectral perspectives to formally analyze explanation methods. Our study reveals a pervasive spectral bias stemming from the use of gradients, and sheds light on some common design choices that have been discovered experimentally, in particular, the use of squared gradients and input perturbation. We further characterize how the choice of perturbation hyperparameters in explanation methods, such as SmoothGrad, can lead to inconsistent explanations, and introduce two remedies based on our proposed formalism: (i) a mechanism to determine a standard perturbation scale, and (ii) an aggregation method which we call SpectralLens. Finally, we substantiate our theoretical results through quantitative evaluations.

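For context, SmoothGrad, one of the methods analyzed here, averages input gradients over Gaussian-perturbed copies of the input; the perturbation scale sigma is exactly the hyperparameter whose choice the paper studies spectrally. A minimal PyTorch version, assuming a classifier that returns logits for a single-example batch:

```python
import torch

def smoothgrad(model, x, target, sigma=0.15, n=50):
    """Standard SmoothGrad: average the input gradient of the target
    logit over n Gaussian-perturbed copies of the input x.
    x: (1, C, H, W); model(x): (1, num_classes)."""
    grads = torch.zeros_like(x)
    for _ in range(n):
        x_noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(x_noisy)[0, target]
        score.backward()
        grads += x_noisy.grad
    return grads / n
```
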
Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Deep Neural Networks, Explainability, Gradient-based Explanation Methods, Probabilistic Machine Learning, Probabilistic Pixel Attribution Techniques, Spectral Analysis
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-357698 (URN); 10.1007/978-3-031-73021-4_17 (DOI); 001416940200017 (ISI); 2-s2.0-85210488897 (Scopus ID)
Conference
18th European Conference on Computer Vision, ECCV 2024, Milan, Italy, Sep 29 2024 - Oct 4 2024
Note

Part of ISBN 978-303173020-7

QC 20241213

Available from: 2024-12-12 Created: 2024-12-12 Last updated: 2025-03-17. Bibliographically approved
Nilsson, A., Wijk, K., Gutha, S. B., Englesson, E., Hotti, A., Saccardi, C., . . . Azizpour, H. (2024). Indirectly Parameterized Concrete Autoencoders. In: International Conference on Machine Learning, ICML 2024. Paper presented at 41st International Conference on Machine Learning, ICML 2024, Vienna, Austria, Jul 21 2024 - Jul 27 2024 (pp. 38237-38252). ML Research Press
Indirectly Parameterized Concrete Autoencoders
2024 (English) In: International Conference on Machine Learning, ICML 2024, ML Research Press, 2024, p. 38237-38252. Conference paper, Published paper (Refereed)
Abstract [en]

Feature selection is a crucial task in settings where data is high-dimensional or acquiring the full set of features is costly. Recent developments in neural network-based embedded feature selection show promising results across a wide range of applications. Concrete Autoencoders (CAEs), considered state-of-the-art in embedded feature selection, may struggle to achieve stable joint optimization, hurting their training time and generalization. In this work, we identify that this instability is correlated with the CAE learning duplicate selections. To remedy this, we propose a simple and effective improvement: Indirectly Parameterized CAEs (IP-CAEs). IP-CAEs learn an embedding and a mapping from it to the Gumbel-Softmax distributions' parameters. Despite being simple to implement, IP-CAE exhibits significant and consistent improvements over CAE in both generalization and training time across several datasets for reconstruction and classification. Unlike CAE, IP-CAE effectively leverages non-linear relationships and does not require retraining the jointly optimized decoder. Furthermore, our approach is, in principle, generalizable to Gumbel-Softmax distributions beyond feature selection.

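A minimal sketch of the indirect parameterization (sizes and the choice of a single linear map are illustrative assumptions, not the paper's exact configuration): rather than learning the (k × d) Gumbel-Softmax logits directly, as a standard Concrete Autoencoder does, one learns a per-selector embedding plus a shared map that produces those logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IPConcreteSelector(nn.Module):
    """Indirectly parameterized concrete selector: the (k, d) selection
    logits come from an embedding and a learned mapping, instead of
    being free parameters as in a plain Concrete Autoencoder."""

    def __init__(self, d_in: int, k_select: int, d_embed: int = 32):
        super().__init__()
        self.embed = nn.Parameter(torch.randn(k_select, d_embed))
        self.to_logits = nn.Linear(d_embed, d_in)

    def forward(self, x: torch.Tensor, temperature: float = 0.5):
        logits = self.to_logits(self.embed)               # (k, d_in)
        # differentiable sampling of k near-one-hot selection vectors
        sel = F.gumbel_softmax(logits, tau=temperature, hard=False)
        return x @ sel.t()                                # (batch, k)

x = torch.randn(8, 100)                 # 100 candidate features
print(IPConcreteSelector(100, 10)(x).shape)  # torch.Size([8, 10])
```
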
Place, publisher, year, edition, pages
ML Research Press, 2024
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-353956 (URN); 2-s2.0-85203808876 (Scopus ID)
Conference
41st International Conference on Machine Learning, ICML 2024, Vienna, Austria, Jul 21 2024 - Jul 27 2024
Note

QC 20240926

Available from: 2024-09-25 Created: 2024-09-25 Last updated: 2024-09-26. Bibliographically approved
Nilsson, A. & Azizpour, H. (2024). Regularizing and Interpreting Vision Transformers by Patch Selection on Echocardiography Data. In: Proceedings of the 5th Conference on Health, Inference, and Learning, CHIL 2024. Paper presented at 5th Annual Conference on Health, Inference, and Learning, CHIL 2024, New York, United States of America, Jun 27 2024 - Jun 28 2024 (pp. 155-168). ML Research Press
Regularizing and Interpreting Vision Transformers by Patch Selection on Echocardiography Data
2024 (English) In: Proceedings of the 5th Conference on Health, Inference, and Learning, CHIL 2024, ML Research Press, 2024, p. 155-168. Conference paper, Published paper (Refereed)
Abstract [en]

This work introduces a novel approach to model regularization and explanation in Vision Transformers (ViTs), particularly beneficial for small-scale but high-dimensional data regimes, such as in healthcare. We introduce stochastic embedded feature selection in the context of echocardiography video analysis, specifically focusing on the EchoNet-Dynamic dataset for the prediction of Left Ventricular Ejection Fraction (LVEF). Our proposed method, termed Gumbel Video Vision-Transformers (G-ViTs), augments Video Vision-Transformers (V-ViTs), a performant transformer architecture for videos, with Concrete Autoencoders (CAEs), a common dataset-level feature selection technique, to enhance V-ViT's generalization and interpretability. The key contribution lies in the incorporation of stochastic token selection individually for each video frame during training. Such token selection regularizes the training of V-ViT, improves its interpretability, and is achieved by differentiable sampling of categoricals using the Gumbel-Softmax distribution. Our experiments on EchoNet-Dynamic demonstrate a consistent and notable regularization effect. The G-ViT model outperforms both a random selection baseline and standard V-ViT. The G-ViT is also compared against recent works on EchoNet-Dynamic, where it exhibits state-of-the-art performance among end-to-end learned methods. Finally, we explore model explainability by visualizing selected patches, providing insights into how the G-ViT utilizes regions known to be crucial for LVEF prediction for humans. This proposed approach, therefore, extends beyond regularization, offering enhanced interpretability for ViTs.

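To make the per-frame token selection concrete, here is a hypothetical sketch (class name, shapes, and the hard straight-through sampling are assumptions): shared selection logits are resampled with independent Gumbel noise for every frame, so each frame keeps a different stochastic subset of its patch tokens during training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameTokenSelector(nn.Module):
    """Per-frame stochastic patch selection: sample k of the n patch
    tokens of each video frame via Gumbel-Softmax before they enter
    the video transformer. Resampling every forward pass is what
    produces the regularizing effect described in the abstract."""

    def __init__(self, n_tokens: int, k_keep: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(k_keep, n_tokens))

    def forward(self, tokens: torch.Tensor, tau: float = 1.0):
        # tokens: (batch, frames, n_tokens, dim)
        b, f, n, d = tokens.shape
        # expanding adds independent Gumbel noise per frame, so every
        # frame draws its own selection from the shared logits
        logits = self.logits.expand(b, f, -1, -1)          # (b, f, k, n)
        sel = F.gumbel_softmax(logits, tau=tau, hard=True)  # one-hot rows
        return sel @ tokens                                 # (b, f, k, d)

tokens = torch.randn(2, 8, 196, 64)                 # batch, frames, patches, dim
print(FrameTokenSelector(196, 32)(tokens).shape)    # torch.Size([2, 8, 32, 64])
```
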
Place, publisher, year, edition, pages
ML Research Press, 2024
National Category
Computer graphics and computer vision; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-353944 (URN); 2-s2.0-85203788338 (Scopus ID)
Conference
5th Annual Conference on Health, Inference, and Learning, CHIL 2024, New York, United States of America, Jun 27 2024 - Jun 28 2024
Note

QC 20240926

Available from: 2024-09-25 Created: 2024-09-25 Last updated: 2025-02-01. Bibliographically approved
Nilsson, A. & Azizpour, H. (2024). Regularizing and Interpreting Vision Transformers by Patch Selection on Echocardiography Data. In: Pollard, T., Choi, E., Singhal, P., Hughes, M., Sizikova, E., Mortazavi, B., Chen, I., Wang, F., Sarker, T., McDermott, M., Ghassemi, M. (Eds.), CONFERENCE ON HEALTH, INFERENCE, AND LEARNING. Paper presented at 5th Annual Conference on Health, Inference, and Learning (CHIL), JUN 27-28, 2024, New York, NY (pp. 155-168). The Journal of Machine Learning Research (JMLR), 248
Regularizing and Interpreting Vision Transformers by Patch Selection on Echocardiography Data
2024 (English) In: CONFERENCE ON HEALTH, INFERENCE, AND LEARNING / [ed] Pollard, T., Choi, E., Singhal, P., Hughes, M., Sizikova, E., Mortazavi, B., Chen, I., Wang, F., Sarker, T., McDermott, M., Ghassemi, M., The Journal of Machine Learning Research (JMLR), 2024, Vol. 248, p. 155-168. Conference paper, Published paper (Refereed)
Abstract [en]

This work introduces a novel approach to model regularization and explanation in Vision Transformers (ViTs), particularly beneficial for small-scale but high-dimensional data regimes, such as in healthcare. We introduce stochastic embedded feature selection in the context of echocardiography video analysis, specifically focusing on the EchoNet-Dynamic dataset for the prediction of Left Ventricular Ejection Fraction (LVEF). Our proposed method, termed Gumbel Video Vision-Transformers (G-ViTs), augments Video Vision-Transformers (V-ViTs), a performant transformer architecture for videos, with Concrete Autoencoders (CAEs), a common dataset-level feature selection technique, to enhance V-ViT's generalization and interpretability. The key contribution lies in the incorporation of stochastic token selection individually for each video frame during training. Such token selection regularizes the training of V-ViT, improves its interpretability, and is achieved by differentiable sampling of categoricals using the Gumbel-Softmax distribution. Our experiments on EchoNet-Dynamic demonstrate a consistent and notable regularization effect. The G-ViT model outperforms both a random selection baseline and standard V-ViT. The G-ViT is also compared against recent works on EchoNet-Dynamic, where it exhibits state-of-the-art performance among end-to-end learned methods. Finally, we explore model explainability by visualizing selected patches, providing insights into how the G-ViT utilizes regions known to be crucial for LVEF prediction for humans. This proposed approach, therefore, extends beyond regularization, offering enhanced interpretability for ViTs.

Place, publisher, year, edition, pages
The Journal of Machine Learning Research (JMLR), 2024
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-357514 (URN); 001347132700011 (ISI)
Conference
5th Annual Conference on Health, Inference, and Learning (CHIL), JUN 27-28, 2024, New York, NY
Note

QC 20241209

Available from: 2024-12-09 Created: 2024-12-09 Last updated: 2025-02-07. Bibliographically approved
Englesson, E. & Azizpour, H. (2024). Robust Classification via Regression for Learning with Noisy Labels. In: Proceedings ICLR 2024 - The Twelfth International Conference on Learning Representations. Paper presented at ICLR 2024 - The Twelfth International Conference on Learning Representations, Messe Wien Exhibition and Congress Center, Vienna, Austria, May 7-11, 2024.
Robust Classification via Regression for Learning with Noisy Labels
2024 (English) In: Proceedings ICLR 2024 - The Twelfth International Conference on Learning Representations, 2024. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Deep neural networks and large-scale datasets have revolutionized the field of machine learning. However, these large networks are susceptible to overfitting to label noise, resulting in reduced generalization. To address this challenge, two promising approaches have emerged: i) loss reweighting, which reduces the influence of noisy examples on the training loss, and ii) label correction that replaces noisy labels with estimated true labels. These directions have been pursued separately or combined as independent methods, lacking a unified approach. In this work, we present a unified method that seamlessly combines loss reweighting and label correction to enhance robustness against label noise in classification tasks. Specifically, by leveraging ideas from compositional data analysis in statistics, we frame the problem as a regression task, where loss reweighting and label correction can naturally be achieved with a shifted Gaussian label noise model. Our unified approach achieves strong performance compared to recent baselines on several noisy labelled datasets. We believe this work is a promising step towards robust deep learning in the presence of label noise. Our code is available at: https://github.com/ErikEnglesson/SGN.

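The log-ratio framing can be pictured briefly; this is a sketch of the transform only (the shifted Gaussian noise model that yields reweighting and correction is in the paper and its released code). The additive log-ratio map from compositional data analysis sends a K-class probability vector into R^(K-1), where Gaussian regression is natural, and its inverse recovers probabilities.

```python
import torch
import torch.nn.functional as F

def alr(p: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Additive log-ratio transform: a K-class probability vector
    -> R^(K-1), using the last class as the reference component."""
    p = p.clamp_min(eps)
    return torch.log(p[..., :-1]) - torch.log(p[..., -1:])

def alr_inverse(z: torch.Tensor) -> torch.Tensor:
    # softmax over [z, 0] maps back to a probability vector
    zero = torch.zeros_like(z[..., :1])
    return F.softmax(torch.cat([z, zero], dim=-1), dim=-1)

# smoothed one-hot label -> regression target in R^(K-1) and back
label = torch.tensor([0.9, 0.05, 0.05])
target = alr(label)
print(alr_inverse(target))  # ~ [0.9, 0.05, 0.05]
```
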
Keywords
label noise, noisy labels, robustness, Gaussian noise, classification, log-ratio transform, compositional data analysis
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-346452 (URN)
Conference
ICLR 2024 - The Twelfth International Conference on Learning Representations, Messe Wien Exhibition and Congress Center, Vienna, Austria, May 7-11, 2024
Note

QC 20240515

Available from: 2024-05-15 Created: 2024-05-15 Last updated: 2024-05-16. Bibliographically approved
Englesson, E. & Azizpour, H. (2024). Robust classification via regression for learning with noisy labels. In: 12th International Conference on Learning Representations, ICLR 2024. Paper presented at 12th International Conference on Learning Representations, ICLR 2024, Hybrid, Vienna, Austria, May 7-11, 2024. International Conference on Learning Representations, ICLR
Robust classification via regression for learning with noisy labels
2024 (English) In: 12th International Conference on Learning Representations, ICLR 2024, International Conference on Learning Representations, ICLR, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Deep neural networks and large-scale datasets have revolutionized the field of machine learning. However, these large networks are susceptible to overfitting to label noise, resulting in reduced generalization. To address this challenge, two promising approaches have emerged: i) loss reweighting, which reduces the influence of noisy examples on the training loss, and ii) label correction that replaces noisy labels with estimated true labels. These directions have been pursued separately or combined as independent methods, lacking a unified approach. In this work, we present a unified method that seamlessly combines loss reweighting and label correction to enhance robustness against label noise in classification tasks. Specifically, by leveraging ideas from compositional data analysis in statistics, we frame the problem as a regression task, where loss reweighting and label correction can naturally be achieved with a shifted Gaussian label noise model. Our unified approach achieves strong performance compared to recent baselines on several noisy labelled datasets. We believe this work is a promising step towards robust deep learning in the presence of label noise. Our code is available at: github.com/ErikEnglesson/SGN.

Place, publisher, year, edition, pages
International Conference on Learning Representations, ICLR, 2024
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-367431 (URN); 2-s2.0-85190096539 (Scopus ID)
Conference
12th International Conference on Learning Representations, ICLR 2024, Hybrid, Vienna, Austria, May 7-11, 2024
Note

QC 20250718

Available from: 2025-07-18 Created: 2025-07-18 Last updated: 2025-07-18. Bibliographically approved
Yadav, R., Nascetti, A., Azizpour, H. & Ban, Y. (2024). Unsupervised Flood Detection on SAR Time Series using Variational Autoencoder. International Journal of Applied Earth Observation and Geoinformation, 126, Article ID 103635.
Unsupervised Flood Detection on SAR Time Series using Variational Autoencoder
2024 (English) In: International Journal of Applied Earth Observation and Geoinformation, ISSN 1569-8432, E-ISSN 1872-826X, Vol. 126, article id 103635. Article in journal (Other academic). Published
Abstract [en]

In this study, we propose a novel unsupervised Change Detection (CD) model to detect flood extent using Synthetic Aperture Radar (SAR) time series data. The proposed model is based on a spatiotemporal variational autoencoder, trained with reconstruction and contrastive learning techniques. The change maps are generated with a proposed novel algorithm that utilizes differences in latent feature distributions between pre-flood and post-flood data. The model is evaluated on nine different flood events by comparing the results with reference flood maps collected from the Copernicus Emergency Management Services (CEMS) and the Sen1Floods11 dataset. We conducted a range of experiments and ablation studies to investigate the performance of our model. We compared the results with existing unsupervised models. The model achieved an average of 70% Intersection over Union (IoU) score, which is at least 7% better than the IoU from existing unsupervised CD models. In the generalizability test, the proposed model outperformed the supervised models ADS-Net (by 10% IoU) and DAUSAR (by 8% IoU), both trained on Sen1Floods11 and tested on CEMS sites.

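The latent-difference idea can be pictured with this hypothetical fragment (the encoder, shapes, aggregation, and thresholding are placeholders, not the paper's algorithm): encode pre- and post-flood observations into Gaussian latents, score each location by the divergence between the two distributions, and threshold the score into a binary change map.

```python
import torch

def kl_gaussians(mu1, logvar1, mu2, logvar2):
    """KL( N(mu1, var1) || N(mu2, var2) ), elementwise per latent dim."""
    return 0.5 * (logvar2 - logvar1
                  + (logvar1.exp() + (mu1 - mu2) ** 2) / logvar2.exp() - 1)

def change_map(encoder, pre_series, post_series, threshold):
    """Encode pre- and post-flood SAR patches with a (placeholder) VAE
    encoder returning (mu, logvar) of shape (batch, latent, H, W), then
    aggregate the per-dimension divergence into a per-pixel change score."""
    mu_pre, logvar_pre = encoder(pre_series)
    mu_post, logvar_post = encoder(post_series)
    score = kl_gaussians(mu_pre, logvar_pre, mu_post, logvar_post).sum(dim=1)
    return score > threshold  # binary flood-change map
```
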
Place, publisher, year, edition, pages
Elsevier BV, 2024
National Category
Earth Observation
Identifiers
urn:nbn:se:kth:diva-338773 (URN); 10.1016/j.jag.2023.103635 (DOI); 001143611500001 (ISI); 2-s2.0-85181026128 (Scopus ID)
Note

QC 20251029

Available from: 2023-10-25 Created: 2023-10-25 Last updated: 2025-10-29. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-5211-6388
