KTH Publications (kth.se)
Sullivan, Josephine, Prof. ORCID iD: orcid.org/0000-0003-2784-7300
Publications (10 of 42)
Sigurdsson, J. H., Crotty, D., Holmin, S., Sullivan, J. & Persson, M. (2025). Deep-Learning-Based Iodine Map Prediction with Photon-Counting CT Images. In: Medical Imaging 2025: Physics of Medical Imaging. Paper presented at Medical Imaging 2025: Physics of Medical Imaging, San Diego, United States of America, Feb 17 2025 - Feb 21 2025. SPIE-Intl Soc Optical Eng, Article ID 134053O.
2025 (English). In: Medical Imaging 2025: Physics of Medical Imaging, SPIE-Intl Soc Optical Eng, 2025, article id 134053O. Conference paper, Published paper (Refereed)
Abstract [en]

Energy-resolving photon-counting CT promises improved material-separation capabilities compared to conventional CT. However, accurate separation of iodine and calcium, which is especially important for imaging atherosclerotic plaques, remains a challenge. In this proof-of-concept study, we present a deep-learning-based method that takes a pair of basis images from a photon-counting CT as input and produces a map of the iodine distribution where the contamination from other materials such as calcium is minimized, and demonstrate its performance on clinical images of the carotid arteries. As training data, we used 13 pairs of image slices of the neck from a silicon-based photon-counting spectral CT, with one non-contrast and one contrast-enhanced slice in each pair. To generate ground-truth iodine maps as training labels, 40 keV virtual monoenergetic non-contrast images were registered to align with the corresponding 40 keV contrast-enhanced images, and the difference between those two image slices was used to generate an iodine concentration map. We trained a ResUNet++ deep convolutional neural network using water-iodine basis image pairs resulting from a two-basis material decomposition as inputs and the iodine concentration map obtained from subtraction as label. The resulting method was evaluated on a previously unseen photon-counting CT image slice of the neck from the same patient. Our results show that the trained network correctly highlights the image features containing iodinated contrast agent and quantifies the concentration accurately. The contamination from calcium and other tissues is significantly reduced compared to the original iodine basis image. Our results demonstrate that the proposed method can successfully separate iodine from calcium and other tissues on clinical silicon-based photon-counting CT, with important potential implications for imaging of atherosclerosis.
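The label-generation step described above, subtracting a registered 40 keV non-contrast virtual monoenergetic image from its contrast-enhanced counterpart, can be sketched roughly as follows. The function name and the HU-to-concentration conversion factor are illustrative assumptions, not the paper's calibration:

```python
import numpy as np

def iodine_label_from_subtraction(vmi40_contrast, vmi40_noncontrast,
                                  iodine_hu_per_mgml=25.0):
    # Enhancement attributable to iodine, assuming the non-contrast slice
    # has already been registered to the contrast-enhanced one.
    delta_hu = vmi40_contrast - vmi40_noncontrast
    # Assumed linear HU-to-concentration conversion (placeholder factor).
    concentration = delta_hu / iodine_hu_per_mgml
    # Physical constraint: iodine concentration cannot be negative.
    return np.clip(concentration, 0.0, None)
```

Registration errors show up directly as spurious iodine signal in such a label, which is one motivation for learning a network that maps basis images to the iodine map instead of relying on subtraction at inference time.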

Place, publisher, year, edition, pages
SPIE-Intl Soc Optical Eng, 2025
Keywords
Carotid arteries, Deep neural network, Iodine Map, Material decomposition, Photon-counting CT
National Category
Radiology and Medical Imaging; Medical Imaging
Identifiers
urn:nbn:se:kth:diva-363778 (URN), 10.1117/12.3047898 (DOI), 001487074500113 (), 2-s2.0-105004584370 (Scopus ID)
Conference
Medical Imaging 2025: Physics of Medical Imaging, San Diego, United States of America, Feb 17 2025 - Feb 21 2025
Note

Part of ISBN 9781510685888

QC 20250527

Available from: 2025-05-21. Created: 2025-05-21. Last updated: 2025-07-04. Bibliographically approved.
Hafner, S., Gerard, S., Sullivan, J. & Ban, Y. (2025). DisasterAdaptiveNet: A robust network for multi-hazard building damage detection from very-high-resolution satellite imagery. International Journal of Applied Earth Observation and Geoinformation, 143, Article ID 104756.
2025 (English). In: International Journal of Applied Earth Observation and Geoinformation, ISSN 1569-8432, E-ISSN 1872-826X, Vol. 143, article id 104756. Article in journal (Refereed). Published
Abstract [en]

Earth observation satellites play a crucial role in disaster response and management, offering timely and large-scale data for damage assessment. Recent studies have demonstrated the potential of deep learning techniques for automated building damage detection from satellite imagery, often based on the xBD dataset. This high-quality dataset features bi-temporal very-high-resolution image pairs of several disaster events. Notably, several studies have proposed new network architectures and demonstrated their improved performance on xBD. Although such highly engineered model-centric approaches achieve promising results on the original dataset split of xBD, we show that they underperform on a new event-based split, which evaluates them on unseen events. To reduce this generalization gap, we propose to follow a data-centric approach. For this, we first derive a simplified baseline method from the winning solution of the xView2 competition, with greatly reduced complexity. With a simple adjustment to this baseline method, we incorporate readily available disaster-type information, allowing it to account for disaster-specific damage characteristics. We evaluate the resulting disaster-adaptive model on the event-based split of xBD and demonstrate its improved ability to generalize to unseen events compared to several competing methods. These results highlight the potential of our data-centric approach for practical and robust building damage assessment in real-world disaster scenarios. Code including the strong baseline model is available at: https://github.com/SebastianHafner/DisasterAdaptiveNet.
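The "simple adjustment" of conditioning the baseline on readily available disaster-type information could, for instance, take the form of a FiLM-style per-type modulation of feature maps. Everything below (the class name, the type list, the parameter shapes) is an assumption for illustration, not the released code at the linked repository:

```python
import numpy as np

DISASTER_TYPES = ["earthquake", "flood", "wildfire", "wind"]  # illustrative set

class DisasterConditioner:
    """FiLM-style per-disaster-type feature modulation: a guess at how
    disaster-type information could be injected into a damage-detection
    network, so each type gets its own damage characteristics."""

    def __init__(self, n_channels, rng=None):
        rng = rng or np.random.default_rng(0)
        # One scale/shift pair per disaster type (random stand-ins here;
        # in a real model these would be learned parameters).
        self.gamma = {t: rng.normal(1.0, 0.1, n_channels) for t in DISASTER_TYPES}
        self.beta = {t: rng.normal(0.0, 0.1, n_channels) for t in DISASTER_TYPES}

    def __call__(self, features, disaster_type):
        # features: (C, H, W); modulate each channel with type-specific params.
        g = self.gamma[disaster_type][:, None, None]
        b = self.beta[disaster_type][:, None, None]
        return g * features + b
```

The appeal of this kind of conditioning is that it adds almost no parameters or complexity to the baseline while letting one shared backbone specialize per hazard.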

Place, publisher, year, edition, pages
Elsevier BV, 2025
Keywords
Deep learning, Earth observation, Model conditioning, Multi-task learning
National Category
Earth Observation
Identifiers
urn:nbn:se:kth:diva-369922 (URN), 10.1016/j.jag.2025.104756 (DOI), 2-s2.0-105013632237 (Scopus ID)
Note

Not a duplicate of DiVA 1915661

QC 20250918

Available from: 2025-09-18. Created: 2025-09-18. Last updated: 2025-09-18. Bibliographically approved.
Björkstrand, D., Sullivan, J., Bretzner, L., Loy, G. & Wang, T. (2023). Cross-attention Masked Auto-Encoder for Human 3D Motion Infilling and Denoising. Paper presented at The 34th British Machine Vision Conference, 20th-24th November 2023, Aberdeen, UK. BMVC
2023 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Human 3D pose and motion capture have numerous applications in fields such as augmented and virtual reality, animation, robotics and sports. However, even the best capturing methods suffer from artifacts such as missed joints and noisy or inaccurate joint positions. To address this, we propose the Cross-attention Masked Auto-Encoder (XMAE) for human 3D motion infilling and denoising. XMAE extends the original Masked Auto-Encoder design by introducing cross-attention in the decoder to deal with the train-test gap common in methods utilizing masking and mask tokens. Furthermore, we introduce joint displacement as an additional noise source during training, enabling XMAE to learn to correct incorrect joint positions. Through extensive experiments, we show XMAE's effectiveness compared to state-of-the-art approaches across three public datasets and its ability to denoise real-world data, reducing limb length standard deviation by 28% when applied on our in-the-wild professional soccer dataset.
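The joint-displacement noise source mentioned in the abstract could be sketched as below; the corruption probability, noise scale and function name are assumptions for illustration, not XMAE's actual training configuration:

```python
import numpy as np

def displace_joints(pose, p=0.1, sigma=0.05, rng=None):
    # pose: (n_joints, 3). With probability p, a joint is displaced by
    # Gaussian noise, so the model can learn to correct wrong joint positions
    # in addition to infilling masked ones.
    rng = rng or np.random.default_rng()
    pose = np.asarray(pose, dtype=float)
    mask = rng.random(pose.shape[0]) < p      # which joints get corrupted
    noise = rng.normal(0.0, sigma, pose.shape)
    return pose + mask[:, None] * noise, mask
```

Training on (corrupted input, clean target) pairs of this kind is what lets the model denoise real capture data, not just fill in missing joints.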

Place, publisher, year, edition, pages
BMVC, 2023
Keywords
3D Human pose estimation
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-352209 (URN)
Conference
The 34th British Machine Vision Conference, 20th - 24th November 2023, Aberdeen, UK
Note

QC 20240829

Available from: 2024-08-26. Created: 2024-08-26. Last updated: 2025-02-07. Bibliographically approved.
Zhao, Y., Ban, Y. & Sullivan, J. (2023). Tokenized Time-Series in Satellite Image Segmentation With Transformer Network for Active Fire Detection. IEEE Transactions on Geoscience and Remote Sensing, 61, Article ID 4405513.
2023 (English). In: IEEE Transactions on Geoscience and Remote Sensing, ISSN 0196-2892, E-ISSN 1558-0644, Vol. 61, article id 4405513. Article in journal (Refereed). Published
Abstract [en]

The Visible Infrared Imaging Radiometer Suite (VIIRS) onboard the Suomi National Polar-orbiting Partnership (Suomi-NPP) satellite has been used for the early detection and daily monitoring of active wildfires. Reliably segmenting active fire (AF) pixels from VIIRS image time series remains a challenge because automatic methods suffer from low precision at high recall. For AF detection, multicriteria thresholding is often applied to both low-resolution and mid-resolution Earth observation images. Deep learning approaches based on convolutional neural networks (ConvNets) are also well-studied on mid-resolution images. However, ConvNet-based approaches have poor performance on low-resolution images because of the coarse spatial features. On the other hand, the high temporal resolution of VIIRS images highlights the potential of using sequential models for AF detection. Transformer networks, a recent deep learning architecture based on self-attention, offer hope as they have shown strong performance on image segmentation and sequential modeling tasks within computer vision. In this research, we propose a transformer-based solution to segment AF pixels from the VIIRS time series. The solution feeds a time series of tokenized pixels into a transformer network to identify AF pixels at each timestamp and achieves a significantly higher F1-score than prior approaches for AFs within the study areas in California, New Mexico, and Oregon in the U.S., in British Columbia and Alberta in Canada, and in Australia and Sweden.
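A minimal sketch of the tokenization idea: one token per timestamp for each pixel, plus a positional encoding over time so the transformer sees temporal order. The (T, C, H, W) data-cube layout and the standard sinusoidal encoding are assumptions; the paper's exact token construction may differ:

```python
import numpy as np

def time_encoding(T, d):
    # Standard sinusoidal positional encoding over the T timestamps.
    pos = np.arange(T)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def tokenize_pixel_series(cube, row, col):
    # cube: (T, C, H, W). Extract one C-dimensional spectral token per
    # timestamp for the chosen pixel, then add the time encoding.
    tokens = cube[:, :, row, col]
    return tokens + time_encoding(tokens.shape[0], tokens.shape[1])
```

Treating each pixel as a token sequence sidesteps the coarse spatial features that hurt ConvNets on low-resolution imagery, and lets the model exploit the high revisit rate instead.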

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2023
Keywords
Active fire (AF) detection, image segmentation, remote sensing, transformer, Visible Infrared Imaging Radiometer Suite (VIIRS)
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-334367 (URN), 10.1109/TGRS.2023.3287498 (DOI), 001030654100010 (), 2-s2.0-85162916865 (Scopus ID)
Note

QC 20230821

Available from: 2023-08-18. Created: 2023-08-18. Last updated: 2025-02-07. Bibliographically approved.
Gerard, S., Zhao, Y. & Sullivan, J. (2023). WildfireSpreadTS: A dataset of multi-modal time series for wildfire spread prediction. In: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023. Paper presented at 37th Conference on Neural Information Processing Systems, NeurIPS 2023, New Orleans, United States of America, Dec 10 2023 - Dec 16 2023. Neural Information Processing Systems Foundation
2023 (English). In: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023, Neural Information Processing Systems Foundation, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

We present a multi-temporal, multi-modal remote-sensing dataset for predicting how active wildfires will spread at a resolution of 24 hours. The dataset consists of 13 607 images across 607 fire events in the United States from January 2018 to October 2021. For each fire event, the dataset contains a full time series of daily observations, containing detected active fires and variables related to fuel, topography and weather conditions. The dataset is challenging due to: a) its inputs being multi-temporal, b) the high number of 23 multi-modal input channels, c) highly imbalanced labels and d) noisy labels, due to smoke, clouds, and inaccuracies in the active fire detection. The underlying complexity of the physical processes adds to these challenges. Compared to existing public datasets in this area, WildfireSpreadTS allows for multi-temporal modeling of spreading wildfires, due to its time series structure. Furthermore, we provide additional input modalities and a high spatial resolution of 375 m for the active fire maps. We publish this dataset to encourage further research on this important task with multi-temporal, noise-resistant or generative methods, uncertainty estimation or advanced optimization techniques that deal with the high-dimensional input space.

Place, publisher, year, edition, pages
Neural Information Processing Systems Foundation, 2023
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-346140 (URN), 001230083405038 (), 2-s2.0-85191155663 (Scopus ID)
Conference
37th Conference on Neural Information Processing Systems, NeurIPS 2023, New Orleans, United States of America, Dec 10 2023 - Dec 16 2023
Note

QC 20240506

Available from: 2024-05-03. Created: 2024-05-03. Last updated: 2024-08-20. Bibliographically approved.
Gamba, M., Chmielewski-Anders, A., Sullivan, J., Azizpour, H. & Björkman, M. (2022). Are All Linear Regions Created Equal? In: Camps-Valls, G., Ruiz, F. J. R. & Valera, I. (Eds.), Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022. Paper presented at 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022, Virtual, Online, MAR 28-30, 2022. ML Research Press, 151
2022 (English). In: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022 / [ed] Camps-Valls, G., Ruiz, F. J. R. & Valera, I., ML Research Press, 2022, Vol. 151. Conference paper, Published paper (Refereed)
Abstract [en]

The number of linear regions has been studied as a proxy of complexity for ReLU networks. However, the empirical success of network compression techniques like pruning and knowledge distillation suggests that in the overparameterized setting, linear regions density might fail to capture the effective nonlinearity. In this work, we propose an efficient algorithm for discovering linear regions and use it to investigate the effectiveness of density in capturing the nonlinearity of trained VGGs and ResNets on CIFAR-10 and CIFAR-100. We contrast the results with a more principled nonlinearity measure based on function variation, highlighting the shortcomings of linear regions density. Interestingly, our measure of nonlinearity also correlates clearly with model-wise deep double descent, connecting reduced test error with reduced nonlinearity and increased local similarity of linear regions.
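The notion of a linear region can be made concrete with a simple 1D probe: a ReLU network is affine wherever its pattern of active units is constant, so counting distinct activation patterns along a segment counts the regions crossed. This is a generic brute-force sketch, not the paper's efficient algorithm:

```python
import numpy as np

def activation_pattern(x, layers):
    # layers: list of (W, b) pairs. Returns the ReLU on/off pattern at input x;
    # inputs sharing a pattern lie in the same linear region.
    pattern = []
    h = np.asarray(x, dtype=float)
    for W, b in layers:
        pre = W @ h + b
        pattern.append(tuple(pre > 0))
        h = np.maximum(pre, 0.0)
    return tuple(pattern)

def count_regions_on_segment(layers, x0, x1, n=1000):
    # Walk the segment x0 -> x1; each change of activation pattern marks a
    # region boundary, so regions = pattern changes + 1 (up to sampling error).
    ts = np.linspace(0.0, 1.0, n)
    pats = [activation_pattern(x0 + t * (x1 - x0), layers) for t in ts]
    return 1 + sum(p != q for p, q in zip(pats, pats[1:]))
```

For a one-layer net with units z1 = x and z2 = x - 1, the segment [-0.5, 1.5] crosses the two boundaries at 0 and 1, giving three regions.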

Place, publisher, year, edition, pages
ML Research Press, 2022
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Computer Sciences; Computer Systems
Identifiers
urn:nbn:se:kth:diva-320995 (URN), 000841852301002 (), 2-s2.0-85163053252 (Scopus ID)
Conference
25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022, Virtual, Online, MAR 28-30, 2022
Note

QC 20221104

Available from: 2022-11-04. Created: 2022-11-04. Last updated: 2024-10-17. Bibliographically approved.
Maki, A., Kragic, D., Kjellström, H., Azizpour, H., Sullivan, J., Björkman, M., . . . Sundblad, Y. (2022). In Memoriam: Jan-Olof Eklundh. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 4488-4489
2022 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 44, no 9, p. 4488-4489. Article in journal (Refereed). Published
Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2022
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-316696 (URN), 10.1109/TPAMI.2022.3183266 (DOI), 000836666600005 ()
Note

QC 20220905

Available from: 2022-09-05. Created: 2022-09-05. Last updated: 2022-09-05. Bibliographically approved.
Bujwid, S. & Sullivan, J. (2021). Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions. In: Proceedings of the Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN). Paper presented at Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN), Kyiv, Ukraine [Online] April 20, 2021 (pp. 38-52). Association for Computational Linguistics
2021 (English). In: Proceedings of the Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN), Association for Computational Linguistics, 2021, p. 38-52. Conference paper, Published paper (Refereed)
Abstract [en]

We study the impact of using rich and diverse textual descriptions of classes for zero-shot learning (ZSL) on ImageNet. We create a new dataset, ImageNet-Wiki, that matches each ImageNet class to its corresponding Wikipedia article. We show that merely employing these Wikipedia articles as class descriptions yields much higher ZSL performance than prior works. Even a simple model using this type of auxiliary data outperforms state-of-the-art models that rely on standard features of word embedding encodings of class names. These results highlight the usefulness and importance of textual descriptions for ZSL, as well as the relative importance of auxiliary data type compared to algorithmic progress. Our experimental results also show that standard zero-shot learning approaches generalize poorly across categories of classes.
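The generic compatibility step in this kind of ZSL pipeline can be sketched as a cosine-similarity match between an image embedding and embeddings of each class's textual description; the paper's "simple model" is trained end-to-end and this sketch is only an illustration of the scoring idea, with all names assumed:

```python
import numpy as np

def zero_shot_predict(img_emb, class_text_embs):
    # class_text_embs: {class_name: embedding of its textual description}.
    # Score the image against every (unseen) class description by cosine
    # similarity and predict the best-matching class.
    names = list(class_text_embs)
    M = np.stack([class_text_embs[n] for n in names]).astype(float)
    M = M / np.linalg.norm(M, axis=1, keepdims=True)
    v = np.asarray(img_emb, dtype=float)
    v = v / np.linalg.norm(v)
    return names[int(np.argmax(M @ v))]
```

The paper's point is that what goes into `class_text_embs` matters more than the scoring machinery: full Wikipedia articles beat word embeddings of class names even under a simple model.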

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2021
Keywords
zero-shot learning, image classification, textual descriptions, Wikipedia
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-295711 (URN)
Conference
Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN), Kyiv, Ukraine [Online] April 20, 2021
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20210609

Available from: 2021-05-25. Created: 2021-05-25. Last updated: 2025-02-07. Bibliographically approved.
Mohlin, D., Bianchi, G. & Sullivan, J. (2021). Probabilistic Regression with Huber Distributions. In: 32nd British Machine Vision Conference, BMVC 2021. Paper presented at 32nd British Machine Vision Conference, BMVC 2021, Virtual, Online, Nov 22 2021 - Nov 25 2021. British Machine Vision Association, BMVA
2021 (English). In: 32nd British Machine Vision Conference, BMVC 2021, British Machine Vision Association, BMVA, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we describe a probabilistic method for estimating the position of an object, along with its covariance matrix, using neural networks. Our method is designed to be robust to outliers and to have bounded gradients with respect to the network outputs, among other desirable properties. To achieve this we introduce a novel probability distribution inspired by the Huber loss. We also introduce a new way to parameterize positive definite matrices to ensure invariance to the choice of orientation for the coordinate system we regress over. We evaluate our method on popular body pose and facial landmark datasets and get performance on par with or exceeding the performance of non-heatmap methods. Our code is available at github.com/Davmo049/Public_prob_regression_with_huber_distributions.
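The Huber-inspired construction can be illustrated in one dimension: the Huber function is quadratic near zero and linear in the tails, so a distribution of the form p(x) proportional to exp(-huber(x - mu)) yields a negative log-likelihood whose gradient magnitude never exceeds delta. This sketch omits the normalizing constant; the paper's multivariate, normalized version with its covariance parameterization is more involved:

```python
def huber(r, delta=1.0):
    # Quadratic for |r| <= delta, linear beyond: the slope is capped at delta,
    # which is the source of the bounded gradients and outlier robustness.
    a = abs(r)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def huber_nll(x, mu, delta=1.0):
    # Unnormalized 1D negative log-likelihood for the Huber-inspired
    # distribution p(x) proportional to exp(-huber(x - mu)).
    return huber(x - mu, delta)
```

Compared with a Gaussian NLL, whose gradient grows linearly with the residual, an outlier here contributes at most a constant-magnitude gradient, which stabilizes training.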

Place, publisher, year, edition, pages
British Machine Vision Association, BMVA, 2021
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-350592 (URN), 2-s2.0-85176102845 (Scopus ID)
Conference
32nd British Machine Vision Conference, BMVC 2021, Virtual, Online, NA, Nov 22 2021 - Nov 25 2021
Note

QC 20240718

Available from: 2024-07-18. Created: 2024-07-18. Last updated: 2025-02-07. Bibliographically approved.
Baldassarre, F., Smith, K., Sullivan, J. & Azizpour, H. (2020). Explanation-Based Weakly-Supervised Learning of Visual Relations with Graph Networks. In: Proceedings, Part XXVIII Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020. Paper presented at Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020 (pp. 612-630). Springer Nature
2020 (English). In: Proceedings, Part XXVIII Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Springer Nature, 2020, p. 612-630. Conference paper, Published paper (Refereed)
Abstract [en]

Visual relationship detection is fundamental for holistic image understanding. However, the localization and classification of (subject, predicate, object) triplets remain challenging tasks, due to the combinatorial explosion of possible relationships, their long-tailed distribution in natural images, and an expensive annotation process. This paper introduces a novel weakly-supervised method for visual relationship detection that relies on minimal image-level predicate labels. A graph neural network is trained to classify predicates in images from a graph representation of detected objects, implicitly encoding an inductive bias for pairwise relations. We then frame relationship detection as the explanation of such a predicate classifier, i.e. we obtain a complete relation by recovering the subject and object of a predicted predicate. We present results comparable to recent fully- and weakly-supervised methods on three diverse and challenging datasets: HICO-DET for human-object interaction, Visual Relationship Detection for generic object-to-object relations, and UnRel for unusual triplets; demonstrating robustness to non-comprehensive annotations and good few-shot generalization.
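The "relationship as explanation of a predicate classifier" idea can be caricatured with an occlusion-style attribution: find the pair of detected objects whose removal most reduces the predicate score. This is a deliberate simplification of the paper's explanation machinery, and the function names and toy scorer below are assumptions:

```python
def explain_predicate(score_fn, objects, predicate):
    # Occlusion-style explanation: the object pair whose joint removal most
    # reduces the predicate score is returned as the relation's object pair.
    # (This crude sketch cannot by itself order the pair into subject/object.)
    base = score_fn(objects, predicate)
    best, best_drop = None, float("-inf")
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            kept = [o for k, o in enumerate(objects) if k not in (i, j)]
            drop = base - score_fn(kept, predicate)
            if drop > best_drop:
                best, best_drop = (i, j), drop
    return best
```

The attraction of the explanation framing is supervision cost: only image-level predicate labels are needed to train the classifier, yet the explanation recovers which objects participate in the relation.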

Place, publisher, year, edition, pages
Springer Nature, 2020
Series
Lecture Notes in Computer Science book series ; 12373
Keywords
Computer vision, Image coding, Supervised learning, Combinatorial explosion, Graph neural networks, Graph representation, Human-object interaction, Long-tailed distributions, Object to objects, Supervised methods, Weakly supervised learning, Object detection
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-290838 (URN), 10.1007/978-3-030-58604-1_37 (DOI), 001500584400037 (), 2-s2.0-85097054926 (Scopus ID)
Conference
Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020
Note

Part of ISBN 9783030586034

QC 20210323

Available from: 2021-03-23. Created: 2021-03-23. Last updated: 2025-12-08. Bibliographically approved.