Fredin Haslum, Johan (ORCID iD: orcid.org/0000-0003-2920-8510)
Publications (6 of 6)
Huix, J. P., Ganeshan, A. R., Fredin Haslum, J., Söderberg, M., Matsoukas, C. & Smith, K. (2024). Are Natural Domain Foundation Models Useful for Medical Image Classification? In: Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024. Paper presented at 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024, Waikoloa, United States of America, Jan 4 2024 - Jan 8 2024 (pp. 7619-7628). Institute of Electrical and Electronics Engineers (IEEE)
Are Natural Domain Foundation Models Useful for Medical Image Classification?
2024 (English). In: Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 7619-7628. Conference paper, Published paper (Refereed)
Abstract [en]

The deep learning field is converging towards the use of general foundation models that can be easily adapted for diverse tasks. While this paradigm shift has become common practice within the field of natural language processing, progress has been slower in computer vision. In this paper we attempt to address this issue by investigating the transferability of various state-of-the-art foundation models to medical image classification tasks. Specifically, we evaluate the performance of five foundation models, namely SAM, SEEM, DINOv2, BLIP, and OpenCLIP, across four well-established medical imaging datasets. We explore different training settings to fully harness the potential of these models. Our study shows mixed results. DINOv2 consistently outperforms the standard practice of ImageNet pretraining. However, the other foundation models failed to consistently beat this established baseline, indicating limitations in their transferability to medical image classification tasks.
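
To make the evaluated setup concrete, here is a minimal sketch of fine-tuning one of these backbones for a medical classification task. It assumes a PyTorch environment and the public DINOv2 ViT-B/14 checkpoint from torch.hub; the dataset, class count, and hyperparameters are placeholders, and this is not the paper's actual training code.

```python
# Minimal sketch (not the paper's code): fine-tune a DINOv2 backbone
# with a linear classification head on a medical imaging dataset.
import torch
import torch.nn as nn

# Public ViT-B/14 checkpoint; its forward() returns a 768-d embedding.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
head = nn.Linear(768, 5)  # 5 classes is a placeholder
model = nn.Sequential(backbone, head)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised fine-tuning step on a batch of labeled images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```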

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Algorithms, Applications, Biomedical / healthcare / medicine, Datasets and evaluations, formulations, Machine learning architectures
National Category
Computer Sciences; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-350585 (URN), 10.1109/WACV57701.2024.00746 (DOI), 2-s2.0-85184972028 (Scopus ID)
Conference
2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024, Waikoloa, United States of America, Jan 4 2024 - Jan 8 2024
Note

Part of ISBN 9798350318920

QC 20240718

Available from: 2024-07-18. Created: 2024-07-18. Last updated: 2025-02-01. Bibliographically approved.
Fredin Haslum, J., Matsoukas, C., Leuchowius, K.-J. & Smith, K. (2024). Bridging Generalization Gaps in High Content Imaging Through Online Self-Supervised Domain Adaptation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2024. Paper presented at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 03-08 January 2024 (pp. 7723-7732).
Bridging Generalization Gaps in High Content Imaging Through Online Self-Supervised Domain Adaptation
2024 (English). In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2024, p. 7723-7732. Conference paper, Published paper (Refereed)
Abstract [en]

High Content Imaging (HCI) plays a vital role in modern drug discovery and development pipelines, facilitating various stages from hit identification to candidate drug characterization. Applying machine learning models to these datasets can prove challenging, as they typically consist of multiple batches affected by experimental variation, especially if different imaging equipment has been used. Moreover, as new data arrive, it is preferable that they are analyzed in an online fashion. To overcome this, we propose CODA, an online self-supervised domain adaptation approach. CODA divides the classifier's role into a generic feature extractor and a task-specific model. We adapt the feature extractor's weights to the new domain using cross-batch self-supervision while keeping the task-specific model unchanged. Our results demonstrate that this strategy significantly reduces the generalization gap, achieving up to a 300% improvement when applied to data from different labs utilizing different microscopes. CODA can be applied to new, unlabeled out-of-domain data sources of different sizes, from a single plate to multiple experimental batches.
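
The abstract gives enough detail for a rough sketch of the adaptation loop: freeze the task-specific head and update only the feature extractor on unlabeled target-domain data. The consistency objective below is a simplified stand-in for the paper's cross-batch self-supervision, and all names and hyperparameters are illustrative, not the authors' implementation.

```python
# Simplified sketch of the adaptation strategy described above: the
# task-specific head stays frozen while the feature extractor is updated
# on unlabeled target-domain images with a self-supervised consistency
# objective (a stand-in for the paper's cross-batch self-supervision).
import torch
import torch.nn.functional as F

def adapt_to_new_domain(extractor, head, target_loader, augment, lr=1e-4):
    head.requires_grad_(False)              # task-specific model unchanged
    opt = torch.optim.Adam(extractor.parameters(), lr=lr)
    for images in target_loader:            # unlabeled out-of-domain data
        z1 = extractor(augment(images))     # two augmented views of the
        z2 = extractor(augment(images))     # same images
        # pull the two views together; detach one view so the gradient
        # flows through the other
        loss = -F.cosine_similarity(z1, z2.detach(), dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return extractor
```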

National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-346570 (URN), 10.1109/WACV57701.2024.00756 (DOI), 2-s2.0-85192009362 (Scopus ID)
Conference
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 03-08 January 2024
Note

QC 20240522

Available from: 2024-05-17. Created: 2024-05-17. Last updated: 2025-02-07. Bibliographically approved.
Fredin Haslum, J., Lardeau, C. H., Karlsson, J., Turkki, R., Leuchowius, K. J., Smith, K. & Müllers, E. (2024). Cell Painting-based bioactivity prediction boosts high-throughput screening hit-rates and compound diversity. Nature Communications, 15(1), Article ID 3470.
Cell Painting-based bioactivity prediction boosts high-throughput screening hit-rates and compound diversity
2024 (English). In: Nature Communications, E-ISSN 2041-1723, Vol. 15, no. 1, article id 3470. Article in journal (Refereed), Published
Abstract [en]

Identifying active compounds for a target is a time- and resource-intensive task in early drug discovery. Accurate bioactivity prediction using morphological profiles could streamline the process, enabling smaller, more focused compound screens. We investigate the potential of deep learning on unrefined single-concentration activity readouts and Cell Painting data to predict compound activity across 140 diverse assays. We observe an average ROC-AUC of 0.744 ± 0.108, with 62% of assays achieving ≥0.7, 30% ≥0.8, and 7% ≥0.9. In many cases, high prediction performance can be achieved using only brightfield images instead of multichannel fluorescence images. A comprehensive analysis shows that Cell Painting-based bioactivity prediction is robust across assay types, technologies, and target classes, with cell-based assays and kinase targets being particularly well suited for prediction. Experimental validation confirms the enrichment of active compounds. Our findings indicate that models trained on Cell Painting data, combined with a small set of single-concentration data points, can reliably predict the activity of a compound library across diverse targets and assays while maintaining high hit rates and scaffold diversity. This approach has the potential to reduce the size of screening campaigns, saving time and resources, and to enable primary screening with more complex assays.
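
As a concrete illustration of how such per-assay numbers can be computed, here is a hedged sketch using scikit-learn. Array names and shapes are assumptions for illustration, not the paper's evaluation code.

```python
# Hedged sketch of the per-assay evaluation described above: ROC-AUC for
# each assay, given model scores for many compounds.
import numpy as np
from sklearn.metrics import roc_auc_score

def per_assay_auc(y_true: np.ndarray, y_score: np.ndarray) -> np.ndarray:
    """y_true, y_score: (n_compounds, n_assays); NaN marks untested pairs."""
    aucs = []
    for a in range(y_true.shape[1]):
        mask = ~np.isnan(y_true[:, a])
        labels = y_true[mask, a]
        if mask.any() and 0.0 < labels.mean() < 1.0:  # need both classes
            aucs.append(roc_auc_score(labels, y_score[mask, a]))
    aucs = np.asarray(aucs)
    print(f"mean ROC-AUC {aucs.mean():.3f} ± {aucs.std():.3f}; "
          f"≥0.7 in {(aucs >= 0.7).mean():.0%} of assays")
    return aucs
```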

Place, publisher, year, edition, pages
Springer Nature, 2024
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-346401 (URN), 10.1038/s41467-024-47171-1 (DOI), 38658534 (PubMedID), 2-s2.0-85191297869 (Scopus ID)
Note

QC 20240516

Available from: 2024-05-14. Created: 2024-05-14. Last updated: 2024-05-20. Bibliographically approved.
Fredin Haslum, J. (2024). Machine Learning Methods for Image-based Phenotypic Profiling in Early Drug Discovery. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
Machine Learning Methods for Image-based Phenotypic Profiling in Early Drug Discovery
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In the search for new therapeutic treatments, strategies to make the drug discovery process more efficient are crucial. Image-based phenotypic profiling, with its millions of images of fluorescently stained cells, is a rich and effective means to capture the morphological effects of potential treatments on living systems. Within this complex data await biological insights and new therapeutic opportunities, but computational tools are needed to unlock them.

This thesis examines the role of machine learning in improving the utility and analysis of phenotypic screening data. It focuses on challenges specific to this domain, such as the lack of reliable labels that are essential for supervised learning, as well as confounding factors in the data that are often unavoidable due to experimental variability. We explore transfer learning to boost model generalization and robustness, analyzing the impact of domain distance, initialization, dataset size, and architecture on the effectiveness of applying natural-domain pre-trained weights in biomedical contexts. Building upon this, we delve into self-supervised pretraining for phenotypic image data, but find that its direct application is inadequate in this context, as it fails to differentiate between various biological effects. To overcome this, we develop new self-supervised learning strategies designed to enable the network to disregard confounding experimental noise, thus enhancing its ability to discern the impacts of various treatments. We further develop a technique that allows a model trained for phenotypic profiling to be adapted to new, unseen data without the need for any labels or supervised learning; with it, a general phenotypic profiling model can be readily adapted to data from different sites. Beyond our technical contributions, we also show that bioactive compounds identified using the approaches outlined in this thesis have subsequently been confirmed in biological assays through replication in an industrial setting. Our findings indicate that while phenotypic data and biomedical imaging present complex challenges, machine learning techniques can play a pivotal role in making early drug discovery more efficient and effective.

Abstract [sv]

In the search for new medicines, strategies to make the drug discovery process more efficient are crucial. Image-based phenotypic profiling, with its millions of images of fluorescently stained cells, offers a rich and effective way to capture the morphological effects of potential treatments on living systems. Within such complex data, unknown biological insights can be identified and new drug treatments discovered, but analysis methods capable of extracting the information are required to discern them.

This thesis explores the role of machine learning in improving the utility and analysis of phenotypic data. It takes on challenges specific to this type of data, such as the lack of reliable annotations required for supervised learning, and confounding factors in the data that are often unavoidable due to experimental variation. We explore transfer learning to increase the generalization ability and robustness of models, and analyze how factors such as domain distance, initialization, dataset size, and model architecture affect the effectiveness of applying pre-trained weights from natural domains to biomedical ones.

Furthermore, we investigate self-supervised learning for phenotypic image data, but find that its direct application is inadequate in this context, since it fails to distinguish between different biological effects. To handle this, we develop new self-supervised learning strategies designed so that the model can ignore experimental noise, which improves its ability to discern the effects of different treatments. We also develop a technique that allows a model trained for phenotypic profiling to be adapted to new data from an unseen source without the need for any annotations or supervised learning. With this method, a general phenotypic profiling model can easily be adapted to data from different sources without annotations.

Beyond our technical contributions, we also show that bioactive compounds identified with the methods in this thesis have been confirmed experimentally. Our results suggest that although phenotypic data and biomedical image data pose complex challenges, machine learning can play a decisive role in making the early phase of drug discovery more efficient.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. p. 79
Series
TRITA-EECS-AVL ; 2024:53
Keywords
Phenotypic Profiling, Drug Discovery, Biomedical Imaging
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-346574 (URN), 978-91-8040-954-4 (ISBN)
Public defence
2024-06-12, https://kth-se.zoom.us/j/67796518372, D3, Lindstedtsvägen 9, Stockholm, 14:00 (English)
Note

QC 20240520

Available from: 2024-05-20. Created: 2024-05-20. Last updated: 2025-02-07. Bibliographically approved.
Fredin Haslum, J., Matsoukas, C., Leuchowius, K.-J., Müllers, E. & Smith, K. (2023). Metadata-guided Consistency Learning for High Content Images. In: Proceedings of Machine Learning Research, Volume 227: Medical Imaging with Deep Learning. Paper presented at 6th International Conference on Medical Imaging with Deep Learning, MIDL 2023, Nashville, United States of America, Jul 10 2023 - Jul 12 2023. ML Research Press
Metadata-guided Consistency Learning for High Content Images
2023 (English). In: Proceedings of Machine Learning Research, Volume 227: Medical Imaging with Deep Learning, ML Research Press, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

High content imaging assays can capture rich phenotypic response data for large sets of compound treatments, aiding in the characterization and discovery of novel drugs. However, extracting representative features from high content images that can capture subtle nuances in phenotypes remains challenging. The lack of high-quality labels makes it difficult to achieve satisfactory results with supervised deep learning. Self-supervised learning methods have shown great success on natural images and offer an attractive alternative for microscopy images as well. However, we find that self-supervised learning techniques underperform on high content imaging assays. One challenge is the undesirable domain shifts present in the data, known as batch effects, which are caused by biological noise or uncontrolled experimental conditions. To address this, we introduce Cross-Domain Consistency Learning (CDCL), a self-supervised approach that is able to learn in the presence of batch effects. CDCL enforces the learning of biological similarities while disregarding undesirable batch-specific signals, leading to more useful and versatile representations. These features are organised according to their morphological changes and are more useful for downstream tasks such as distinguishing treatments and mechanisms of action.
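
A minimal sketch of the core idea, assuming an InfoNCE-style formulation: embeddings of the same treatment imaged in different experimental batches are treated as positives, so batch-specific signal stops being useful to the network. The exact CDCL objective is in the paper; this is an assumed simplification.

```python
# Assumed simplification of the consistency idea described above:
# the same treatment in two different batches forms a positive pair,
# everything else in the minibatch is a negative (InfoNCE-style).
import torch
import torch.nn.functional as F

def cross_batch_consistency_loss(z_a: torch.Tensor,
                                 z_b: torch.Tensor,
                                 temperature: float = 0.1) -> torch.Tensor:
    """z_a[i] and z_b[i] embed the same treatment from different batches."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / temperature            # (n, n) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)       # positives on diagonal
```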

Place, publisher, year, edition, pages
ML Research Press, 2023
National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-346566 (URN), 001221108600055 (ISI), 2-s2.0-85189329755 (Scopus ID)
Conference
6th International Conference on Medical Imaging with Deep Learning, MIDL 2023, Nashville, United States of America, Jul 10 2023 - Jul 12 2023
Note

QC 20240521

Available from: 2024-05-17. Created: 2024-05-17. Last updated: 2025-02-27. Bibliographically approved.
Matsoukas, C., Fredin Haslum, J., Sorkhei, M., Söderberg, M. & Smith, K. (2022). What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Paper presented at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), JUN 18-24, 2022, New Orleans, LA (pp. 9215-9224). Institute of Electrical and Electronics Engineers (IEEE)
Open this publication in new window or tab >>What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors
Show others...
2022 (English). In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 9215-9224. Conference paper, Published paper (Refereed)
Abstract [en]

Transfer learning is a standard technique to transfer knowledge from one domain to another. For applications in medical imaging, transfer from ImageNet has become the de-facto approach, despite differences in the tasks and image characteristics between the domains. However, it is unclear what factors determine whether, and to what extent, transfer learning to the medical domain is useful. The long-standing assumption that features from the source domain get reused has recently been called into question. Through a series of experiments on several medical image benchmark datasets, we explore the relationship between transfer learning, data size, the capacity and inductive bias of the model, and the distance between the source and target domains. Our findings suggest that transfer learning is beneficial in most cases, and we characterize the important role feature reuse plays in its success.
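
A minimal sketch of the kind of comparison this line of work relies on, assuming torchvision; the dataset and training loop are left out, and the setup is illustrative rather than the authors' experimental code.

```python
# Illustrative sketch (not the authors' code): the same architecture
# initialized from ImageNet weights vs. at random, to be trained
# identically on a medical dataset and compared on validation metrics.
import torch
import torchvision.models as models

def make_model(num_classes: int, pretrained: bool) -> torch.nn.Module:
    weights = models.ResNet50_Weights.IMAGENET1K_V2 if pretrained else None
    model = models.resnet50(weights=weights)
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
    return model

transfer_model = make_model(num_classes=2, pretrained=True)   # ImageNet init
scratch_model = make_model(num_classes=2, pretrained=False)   # random init
# ...train both with the same schedule and compare validation performance...
```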

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-322794 (URN), 10.1109/CVPR52688.2022.00901 (DOI), 000870759102028 (ISI), 2-s2.0-85137378486 (Scopus ID)
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), JUN 18-24, 2022, New Orleans, LA
Note

Part of proceedings ISBN 978-1-6654-6946-3

QC 20230131

Available from: 2023-01-31. Created: 2023-01-31. Last updated: 2024-05-20. Bibliographically approved.