Bridging Generalization Gaps in High Content Imaging Through Online Self-Supervised Domain Adaptation
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0003-2920-8510
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0003-1401-3497
AstraZeneca, Gothenburg, Sweden.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. ORCID iD: 0000-0002-6163-191X
2024 (English). In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2024, p. 7723-7732. Conference paper, Published paper (Refereed)
Abstract [en]

High Content Imaging (HCI) plays a vital role in modern drug discovery and development pipelines, facilitating various stages from hit identification to candidate drug characterization. Applying machine learning models to these datasets can be challenging because they typically consist of multiple batches affected by experimental variation, especially if different imaging equipment has been used. Moreover, as new data arrive, it is preferable to analyze them in an online fashion. To address both challenges, we propose CODA, an online self-supervised domain adaptation approach. CODA divides the classifier’s role into a generic feature extractor and a task-specific model. We adapt the feature extractor’s weights to the new domain using cross-batch self-supervision while keeping the task-specific model unchanged. Our results demonstrate that this strategy significantly reduces the generalization gap, achieving up to a 300% improvement when applied to data from different labs utilizing different microscopes. CODA can be applied to new, unlabeled out-of-domain data sources of different sizes, from a single plate to multiple experimental batches.
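
The split described in the abstract, an adaptable feature extractor paired with a frozen task-specific model, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the extractor is a single linear layer `W`, the task-specific model is a fixed linear scorer (`v`, `b`), and the self-supervised signal is a simple consistency loss over hypothetical pairs of samples from two batches that should share a representation. The abstract does not specify CODA's actual cross-batch objective, and a loss this simple would collapse towards `W = 0` if run to convergence, so practical methods add terms that prevent that.

```python
def extract(W, x):
    """Toy feature extractor: a single linear layer, z = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def head(v, b, z):
    """Toy task-specific model: a fixed linear scorer that is never updated."""
    return sum(vi * zi for vi, zi in zip(v, z)) + b


def consistency_loss(W, pairs):
    """Mean squared distance between the features of paired samples."""
    total = 0.0
    for xa, xb in pairs:
        za, zb = extract(W, xa), extract(W, xb)
        total += sum((p - q) ** 2 for p, q in zip(za, zb))
    return total / len(pairs)


def adapt_extractor(W, pairs, lr=0.1, steps=100):
    """Gradient descent on the consistency loss; only W is changed."""
    W = [row[:] for row in W]  # work on a copy, leave the source weights intact
    n = len(pairs)
    for _ in range(steps):
        grad = [[0.0] * len(W[0]) for _ in W]
        for xa, xb in pairs:
            diff_x = [p - q for p, q in zip(xa, xb)]
            diff_z = extract(W, diff_x)  # equals W xa - W xb by linearity
            for i, dz in enumerate(diff_z):
                for j, dx in enumerate(diff_x):
                    grad[i][j] += 2.0 * dz * dx / n
        W = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, grad)]
    return W


# Hypothetical unlabeled pairs: the second batch carries a constant offset.
pairs = [([1.0, 2.0], [1.5, 1.7]), ([0.0, 1.0], [0.5, 0.7])]
W_source = [[1.0, 0.0], [0.0, 1.0]]  # stands in for source-domain weights
v, b = [0.3, -0.2], 0.1              # frozen head parameters

W_adapted = adapt_extractor(W_source, pairs)
print("loss before:", consistency_loss(W_source, pairs))
print("loss after: ", consistency_loss(W_adapted, pairs))
print("score on new-domain sample:", head(v, b, extract(W_adapted, [1.5, 1.7])))
```

The head parameters are never passed to `adapt_extractor`, which is the point of the sketch: no labels are needed for the update, and the task-specific model is reused unchanged on the adapted features.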

Place, publisher, year, edition, pages
2024. p. 7723-7732
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-346570
DOI: 10.1109/WACV57701.2024.00756
Scopus ID: 2-s2.0-85192009362
OAI: oai:DiVA.org:kth-346570
DiVA, id: diva2:1858712
Conference
the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 03-08 January 2024
Note

QC 20240522

Available from: 2024-05-17. Created: 2024-05-17. Last updated: 2025-02-07. Bibliographically approved.
In thesis
1. Machine Learning Methods for Image-based Phenotypic Profiling in Early Drug Discovery
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In the search for new therapeutic treatments, strategies to make the drug discovery process more efficient are crucial. Image-based phenotypic profiling, with its millions of images of fluorescently stained cells, is a rich and effective means of capturing the morphological effects of potential treatments on living systems. Within these complex data await biological insights and new therapeutic opportunities, but computational tools are needed to unlock them.

This thesis examines the role of machine learning in improving the utility and analysis of phenotypic screening data. It focuses on challenges specific to this domain, such as the lack of reliable labels that are essential for supervised learning, as well as confounding factors present in the data that are often unavoidable due to experimental variability. We explore transfer learning to boost model generalization and robustness, analyzing the impact of domain distance, initialization, dataset size, and architecture on the effectiveness of applying weights pre-trained on natural images to biomedical contexts. Building upon this, we delve into self-supervised pretraining for phenotypic image data, but find that its direct application is inadequate in this context because it fails to differentiate between various biological effects. To overcome this, we develop new self-supervised learning strategies designed to enable the network to disregard confounding experimental noise, thus enhancing its ability to discern the impacts of various treatments. We further develop a technique that allows a model trained for phenotypic profiling to be adapted to new, unseen data without any labels or supervised learning, so that a general phenotypic profiling model can be readily adapted to data from different sites. Beyond our technical contributions, we also show that bioactive compounds identified using the approaches outlined in this thesis have subsequently been confirmed in biological assays through replication in an industrial setting. Our findings indicate that while phenotypic data and biomedical imaging present complex challenges, machine learning techniques can play a pivotal role in making early drug discovery more efficient and effective.

Abstract [sv] (English translation)

In the search for new medicines, strategies to make the drug discovery process more efficient are crucial. Image-based phenotypic profiling, with its millions of images of fluorescently stained cells, offers a rich and effective way to capture the morphological effects of potential treatments on living systems. Within such complex data, unknown biological insights can be identified and new drug treatments discovered, but analysis methods capable of extracting the information are required to discern them.

This thesis explores the role of machine learning in improving the usability and analysis of phenotypic data. It addresses challenges specific to this type of data, such as the lack of reliable annotations required for supervised learning, as well as confounding factors in the data that are often unavoidable due to experimental variation. We explore transfer learning to increase the generalization ability and robustness of models, and analyze how factors such as domain distance, initialization, dataset size, and model architecture affect the effectiveness of applying pretrained weights from natural domains to biomedical ones.

Furthermore, we delve into unsupervised learning for phenotypic image data, but find that its direct application is inadequate in this context because it fails to distinguish between different biological effects. To address this, we develop new strategies for unsupervised learning, designed so that the model can ignore experimental noise, which improves its ability to discern the effects of different treatments. We also develop a technique that allows a model trained for phenotypic profiling to be adapted to new data from an unknown source without the need for any annotations or supervised learning. With this method, a general phenotypic profiling model can easily be adapted to data from different sources without annotations.

Beyond our technical contributions, we also show that bioactive compounds identified with the methods in this thesis have been confirmed experimentally. Our results suggest that although phenotypic data and biomedical image data pose complex challenges, machine learning can play a decisive role in making the early phase of drug discovery more efficient.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. p. 79
Series
TRITA-EECS-AVL ; 2024:53
Keywords
Phenotypic Profiling, Drug Discovery, Biomedical Imaging, Fenotypisk profilering, läkemedelsupptäckt, biomedicinsk avbildning
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-346574 (URN)
978-91-8040-954-4 (ISBN)
Public defence
2024-06-12, https://kth-se.zoom.us/j/67796518372, D3, Lindstedtsvägen 9, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20240520

Available from: 2024-05-20. Created: 2024-05-20. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA


Authority records

Fredin Haslum, Johan; Matsoukas, Christos; Smith, Kevin
