Cell Painting-based bioactivity prediction boosts high-throughput screening hit-rates and compound diversity
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, Science for Life Laboratory, SciLifeLab. Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden. ORCID iD: 0000-0003-2920-8510
Discovery Sciences, R&D, AstraZeneca, Alderley Park, UK.
Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.
2024 (English). In: Nature Communications, E-ISSN 2041-1723, Vol. 15, no 1, article id 3470. Article in journal (Refereed). Published
Abstract [en]

Identifying active compounds for a target is a time- and resource-intensive task in early drug discovery. Accurate bioactivity prediction using morphological profiles could streamline the process, enabling smaller, more focused compound screens. We investigate the potential of deep learning on unrefined single-concentration activity readouts and Cell Painting data, to predict compound activity across 140 diverse assays. We observe an average ROC-AUC of 0.744 ± 0.108 with 62% of assays achieving ≥0.7, 30% ≥0.8, and 7% ≥0.9. In many cases, the high prediction performance can be achieved using only brightfield images instead of multichannel fluorescence images. A comprehensive analysis shows that Cell Painting-based bioactivity prediction is robust across assay types, technologies, and target classes, with cell-based assays and kinase targets being particularly well-suited for prediction. Experimental validation confirms the enrichment of active compounds. Our findings indicate that models trained on Cell Painting data, combined with a small set of single-concentration data points, can reliably predict the activity of a compound library across diverse targets and assays while maintaining high hit rates and scaffold diversity. This approach has the potential to reduce the size of screening campaigns, saving time and resources, and enabling primary screening with more complex assays.
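
The per-assay summary quoted in the abstract (a mean ROC-AUC with a spread, plus the share of assays above 0.7, 0.8 and 0.9) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels and scores, and the reading of "±" as a sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (active, inactive) pairs ranked correctly.

    Uses the Mann-Whitney identity; tied scores count as half a win.
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def summarize(aucs, thresholds=(0.7, 0.8, 0.9)):
    """Return mean, sample standard deviation, and share of assays per threshold."""
    shares = {t: sum(a >= t for a in aucs) / len(aucs) for t in thresholds}
    return mean(aucs), stdev(aucs), shares


# Hypothetical toy data: assay name -> (binary activity labels, predicted scores).
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_assays = {
    "assay_a": ([1, 1, 0, 0], toy_scores),
    "assay_b": ([1, 0, 1, 0], toy_scores),
    "assay_c": ([0, 1, 1, 0], toy_scores),
}

aucs = [roc_auc(labels, scores) for labels, scores in toy_assays.values()]
avg, spread, shares = summarize(aucs)
print(f"mean ROC-AUC {avg:.3f} +/- {spread:.3f}; shares above thresholds: {shares}")
```

The pairwise loop is quadratic in the number of compounds, which is fine for a sketch; for screening-sized libraries a rank-based implementation such as scikit-learn's `roc_auc_score` would be the usual choice.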

Place, publisher, year, edition, pages
Springer Nature, 2024. Vol. 15, no 1, article id 3470
National Category
Biological Sciences
Identifiers
URN: urn:nbn:se:kth:diva-346401
DOI: 10.1038/s41467-024-47171-1
PubMedID: 38658534
Scopus ID: 2-s2.0-85191297869
OAI: oai:DiVA.org:kth-346401
DiVA, id: diva2:1857595
Note

QC 20240516

Available from: 2024-05-14. Created: 2024-05-14. Last updated: 2024-05-20. Bibliographically approved
In thesis
1. Machine Learning Methods for Image-based Phenotypic Profiling in Early Drug Discovery
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In the search for new therapeutic treatments, strategies to make the drug discovery process more efficient are crucial. Image-based phenotypic profiling, with its millions of pictures of fluorescently stained cells, is a rich and effective means to capture the morphological effects of potential treatments on living systems. Within this complex data await biological insights and new therapeutic opportunities, but computational tools are needed to unlock them.

This thesis examines the role of machine learning in improving the utility and analysis of phenotypic screening data. It focuses on challenges specific to this domain, such as the lack of reliable labels that are essential for supervised learning, as well as confounding factors present in the data that are often unavoidable due to experimental variability. We explore transfer learning to boost model generalization and robustness, analyzing the impact of domain distance, initialization, dataset size, and architecture on the effectiveness of applying natural domain pre-trained weights to biomedical contexts. Building upon this, we delve into self-supervised pretraining for phenotypic image data, but find its direct application is inadequate in this context as it fails to differentiate between various biological effects. To overcome this, we develop new self-supervised learning strategies designed to enable the network to disregard confounding experimental noise, thus enhancing its ability to discern the impacts of various treatments. We further develop a technique that allows a model trained for phenotypic profiling to be adapted to new, unseen data without the need for any labels or supervised learning. Using this approach, a general phenotypic profiling model can be readily adapted to data from different sites without the need for any labels. Beyond our technical contributions, we also show that bioactive compounds identified using the approaches outlined in this thesis have been subsequently confirmed in biological assays through replication in an industrial setting. Our findings indicate that while phenotypic data and biomedical imaging present complex challenges, machine learning techniques can play a pivotal role in making early drug discovery more efficient and effective.

Abstract [sv]

In the search for new medicines, strategies to make the drug discovery process more efficient are crucial. Image-based phenotypic profiling, with its millions of images of fluorescently stained cells, offers a rich and effective way to capture the morphological effects of potential treatments on living systems. Within such complex data, unknown biological insights can be identified and new drug treatments discovered, but analysis methods capable of extracting the information are required to discern them.

This thesis explores the role of machine learning in improving the usability and analysis of phenotypic data. It addresses challenges specific to this type of data, such as the lack of reliable annotations required for supervised learning, as well as confounding factors in the data that are often unavoidable due to experimental variation. We explore transfer learning to increase the generalization ability and robustness of models, and analyze how factors such as domain distance, initialization, dataset size, and model architecture affect the effectiveness of applying pre-trained weights from natural domains to biomedical ones.

We further delve into unsupervised learning for phenotypic image data, but find that its direct application is inadequate in this context because it fails to distinguish between different biological effects. To address this, we develop new strategies for unsupervised learning, designed so that the model can ignore experimental noise, which improves its ability to discern the effects of different treatments. We also develop a technique that allows a model trained for phenotypic profiling to be adapted to new data from an unknown source without the need for any annotations or supervised learning. With this method, a general phenotypic profiling model can easily be adapted to data from different sources without annotations.

Beyond our technical contributions, we also show that bioactive compounds identified with the methods in this thesis have been confirmed experimentally. Our results suggest that although phenotypic data and biomedical image data pose complex challenges, machine learning can play a decisive role in making the early phase of drug discovery more efficient.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. p. 79
Series
TRITA-EECS-AVL ; 2024:53
Keywords
Phenotypic Profiling, Drug Discovery, Biomedical Imaging
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-346574 (URN)
978-91-8040-954-4 (ISBN)
Public defence
2024-06-12, https://kth-se.zoom.us/j/67796518372, D3, Lindstedtsvägen 9, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20240520

Available from: 2024-05-20. Created: 2024-05-20. Last updated: 2025-02-07. Bibliographically approved

Authority records

Fredin Haslum, Johan; Smith, Kevin