Designing Target-specific Data Sets for Regioselectivity Predictions on Complex Substrates
The Warren and Katharine Schlinger Laboratory for Chemistry and Chemical Engineering, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States.
KTH, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), Chemistry, Theoretical Chemistry and Biology. The Warren and Katharine Schlinger Laboratory for Chemistry and Chemical Engineering, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States. ORCID iD: 0000-0002-8284-6856
The Warren and Katharine Schlinger Laboratory for Chemistry and Chemical Engineering, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States.
Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva 841051, Israel.
2025 (English). In: Journal of the American Chemical Society, ISSN 0002-7863, E-ISSN 1520-5126, Vol. 147, no 9, p. 7476-7484. Article in journal (Refereed). Published
Abstract [en]

The development of machine learning models to predict the regioselectivity of C(sp3)-H functionalization reactions is reported. A data set for dioxirane oxidations was curated from the literature and used to generate a model to predict the regioselectivity of C-H oxidation. To assess whether smaller, intentionally designed data sets could provide accuracy on complex targets, a series of acquisition functions were developed to select the most informative molecules for the specific target. Active learning-based acquisition functions that leverage predicted reactivity and model uncertainty were found to outperform those based on molecular and site similarity alone. The use of acquisition functions for data set elaboration significantly reduced the number of data points needed to perform accurate prediction, and it was found that smaller, machine-designed data sets can give accurate predictions when larger, randomly selected data sets fail. Finally, the workflow was experimentally validated on five complex substrates and shown to be applicable to predicting the regioselectivity of arene C-H radical borylation. These studies provide a quantitative alternative to the intuitive extrapolation from “model substrates” that is frequently used to estimate reactivity on complex molecules.
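
The idea of an acquisition function that combines predicted reactivity with model uncertainty can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring rule (ensemble mean plus a weighted ensemble spread, in the style of an upper confidence bound), the function names `acquisition_score` and `select_candidates`, the `weight` parameter, and the example numbers are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def acquisition_score(ensemble_predictions, weight=1.0):
    """Hypothetical score: mean predicted reactivity plus weighted ensemble spread.

    ensemble_predictions holds one prediction per model in an ensemble;
    their standard deviation stands in for model uncertainty (an assumption).
    """
    return mean(ensemble_predictions) + weight * pstdev(ensemble_predictions)


def select_candidates(candidates, k, weight=1.0):
    """Return the names of the k candidates with the highest scores.

    candidates maps a molecule name to its list of ensemble predictions.
    """
    ranked = sorted(
        candidates,
        key=lambda name: acquisition_score(candidates[name], weight),
        reverse=True,
    )
    return ranked[:k]


# Made-up illustrative predictions, not data from the paper.
candidates = {
    "mol_A": [0.80, 0.82, 0.78],  # high predicted reactivity, low spread
    "mol_B": [0.20, 0.90, 0.55],  # moderate prediction, high spread
    "mol_C": [0.10, 0.12, 0.11],  # low predicted reactivity, low spread
}

picked = select_candidates(candidates, k=2)
print(picked)
```

With `weight=0.0` the ranking falls back to predicted reactivity alone, which makes it easy to compare an uncertainty-aware selection against a purely prediction-driven one.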

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2025. Vol. 147, no 9, p. 7476-7484
National Category
Bioinformatics and Computational Biology; Bioinformatics (Computational Biology); Theoretical Chemistry
Identifiers
URN: urn:nbn:se:kth:diva-361456
DOI: 10.1021/jacs.4c15902
ISI: 001437834600001
PubMedID: 39982221
Scopus ID: 2-s2.0-86000184966
OAI: oai:DiVA.org:kth-361456
DiVA, id: diva2:1945886
Note

QC 20250324

Available from: 2025-03-19. Created: 2025-03-19. Last updated: 2025-03-24. Bibliographically approved


Authority records

Carretero Cerdán, Alba
