Correlation-compressed direct-coupling analysis
Chinese Acad Sci, Inst Theoret Phys, Key Lab Theoret Phys, Beijing 100190, Peoples R China; Univ Chinese Acad Sci, Sch Phys Sci, Beijing 100049, Peoples R China.
Chinese Acad Sci, Inst Theoret Phys, Key Lab Theoret Phys, Beijing 100190, Peoples R China; Univ Chinese Acad Sci, Sch Phys Sci, Beijing 100049, Peoples R China; Hunan Normal Univ, Synerget Innovat Ctr Quantum Effects & Applicat, Changsha 410081, Hunan, Peoples R China.
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB; Aalto Univ, Dept Appl Phys, Aalto 00076, Finland; Aalto Univ, Dept Comp Sci, Aalto 00076, Finland.
2018 (English)
In: Physical review. E, ISSN 2470-0045, E-ISSN 2470-0053, Vol. 98, no 3, article id 032407
Article in journal (Refereed) Published
Abstract [en]

Learning Ising or Potts models from data has become an important topic in statistical physics and computational biology, with applications to predictions of structural contacts in proteins and other areas of biological data analysis. The corresponding inference problems are challenging since the normalization constant (partition function) of the Ising or Potts distribution cannot be computed efficiently on large instances. Different ways to address this issue have resulted in a substantial amount of methodological literature. In this paper we investigate how these methods could be used on much larger data sets than studied previously. We focus on a central aspect: in practice these inference problems are almost always severely under-sampled, and the operational result is almost always a small set of leading predictions. We therefore explore an approach where the data are prefiltered based on empirical correlations, which can be computed directly even for very large problems. Inference is then applied only to the much smaller instance in a subsequent step of the analysis. We show that in several relevant model classes such a combined approach gives results of almost the same quality as inference on the whole data set. It can therefore provide a potentially very large computational speedup at the price of only a marginal decrease in prediction quality. We also show that the results on whole-genome epistatic couplings that were obtained in a recent computation-intensive study can be retrieved by our approach. The method of this paper hence opens up the possibility to learn parameters describing pairwise dependences among whole genomes in a computationally feasible and expedient manner.
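
The two-step idea in the abstract (prefilter on empirical correlations, then run inference only on the reduced instance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the selection rule (keep the variables with the largest absolute pairwise correlation), the `ridge` regularizer, and the use of naive mean-field inversion as the inference step are all assumptions chosen for brevity, and the data are assumed to be ±1 Ising-like samples.

```python
import numpy as np


def prefilter_by_correlation(samples, n_keep):
    """Keep the n_keep variables with the largest absolute empirical
    correlation to any other variable (hypothetical selection rule)."""
    corr = np.nan_to_num(np.corrcoef(samples, rowvar=False))  # NaN for constant columns
    np.fill_diagonal(corr, 0.0)
    strength = np.abs(corr).max(axis=0)
    keep = np.argsort(strength)[::-1][:n_keep]
    return np.sort(keep)


def mean_field_couplings(samples, ridge=1e-2):
    """Naive mean-field estimate: J is minus the inverse of the
    (ridge-regularized) covariance matrix, with the diagonal zeroed.
    A stand-in for whatever inference method is actually used."""
    cov = np.cov(samples, rowvar=False)
    cov = cov + ridge * np.eye(cov.shape[0])
    couplings = -np.linalg.inv(cov)
    np.fill_diagonal(couplings, 0.0)
    return couplings


def compressed_dca(samples, n_keep, n_top):
    """Prefilter, infer on the reduced instance, and return the n_top
    strongest couplings as (i, j, J_ij) in original variable indices."""
    keep = prefilter_by_correlation(samples, n_keep)
    couplings = mean_field_couplings(samples[:, keep])
    rows, cols = np.triu_indices(len(keep), k=1)
    values = couplings[rows, cols]
    order = np.argsort(np.abs(values))[::-1][:n_top]
    return [(int(keep[rows[k]]), int(keep[cols[k]]), float(values[k])) for k in order]
```

The correlation matrix costs one pass over the data, while the matrix inversion is performed only on the `n_keep` retained variables, which is where the speedup described in the abstract would come from under these assumptions.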

Place, publisher, year, edition, pages
American Physical Society, 2018. Vol. 98, no 3, article id 032407
National Category
Physical Sciences
Identifiers
URN: urn:nbn:se:kth:diva-235441
DOI: 10.1103/PhysRevE.98.032407
ISI: 000444574600006
Scopus ID: 2-s2.0-85053241828
OAI: oai:DiVA.org:kth-235441
DiVA, id: diva2:1251500
Note

QC 20180927

Available from: 2018-09-27
Created: 2018-09-27
Last updated: 2018-10-02
Bibliographically approved

By author/editor
Aurell, Erik