Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models
KTH, School of Computer Science and Communication (CSC). (Engineering Physics Program)
KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
2013 (English). In: Physical Review E. Statistical, Nonlinear, and Soft Matter Physics, ISSN 1539-3755, Vol. 87, no 1, 012707. Article in journal (Refereed). Published.
Abstract [en]

Spatially proximate amino acids in a protein tend to coevolve. A protein's three-dimensional (3D) structure hence leaves an echo of correlations in the evolutionary record. Reverse engineering 3D structures from such correlations is an open problem in structural biology, pursued with increasing vigor as more and more protein sequences continue to fill the data banks. Within this task lies a statistical inference problem, rooted in the following: correlation between two sites in a protein sequence can arise from firsthand interaction but can also be network-propagated via intermediate sites; observed correlation is not enough to guarantee proximity. To separate direct from indirect interactions is an instance of the general problem of inverse statistical mechanics, where the task is to learn model parameters (fields, couplings) from observables (magnetizations, correlations, samples) in large systems. In the context of protein sequences, the approach has been referred to as direct-coupling analysis. Here we show that the pseudolikelihood method, applied to 21-state Potts models describing the statistical properties of families of evolutionarily related proteins, significantly outperforms existing approaches to the direct-coupling analysis, the latter being based on standard mean-field techniques. This improved performance also relies on a modified score for the coupling strength. The results are verified using known crystal structures of specific sequence instances of various protein families. Code implementing the new method can be found at
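
As a rough illustration of the pseudolikelihood idea named in the abstract, the following Python sketch evaluates the negative log-pseudolikelihood of a q-state Potts model with fields h and couplings J on a set of integer-encoded sequences. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`zero_model`, `site_conditional`, `neg_log_pseudolikelihood`), the nested-list parameter layout, and the omission of regularization and sequence reweighting are all choices made here for brevity.

```python
import math


def zero_model(n_sites, q):
    """Hypothetical helper: all-zero fields h[r][k] and couplings J[r][j][k][a]."""
    h = [[0.0] * q for _ in range(n_sites)]
    J = [[[[0.0] * q for _ in range(q)] for _ in range(n_sites)]
         for _ in range(n_sites)]
    return h, J


def site_conditional(seq, r, h, J, q):
    """P(state at site r = k | states at all other sites), for k = 0..q-1."""
    logits = []
    for k in range(q):
        s = h[r][k]
        for j, a in enumerate(seq):
            if j != r:
                # Coupling between state k at site r and state a at site j.
                s += J[r][j][k][a]
        logits.append(s)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    z = sum(weights)
    return [w / z for w in weights]


def neg_log_pseudolikelihood(seqs, h, J, q):
    """Average over sequences of the summed per-site negative log conditionals."""
    total = 0.0
    for seq in seqs:
        for r in range(len(seq)):
            p = site_conditional(seq, r, h, J, q)
            total -= math.log(p[seq[r]])
    return total / len(seqs)


# Example: two toy sequences of length 3 over q = 21 states (assumed encoding).
h, J = zero_model(3, 21)
print(neg_log_pseudolikelihood([[0, 5, 20], [1, 1, 1]], h, J, 21))
```

With all parameters at zero, each conditional is uniform, so the objective equals N ln(q) for sequences of length N. Minimizing this objective over h and J, for example with a gradient-based optimizer, would be the actual inference step, which the sketch leaves out.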

Place, publisher, year, edition, pages
2013. Vol. 87, no 1, 012707.
Keyword [en]
3D Structure, Coupling strengths, Data bank, Indirect interactions, Large system, Mean-field, Model parameters, Protein family, Protein sequences, Pseudo-likelihood, Specific sequences, Statistical inference, Statistical properties, Structural biology, Three dimensional (3D) structures
National Category
Other Physics Topics; Biological Sciences
URN: urn:nbn:se:kth:diva-118215
DOI: 10.1103/PhysRevE.87.012707
ISI: 000314151700005
ScopusID: 2-s2.0-84872521100
OAI: diva2:605059

QC 20130213

Available from: 2013-02-13. Created: 2013-02-13. Last updated: 2013-03-08. Bibliographically approved.

By author/editor
Ekeberg, Magnus; Lövkvist, Cecilia; Aurell, Erik
By organisation
School of Computer Science and Communication (CSC); Computational Biology, CB; ACCESS Linnaeus Centre