Protein domain versatility scoring methods
KTH, School of Biotechnology (BIO).
2014 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Protein domains are modules of conserved protein structure, which are used in many types of studies in evolutionary proteomics and neighboring fields. The Pfam database contains a large set of protein domains constructed using Hidden Markov Models. There have been several attempts to define a metric for the "versatility" or "promiscuity" of protein domains. These methods have used different approaches to finding and ranking domains that are present and abundant in a large variety of proteins. Here an attempt has been made to summarize and compare these methods. The methods have been applied to the latest version of the Pfam database and compared using the Spearman and Jaccard distance metrics. By testing GO terms for enrichment with the hypergeometric distribution, an attempt has been made to find a more objective way of identifying the similarities and differences between the methods in terms of biological functions and processes. The results show that the methods have a weak but significant similarity to each other. The GO terms enriched among the versatile domains show a bias towards central regulatory and metabolic/catabolic pathways and key enzymes in all kingdoms. One deviation from the enrichment results is the DVI method in Eukaryota, which shows a bias towards membrane processes in the enriched terms. The NTRP method shows the best rank correlation to the number of associated GO terms, but is not directly corrected for abundance. In conclusion, it should be possible to modify one of the existing methods or create a new method for finding versatile domains that shows a more consistent association with certain biological functions or processes.
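The three comparison tools named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, and the versatility scoring methods themselves are not reproduced. The function names and the toy inputs are assumptions made for illustration: Spearman correlation is computed as the Pearson correlation of average ranks, Jaccard distance is taken between two sets of top-ranked domains, and enrichment is the upper tail of the hypergeometric distribution.

```python
from math import comb


def ranks(scores):
    """Return 1-based ranks; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` domains without replacement from
    `population` domains, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


# Toy data with hypothetical domain labels (not taken from Pfam).
method_a_scores = [0.9, 0.7, 0.4, 0.1]
method_b_scores = [12.0, 8.0, 5.0, 1.0]
rho = spearman(method_a_scores, method_b_scores)
dist = jaccard_distance({"domA", "domB", "domC"}, {"domB", "domC", "domD"})
p_value = hypergeom_enrichment_p(population=10, annotated=5, sample=5, hits=5)
```

In this toy setting the two score lists order the domains identically, the two top sets share two of four distinct domains, and all five sampled domains carry the term. A real comparison would also need a multiple-testing correction across GO terms, which this sketch leaves out.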

Keyword [en]
Protein domains, Domain versatility, Domain promiscuity, Bioinformatics, Pfam, Gene ontology
National Category
Engineering and Technology
URN: urn:nbn:se:kth:diva-163675 OAI: diva2:801744
Available from: 2015-04-14 Created: 2015-04-10 Last updated: 2015-09-17 Bibliographically approved

Open Access in DiVA

No full text

