Protein domain versatility scoring methods
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Protein domains are modules of conserved protein structure, which are used in many types of studies in evolutionary proteomics and neighboring fields. The Pfam database contains a large set of protein domains constructed using Hidden Markov Models. There have been several attempts to define a metric for the "versatility" or "promiscuity" of protein domains. These methods take different approaches to finding and ranking domains that are present and abundant in a large variety of proteins. Here an attempt has been made to summarize and compare these methods. The methods have been applied to the latest version of the Pfam database and compared using Spearman rank correlation and the Jaccard distance. Hypergeometric enrichment of GO terms has been used to obtain a more objective way of identifying the similarities and differences between the methods in terms of biological functions and processes. The results show that the methods have a weak but significant similarity to each other. The GO terms enriched among the versatile domains show a bias towards central regulatory and metabolic/catabolic pathways and key enzymes in all kingdoms. One deviation from the enrichment results is the DVI method in Eukaryota, which shows a bias towards membrane processes in the enriched terms. The NTRP method shows the best rank correlation to the number of associated GO terms, but is not directly corrected for abundance. In conclusion, it should be possible to modify one of the existing methods or create a new method for finding versatile domains that show a more consistent association with certain biological functions or processes.
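The three comparison tools named in the abstract can be illustrated with a minimal Python sketch. It is not the code used in the thesis: the function names, the domain labels and all scores below are made up for illustration, and the abstract does not specify how ties, top-k cut-offs or multiple testing were handled. The sketch assumes average ranks for ties in the Spearman correlation, a Jaccard distance between two sets of top-ranked domains, and a one-sided hypergeometric test for over-representation of a GO term.

```python
from math import comb


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two non-empty collections."""
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when drawing n domains from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical versatility scores from two methods for the same five domains.
method_a = {"dom1": 0.9, "dom2": 0.7, "dom3": 0.4, "dom4": 0.2, "dom5": 0.1}
method_b = {"dom1": 0.8, "dom2": 0.3, "dom3": 0.6, "dom4": 0.1, "dom5": 0.2}
names = sorted(method_a)

rho = spearman([method_a[d] for d in names], [method_b[d] for d in names])
top_a = sorted(names, key=method_a.get, reverse=True)[:3]
top_b = sorted(names, key=method_b.get, reverse=True)[:3]
dist = jaccard_distance(top_a, top_b)

# Hypothetical enrichment: 4 of 10 top domains carry a term that
# annotates 20 of 1000 domains overall.
p_value = hypergeom_pvalue(4, 20, 10, 1000)
```

In practice a library such as SciPy would normally replace the hand-written statistics; they are written out here only to keep the sketch free of dependencies.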
Protein domains, Domain versatility, Domain promiscuity, Bioinformatics, Pfam, Gene ontology
Engineering and Technology
Identifiers: URN: urn:nbn:se:kth:diva-163675, OAI: oai:DiVA.org:kth-163675, DiVA: diva2:801744