Protein domain versatility scoring methods
KTH, School of Biotechnology (BIO).
2014 (English). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Protein domains are modules of conserved protein structure and are used in many types of studies in evolutionary proteomics and neighboring fields. The Pfam database contains a large set of protein domains constructed using hidden Markov models. There have been several attempts to define a metric for the "versatility" or "promiscuity" of protein domains. These methods have used different approaches to find and rank domains that are present and abundant in a large variety of proteins. Here, an attempt has been made to summarize and compare these methods. The methods have been applied to the latest version of the Pfam database and compared using the Spearman and Jaccard distance metrics. By testing GO terms for enrichment with the hypergeometric distribution, an attempt has been made to identify the similarities and differences between the methods, in terms of biological functions and processes, in a more objective way. The results show that the methods have a weak but significant similarity to each other. The GO terms enriched among the versatile domains show a bias towards central regulatory and metabolic/catabolic pathways and key enzymes in all kingdoms. One deviation from this pattern is the DVI method in Eukaryota, whose enriched terms show a bias towards membrane processes. The NTRP method shows the best rank correlation with the number of associated GO terms, but it is not directly corrected for abundance. In conclusion, it should be possible to modify one of the existing methods, or create a new one, for finding versatile domains that show a more consistent association with certain biological functions or processes.
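The abstract names three comparison tools: Spearman rank correlation between two versatility scorings, Jaccard distance between the sets of top-ranked domains, and hypergeometric GO-term enrichment. The sketch below shows one plain-Python way to compute each. It assumes that every versatility method yields one score per Pfam domain (higher meaning more versatile) and that two methods are compared only on the domains both of them score. This is not the thesis's own code. The function names, the top-k cutoff and the toy scores are hypothetical, and the versatility scores themselves (DVI, NTRP and the others) are taken as given rather than computed.

```python
from math import comb
from typing import Dict, Sequence, Set


def average_ranks(scores: Sequence[float]) -> list[float]:
    """Rank scores ascending (1-based), giving tied values their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Spearman correlation of two scorings over their shared domains.

    Computed as the Pearson correlation of average ranks, so ties are handled.
    Assumes at least two shared domains and non-constant scores.
    A Spearman *distance* can then be taken as 1 - rho.
    """
    shared = sorted(a.keys() & b.keys())
    ra = average_ranks([a[d] for d in shared])
    rb = average_ranks([b[d] for d in shared])
    n = len(shared)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """The k highest-scoring domains (ties broken by name for determinism)."""
    return set(sorted(scores, key=lambda d: (-scores[d], d))[:k])


def jaccard_distance(x: Set[str], y: Set[str]) -> float:
    """1 - |x ∩ y| / |x ∪ y|: 0 for identical sets, 1 for disjoint sets."""
    union = x | y
    return 1.0 - len(x & y) / len(union) if union else 0.0


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: all GO-annotated domains considered
    successes:  domains annotated with the GO term being tested
    draws:      domains in the 'versatile' set
    k:          versatile domains annotated with that GO term
    """
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


if __name__ == "__main__":
    # Hypothetical scores from two versatility methods for five toy domains.
    method_x = {"d1": 9.0, "d2": 7.5, "d3": 3.0, "d4": 1.0, "d5": 0.5}
    method_y = {"d1": 0.8, "d2": 0.9, "d3": 0.1, "d4": 0.3, "d5": 0.2}
    print(spearman_rho(method_x, method_y))  # 0.6
    print(jaccard_distance(top_k(method_x, 3), top_k(method_y, 3)))  # 0.5
    # Toy enrichment: 20 domains, 5 carry the term, 4 are versatile, 3 overlap.
    print(round(hypergeom_sf(3, 20, 5, 4), 4))  # 0.032
```

The hypergeometric tail is summed exactly with `math.comb` so that the sketch needs only the standard library. In a real enrichment run, each GO term would be tested this way and the resulting p-values corrected for multiple testing, a step that is left out here.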

Place, publisher, year, edition, pages
2014.
Keyword [en]
Protein domains, Domain versatility, Domain promiscuity, Bioinformatics, Pfam, Gene ontology
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-163675
OAI: oai:DiVA.org:kth-163675
DiVA: diva2:801744
Available from: 2015-04-14
Created: 2015-04-10
Last updated: 2015-09-17
Bibliographically approved

Open Access in DiVA

No full text
