Genome-wide survey for biologically functional pseudogenes
KTH, School of Computer Science and Communication (CSC). ORCID iD: 0000-0002-5896-473X
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC).
2006 (English) In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 2, no 5, 358-369 p. Article in journal (Refereed) Published
Abstract [en]

According to current estimates there exist about 20,000 pseudogenes in a mammalian genome. The vast majority of these are disabled and nonfunctional copies of protein-coding genes which, therefore, evolve neutrally. Recent findings that a Makorin1 pseudogene, residing on mouse Chromosome 5, is, indeed, in vivo vital and also evolutionarily preserved, encouraged us to conduct a genome-wide survey for other functional pseudogenes in human, mouse, and chimpanzee. We identify to our knowledge the first examples of conserved pseudogenes common to human and mouse, originating from one duplication predating the human-mouse species split and having evolved as pseudogenes since the species split. Functionality is one possible way to explain the apparently contradictory properties of such pseudogene pairs, i.e., high conservation and ancient origin. The hypothesis of functionality is tested by comparing expression evidence and synteny of the candidates with proper test sets. The tests suggest potential biological function. Our candidate set includes a small set of long-lived pseudogenes whose unknown potential function is retained since before the human-mouse species split, and also a larger group of primate-specific ones found from human-chimpanzee searches. Two processed sequences are notable, their conservation since the human-mouse split being as high as most protein-coding genes; one is derived from the protein Ataxin 7-like 3 (ATX7NL3), and one from the Spinocerebellar ataxia type 1 protein (ATX1). Our approach is comparative and can be applied to any pair of species. It is implemented by a semi-automated pipeline based on cross-species BLAST comparisons and maximum-likelihood phylogeny estimations. To separate pseudogenes from protein-coding genes, we use standard methods, utilizing in-frame disablements, as well as a probabilistic filter based on Ka/Ks ratios.
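
The abstract mentions in-frame disablements as one standard signal for separating pseudogenes from protein-coding genes. The following is a minimal sketch of that idea only, not the authors' pipeline: it assumes the input is a candidate sequence already aligned to the reading frame of its parent gene, and it flags premature stop codons and lengths that break the frame. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def premature_stops(seq: str) -> list[int]:
    """Return 0-based nucleotide offsets of in-frame stop codons.

    The last complete codon is skipped, since a stop there is the
    ordinary terminator rather than a disablement.
    """
    seq = seq.upper()
    n_codons = len(seq) // 3
    hits = []
    for i in range(n_codons - 1):
        if seq[3 * i:3 * i + 3] in STOP_CODONS:
            hits.append(3 * i)
    return hits

def has_inframe_disablement(seq: str) -> bool:
    """True if the frame is broken (length not a multiple of 3,
    a crude frameshift proxy) or a premature stop codon is present."""
    return len(seq) % 3 != 0 or bool(premature_stops(seq))
```

A real pipeline would detect frameshifts from an alignment against the parent gene rather than from sequence length alone; the length check here is only a stand-in for that step.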

Place, publisher, year, edition, pages
2006. Vol. 2, no 5, 358-369 p.
Keyword [en]
ataxin, ataxin 7, chimpanzee, chromosome 5, gene expression, genome, human, medical research, mouse, nonhuman, phylogeny, pseudogene, review, synteny
National Category
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:kth:diva-8462
DOI: 10.1371/journal.pcbi.0020046
ISI: 000239493900005
Scopus ID: 2-s2.0-33646941894
OAI: oai:DiVA.org:kth-8462
DiVA: diva2:13792
Note
QC 20100916
Available from: 2008-05-16 Created: 2008-05-16 Last updated: 2017-12-14 Bibliographically approved
In thesis
1. Taking advantage of phylogenetic trees in comparative genomics
2008 (English) Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

Phylogenomics can be regarded as evolution and genomics in co-operation. Various kinds of evolutionary studies, gene family analysis among them, demand access to genome-scale datasets. But it is also clear that many genomics studies, such as assignment of gene function, are much improved by evolutionary analysis. The work leading to this thesis is a contribution to the phylogenomics field. We have used phylogenetic relationships between species in genome-scale searches for two intriguing genomic features, namely pseudogenes and A-to-I RNA editing. In the first case we used pairwise species comparisons, specifically human-mouse and human-chimpanzee, to infer existence of functional mammalian pseudogenes. In the second case we profited from recent years' rapid growth in the number of sequenced genomes, and used 17-species multiple sequence alignments. In both these studies we have used non-genomic data, gene expression data and synteny relations among these, to verify predictions. In the A-to-I editing project we used 454 sequencing for experimental verification.

We have further contributed a maximum a posteriori (MAP) method for fast and accurate dating analysis of speciations and other evolutionary events. This work follows recent years' trend of leaving the strict molecular clock when performing phylogenetic inference. We discretised the time interval from the leaves to the root in the tree, and used a dynamic programming (DP) algorithm to optimally factorise branch lengths into substitution rates and divergence times. We analysed two biological datasets and compared our results with recent MCMC-based methodologies. The dating point estimates that our method delivers were found to be of high quality while the gain in speed was dramatic.
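
The dynamic programming idea described above can be illustrated with a small sketch. It is not the thesis's actual model: it assumes leaves sit at time 0, pins the root to the oldest grid point, treats each branch length as exactly rate times duration, and scores rates with an illustrative log-normal density. All names (`date_tree`, `log_rate_density`) and parameter values are hypothetical.

```python
import math

def log_rate_density(rate, mean=1.0, sigma=0.5):
    """Log-normal log-density of a substitution rate (illustrative prior)."""
    mu = math.log(mean) - 0.5 * sigma ** 2
    z = (math.log(rate) - mu) / sigma
    return -math.log(rate * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z

def date_tree(children, branch_len, root, grid):
    """Assign grid times to nodes so that the implied rates score best.

    children:   dict node -> list of child nodes (leaves map to []).
    branch_len: dict child -> positive length of the branch above it.
    grid:       increasing candidate times; grid[0] is the present.
    Returns (best score, dict node -> time).
    """
    n = len(grid)
    best, choice = {}, {}

    def solve(v):
        if not children[v]:
            # Leaves are only allowed at the present.
            best[v] = [0.0] + [-math.inf] * (n - 1)
            return
        for c in children[v]:
            solve(c)
        row = []
        for i in range(n):
            total = 0.0
            for c in children[v]:
                top, arg = -math.inf, None
                for j in range(i):  # child must be strictly younger
                    if best[c][j] == -math.inf:
                        continue
                    rate = branch_len[c] / (grid[i] - grid[j])
                    s = best[c][j] + log_rate_density(rate)
                    if s > top:
                        top, arg = s, j
                choice[(c, i)] = arg
                total += top
            row.append(total)
        best[v] = row

    solve(root)
    if best[root][n - 1] == -math.inf:
        raise ValueError("grid too coarse for this tree")
    # Trace back the chosen grid index for every node.
    times, stack = {}, [(root, n - 1)]
    while stack:
        v, i = stack.pop()
        times[v] = grid[i]
        for c in children[v]:
            stack.append((c, choice[(c, i)]))
    return best[root][n - 1], times
```

The table `best[v][i]` holds the best score of the subtree below `v` when `v` is placed at `grid[i]`, so each node is solved once per grid point and the total cost grows with the number of nodes times the square of the grid size.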

Finally we applied the DP strategy in a new setting. This time we used a grid laid out on a species tree instead of on an interval. The discretisation gives together with speciation times a common timeframe for a gene tree and the corresponding species tree. This is the key to integration of the sequence evolution process and the gene evolution process. Out of several potential application areas we chose gene tree reconstruction. We performed genome-wide analysis of yeast gene families and found that our methodology performs very well.

Place, publisher, year, edition, pages
Stockholm: KTH, 2008. 53 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 2008:09
Keyword
Computer Science
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:kth:diva-4757
ISBN: 978-91-7178-987-7
Public defence
2008-06-04, FD05, Albanova, Roslagstullsbacken 21, Stockholm, 09:30
Note
QC 20100923
Available from: 2008-05-16 Created: 2008-05-16 Last updated: 2010-09-23 Bibliographically approved

Authors
Svensson, Örjan; Arvestad, Lars; Lagergren, Jens