The use of grid computing to drive data-intensive genetic research
KTH, School of Biotechnology (BIO), Gene Technology.
2007 (English). In: European Journal of Human Genetics, ISSN 1018-4813, E-ISSN 1476-5438, Vol. 15, no. 6, p. 694-702. Article in journal (Refereed), Published
Abstract [en]

In genetics, with increasing data sizes and more advanced algorithms for mining complex data, a point is reached where increased computational capacity or alternative solutions become unavoidable. Most contemporary methods for linkage analysis are based on the Lander-Green hidden Markov model (HMM), which scales exponentially with the number of pedigree members. In whole-genome linkage analysis, genotype simulations become prohibitively time-consuming to perform on single computers. We have developed 'Grid-Allegro', a Grid-aware implementation of the Allegro software, by which several thousand genotype simulations can be performed in parallel in a short time. With temporary installations of the Allegro executable and datasets on remote nodes at submission, the need for predefined Grid run-time environments is circumvented. We evaluated the performance, efficiency and scalability of this implementation in a genome scan on Swedish multiplex Alzheimer's disease families. We demonstrate that 'Grid-Allegro' allows for the full exploitation of the features available in Allegro for genome-wide linkage. The implementation of existing bioinformatics applications on Grids (Distributed Computing) represents a cost-effective alternative for addressing highly resource-demanding and data-intensive bioinformatics tasks, compared to acquiring and setting up clusters of computational hardware in house (Parallel Computing), a resource not available to most geneticists today.

Place, publisher, year, edition, pages
2007. Vol. 15, no 6, 694-702 p.
Keyword [en]
grid, bioinformatics, genome-wide, linkage analysis, genotype simulation
National Category
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:kth:diva-7797
DOI: 10.1038/sj.ejhg.5201815
ISI: 000246792100012
Scopus ID: 2-s2.0-34249727262
OAI: oai:DiVA.org:kth-7797
DiVA: diva2:12926
Note
QC 20101004
Available from: 2007-12-10 Created: 2007-12-10 Last updated: 2012-03-20 Bibliographically approved
In thesis
1. Grid and High-Performance Computing for Applied Bioinformatics
2007 (English). Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

The beginning of the twenty-first century has been characterized by an explosion of biological information. The avalanche of data grows daily and arises as a consequence of advances in the fields of molecular biology, genomics and proteomics. The challenge for today's biologists lies in decoding this huge and complex data, in order to achieve a better understanding of how our genes shape who we are, how our genome evolved, and how we function.

Without annotation and data mining, the information provided by, for example, high-throughput genomic sequencing projects is not very useful. Bioinformatics is the application of computer science and technology to the management and analysis of biological data, in an effort to address biological questions. The work presented in this thesis has focused on the use of Grid and High Performance Computing for solving computationally expensive bioinformatics tasks, where, due to the very large amount of available data and the complexity of the tasks, new solutions are required for efficient data analysis and interpretation.

Three major research topics are addressed. First, the use of grids for distributing the execution of sequence-based proteomic analysis, and its application in optimal epitope selection and in a proteome-wide effort to map the linear epitopes in the human proteome. Second, the application of grid technology in genetic association studies, which enabled the analysis of thousands of simulated genotypes. Finally, the development and application of an economics-based model for grid-job scheduling and resource administration.

The application of the grid-based technology developed in the present investigation resulted in the successful tagging and linking of chromosome regions in Alzheimer's disease, proteome-wide mapping of the linear epitopes, and the development of a Market-Based Resource Allocation in Grid for Scientific Applications.

Place, publisher, year, edition, pages
Stockholm: KTH, 2007
Series
Trita-BIO-Report, ISSN 1654-2312 ; 2007:9
Keyword
Grid computing, bioinformatics, genomics, proteomics
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:kth:diva-4573
ISBN: 978-91-7178-782-8
Public defence
2007-12-21, FD5, AlbaNova, Roslagstullsbacken 21, Stockholm, 10:00
Note
QC 20100622
Available from: 2007-12-10 Created: 2007-12-10 Last updated: 2012-03-20 Bibliographically approved

By author/editor
Andrade, Jorge; Andersen, Malin; Odeberg, Jacob