Disease Gene Candidate Discovery by Genome Sequencing: Improved Variant Filtering Tools.
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Genome sequencing is a powerful tool that promises to aid many aspects of human health, among other things through disease gene candidate discovery in Mendelian disorders. But no two human beings are alike. Sequencing any individual returns tens of thousands of variants in coding regions when compared to a reference genome. Variant filtering helps identify which of these variants are causative of disease. Variant databases help remove common variation, leaving only rare and novel variants for further investigation. This is a recent technique with much room for improvement. Local background variation is extensive and confounds disease gene discovery unless it is filtered out. In this thesis, a set of user-friendly tools for building human background variation frequency databases has been developed. The key features include variant tagging, which allows easy exclusion of subgroups, and database merging, which allows sequencing centers to share data. Moreover, we developed a model to improve on current variant effect prediction results. The model was developed using a supervised learning approach, a multilayer perceptron, to combine the results from several variant effect predictors. The resulting combiner showed better accuracy than the individual effect predictors in all ROC regions except at false positive rates between 0.65 and 0.75.
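The filtering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class `BackgroundDB`, its methods, the tag names, and the default threshold are hypothetical, and the frequency used here is the simple fraction of samples carrying a variant rather than an allele frequency. The sketch shows the three features the abstract names: tagging samples so that subgroups can be excluded, merging databases from different centers, and keeping only rare or novel variants.

```python
from collections import defaultdict


class BackgroundDB:
    """Hypothetical background variation database (illustrative only).

    A variant is a tuple: (chromosome, position, ref allele, alt allele).
    """

    def __init__(self):
        self.carriers = defaultdict(set)  # variant -> ids of carrying samples
        self.sample_tags = {}             # sample id -> set of tags

    def add_sample(self, sample_id, variants, tags=()):
        """Register one sequenced individual together with its tags."""
        self.sample_tags[sample_id] = set(tags)
        for variant in variants:
            self.carriers[variant].add(sample_id)

    def merge(self, other):
        """Fold another database (e.g. from another center) into this one."""
        for sample_id, tags in other.sample_tags.items():
            self.sample_tags.setdefault(sample_id, set()).update(tags)
        for variant, sample_ids in other.carriers.items():
            self.carriers[variant].update(sample_ids)

    def frequency(self, variant, exclude_tags=()):
        """Fraction of non-excluded samples that carry the variant."""
        excluded = set(exclude_tags)
        kept = {s for s, tags in self.sample_tags.items() if not (tags & excluded)}
        if not kept:
            return 0.0
        return len(self.carriers.get(variant, set()) & kept) / len(kept)

    def filter_rare(self, variants, max_freq=0.01, exclude_tags=()):
        """Keep variants whose background frequency is at most max_freq."""
        return [v for v in variants
                if self.frequency(v, exclude_tags) <= max_freq]
```

Excluding a tag matters when the background set itself contains patients with the disorder under study: with their samples left in, a causative variant would look common and be filtered away.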
Place, publisher, year, edition, pages
Trita-CSC-E, ISSN 1653-5715 ; 2012:097
Identifiers
URN: urn:nbn:se:kth:diva-130988
OAI: oai:DiVA.org:kth-130988
DiVA: diva2:654434
Master of Science - Computational and Systems Biology