A Secure and Reliable Platform for Storing and Processing Genomic Data on Hadoop
KTH, School of Information and Communication Technology (ICT).
2014 (English). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Since 2007, the cost of sequencing a whole human genome has decreased by roughly half every 4 months. As of 2014, whole-genome sequencing could cost as little as 1,000 dollars, and Next-Generation Sequencing (NGS) machines have consequently become a source of Big Data: the Illumina HiSeq X Ten can produce up to 20 PB of data per year. The dominant open-source platform for storing and processing Big Data is Apache Hadoop. However, Hadoop does not natively support user identity, and because genomic data is sensitive, no existing multi-tenancy solution meets organizations' needs to store and process genomic data securely.

In this thesis, we address the problem of how to enable Biobank users to securely store, access, and share genomic data in Hadoop. Our solution leverages the security support in the J2EE framework and constrains access to Hadoop through a web application built in this project. However, HTTP(S) limits the size of files that can be uploaded to web applications, so we also address the follow-on problem of how to enable users to copy genomic data into Hadoop efficiently, easily, and securely. Our prototype demonstrates how Hadoop can be secured to support sensitive data, and how Big Data can be securely transported over HTTP.
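The abstract does not describe how the prototype moves large files over HTTP(S). One common approach, assumed here rather than taken from the thesis, is chunked upload: the client splits a file into pieces small enough for a single request, and the server verifies and reassembles them before writing the result into Hadoop. The sketch shows only the splitting and reassembly logic; `CHUNK_SIZE`, `Chunk`, `split_into_chunks` and `reassemble` are hypothetical names, and the HTTP transport and HDFS write are omitted. The per-chunk SHA-256 digests only detect corrupted or truncated chunks; confidentiality and authentication are assumed to come from HTTPS and the web application's login.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterator, List

# Hypothetical sketch of chunked upload, NOT the thesis's implementation.
# Assumed limit: the web server rejects request bodies larger than this.
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, an illustrative value


@dataclass
class Chunk:
    index: int   # 0-based position of the chunk in the file
    offset: int  # byte offset of the chunk in the original file
    data: bytes  # payload sent in one HTTP(S) request
    sha256: str  # digest the server recomputes to detect corruption


def split_into_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Chunk]:
    """Client side: split a file's bytes into request-sized chunks with digests."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, offset in enumerate(range(0, len(payload), chunk_size)):
        data = payload[offset:offset + chunk_size]
        yield Chunk(index, offset, data, hashlib.sha256(data).hexdigest())


def reassemble(chunks: List[Chunk], total_chunks: int) -> bytes:
    """Server side: verify each chunk, accept any arrival order, rebuild the file."""
    by_index = {}
    for chunk in chunks:
        if hashlib.sha256(chunk.data).hexdigest() != chunk.sha256:
            raise ValueError(f"chunk {chunk.index} failed its integrity check")
        # A retried chunk simply replaces the earlier copy with the same index.
        by_index[chunk.index] = chunk
    missing = [i for i in range(total_chunks) if i not in by_index]
    if missing:
        raise ValueError(f"missing chunks: {missing}")
    return b"".join(by_index[i].data for i in range(total_chunks))


if __name__ == "__main__":
    genome = b"ACGT" * 10  # 40 bytes of toy data
    parts = list(split_into_chunks(genome, chunk_size=16))
    print(len(parts))  # 3
    print(reassemble(list(reversed(parts)), len(parts)) == genome)  # True
```

Because each chunk carries its own index, a client could retry failed chunks or send several in parallel, and the server would still rebuild the file in order.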

Place, publisher, year, edition, pages
2014, 50 p.
Series
TRITA-ICT-EX, 2014:99
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-177202
OAI: oai:DiVA.org:kth-177202
DiVA: diva2:871989
Available from: 2015-12-08
Created: 2015-11-17
Last updated: 2017-08-03
Bibliographically approved
